Remote Run: Run your code remotely
It is now possible to run your code remotely using Inferless. This feature is extremely useful when you want to run your code partly or entirely on a remote server. By introducing a few annotations in your code, you can run your code on a remote server with the command inferless remote-run app.py -c config.yaml.
import inferless
@inferless.method(gpu="T4")
def my_agent():
return "Hello World"
Runtime Configuration
You can configure the runtime for remote run using a configuration file. The configuration file is a YAML file through which you can specify custom packages that you want to install on the remote server.
You can specify system packages (packages installed using apt-get
) python packages (packages installed using pip
) and run commands (shell commands) that you want to configure on the remote server.
# runtime.yaml
build:
system_packages:
- libssl-dev
python_packages:
- accelerate==0.27.2
- torch==2.1.1
run_commands:
- wget https://example.com/model.pth
Annotations
You can use the following annotations to specify the code that you want to run on the remote server.
The annotations accept a gpu
parameter which specifies the GPU that you want to use on the remote server.
Currently, the supported GPUs are T4
& A10
@inferless.method
Use this annotation to specify the function that you want to run on the remote server.
@inferless.Cls
Use this annotation to specify the class that you want to run on the remote server. Cls
contains two sub annotations @Cls.load
and @Cls.infer
.
@Cls.load
is used to specify the function that you want to run before the inference.
@Cls.infer
is used to specify the function that you want to run for inference.
import inferless
app = inferless.Cls(gpu="T4")
class MyLLModel:
@app.load
def load(self):
# load the model
pass
@app.infer
def infer(self):
# run inference
return "generated output"
Examples
This section contains working examples of how to use the remote-run
command.
Example 1 - Run a function on a remote server
# app.py
import inferless
@inferless.method(gpu="T4")
def test_gpt_neo(prompt):
from transformers import pipeline
pipe = pipeline("text-generation", model="EleutherAI/gpt-neo-125m")
expected_output = pipe(prompt, max_length=50, do_sample=False)[0]['generated_text']
return expected_output
def post_process(output):
import re
processed_output = re.sub(r'\b(\d{3}-\d{2}-\d{4})\b', '[REDACTED]', output)
return processed_output
prompt = "Hello, Write a story about a dragon"
output = test_gpt_neo(prompt)
processed_output = post_process(output)
print(processed_output)
# runtime.yaml
build:
python_packages:
- transformers
- torch
- accelerate
run command inferless remote-run app.py -c runtime.yaml
In this example, only the test_gpt_neo
function will run on the remote server. The post_process
function will run on the local machine.
As you can see, the transformers
is imported inside the function which means it is not required to install the transformers package on the local machine.
Example 2 - Run a class on a remote
# app.py
import os
import inferless
app = inferless.Cls(gpu="A10")
HF_TOKEN = "<HF_TOKEN>"
class InferlessPythonModel:
@app.load
def initialize(self):
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
model_id = "TheBloke/Llama-2-7B-chat-AWQ"
tokenizer_model = "meta-llama/Llama-2-7b-chat-hf"
self.sampling_params = SamplingParams(
temperature=0.7,
top_p=0.1,
repetition_penalty=1.18,
top_k=40,
max_tokens=512,
)
self.llm = LLM(model=model_id, enforce_eager=True, quantization="AWQ")
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_model, token=HF_TOKEN)
@app.infer
def infer(self, inputs):
prompts = inputs["prompt"]
input_text = self.tokenizer.apply_chat_template([{"role": "user", "content": prompts}], tokenize=False)
result = self.llm.generate(input_text, self.sampling_params)
result_output = [output.outputs[0].text for output in result]
return {"generated_result": result_output[0]}
model = InferlessPythonModel()
print(model.infer({"prompt": "Hello, Write a story about a dragon"}))
# runtime.yaml
gpu:
T4
build:
python_packages:
- "accelerate==0.30.1"
- "transformers==4.41.2"
- "torch==2.3.0"
- "vllm==0.4.3"
run command inferless remote-run app.py -c runtime.yaml
In this example, when the infer
function is called, the initialize
function will run first. The initialize
function will load the model and tokenizer. The infer
function will run the inference on the remote server.
Replace <HF_TOKEN>
with your Hugging Face token.
Working Directory
By default, the files from the working directory are copied to the server excluding: “.git”, “*.pyc”, ”pycache”
If a .gitignore
file is present in the working directory, the files mentioned in the .gitignore
file will not be copied to the server.
You can also specify a custom ignore file using the --exclude
-e
option.
inferless remote-run app.py -c runtime.yaml -e custom_ignore_file.txt