Use the command inferless remote-run to run model inference on a remote GPU from your local machine. The command executes a specific function or class from your code in the cloud environment.

Prerequisites

  • You need Python 3.10
  • You need inferless-cli and inferless installed in your Python environment (install command shown below)
  • A maximum of 10 minutes is allowed per remote run (for your Python code)
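
For example, both packages can be installed with pip (assuming your active environment uses Python 3.10):

pip install inferless-cli inferless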

Getting Started

Let's assume you have an app.py with two functions: one that loads the model and one that runs inference.
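
For illustration, a bare app.py (before adding any Inferless annotations) might look like the sketch below; the class, method, and model names are only placeholders:

from transformers import pipeline

class InferlessPythonModel:
    def initialize(self):
        # Load the model once; this is the expensive step
        self.generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

    def infer(self, inputs):
        # Run inference with the loaded pipeline
        output = self.generator(inputs["prompt"], do_sample=True, min_length=128)
        return {"generated_txt": output[0]["generated_text"]}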

Class Method

You will need to add annotations to your app.py to make it work with remote run

  • request - Annotation that defines the request schema
  • response - Annotation that defines the response schema
  • load - Annotation that takes care of loading the model
  • infer - Annotation that defines the function for the inference logic
  • local_entry_point - Annotation that lets you mark a function as the entry point of the execution

After you have added the annotations, instantiate the class with the GPU you want to use

app = inferless.Cls(gpu="T4")

import torch
from transformers import pipeline
from pydantic import BaseModel, Field
import inferless

@inferless.request
class RequestObjects(BaseModel):
    prompt: str = Field(default="a horse near a beach")

@inferless.response
class ResponseObjects(BaseModel):
    generated_txt: str = Field(default='Test output')

app = inferless.Cls(gpu="T4")

class InferlessPythonModel:
    @app.load
    def initialize(self):
        # Runs once on the remote machine to load the model
        self.generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M", device=0)

    @app.infer
    def infer(self, inputs):
        # Inference logic; receives a RequestObjects instance and returns a ResponseObjects instance
        pipeline_output = self.generator(inputs.prompt, do_sample=True, min_length=128)
        generateObject = ResponseObjects(generated_txt=pipeline_output[0]["generated_text"])
        return generateObject

@inferless.local_entry_point
def my_local_entry(dynamic_params):
    # dynamic_params holds the extra CLI flags passed to remote-run (for example --prompt)
    model_instance = InferlessPythonModel()
    return model_instance.infer(RequestObjects(**dynamic_params))

Usage

inferless remote-run <filename>

Params:

  • --config -c : Path to the runtime configuration file
  • --exclude -e : Path to the ignore file. This file lists the files you want to exclude from the remote run, similar to a .gitignore file (see the example after this list).
  • --gpu -g : Denotes the machine type (A10/A100/T4)
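
For instance, a hypothetical .ignore file could look like this:

# .ignore - files and folders to exclude from the remote run
venv/
__pycache__/
*.pth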

Runtime Configuration

You can configure the runtime for remote run using a configuration file. The configuration file is a YAML file through which you can specify custom packages that you want to install on the remote server.

You can specify system packages (installed using apt-get), Python packages (installed using pip), and run commands (shell commands) that you want to configure on the remote server.

# runtime.yaml
build:
  system_packages:
    - libssl-dev
  python_packages:
    - accelerate==0.27.2
    - torch==2.1.1
  run_commands:
    - wget https://example.com/model.pth

Examples:

inferless remote-run app.py -c runtime.yaml -g T4 --prompt "Write me a story of the World"
inferless remote-run app.py -c runtime.yaml -e .ignore --message "Write me a story of the World"

Extra flags such as --prompt are passed to the local_entry_point function as dynamic params.

For more details and examples, refer to the Remote Run documentation.