Use the command inferless remote-run to run model inference on remote GPU from your local machine. This command will execute a particular function or class in the cloud environment.

Pre Requisites

  • You need to have python 3.10
  • You need to have inferless-cli and inferless installed in the python env
  • Max time 10 mins is allowed for remote run ( For your python code )

Getting Started

Let’s assume you have an app.py with 2 functions init and load, You need to 4 lines of code to your app.py to make it run with remote run by initialising the inferless cls and adding functional annotations

from threading import Thread
from inferless import Cls # Add the inferless library


InferlessCls = Cls(gpu="T4")  # Init the class. the type of GPU you want to run with (This can be passed from the command argument aswell)
class InferlessPythonModel:

    @InferlessCls.load     # Add the annotation
    def initialize(self):
        import torch
        from transformers import pipeline
        self.generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M",device=0)

    @InferlessCls.infer    # Add the annotation
    def infer(self, inputs):
        prompt = inputs['message']
        pipeline_output = self.generator(prompt, do_sample=True, min_length=120)
        generated_txt = pipeline_output[0]["generated_text"]
        return {"generated_text": generated_txt }

Usage

inferless remote-run <filename>

Params:

  • --config -c : Path to the runtime configuration file
  • --exclude -e : Path to the ignore file. This file contains the list of files that you want to exclude from the remote run similar to .gitignore file.
  • --gpu -g : Denotes the machine type (A10/A100/T4)

Examples:

inferless remote-run app.py -c runtime.yaml  -g T4 --message "Write me a story of the World"
inferless remote-run app.py -c runtime.yaml -e .ignore --message "Write me a story of the World"

For more details and examples refer to the Remote Run documentation .