Getting started

  1. Log in to the Inferless.com console and copy your CLI keys from the Keys section

  2. Now install the Inferless CLI package using this command:

     pip install inferless-cli
    
  3. Log in to the Inferless CLI using this command and paste your CLI keys when prompted:

    inferless login
    

Sample model deployment

  1. To deploy a model with Inferless you typically need three files:

    • app.py is a Python file that sets up and runs your model on the Inferless platform. It typically contains a class with three main functions: initialize (loads the model), infer (handles each request), and finalize (cleans up resources).
    • input_schema.py is a Python file that specifies the input parameters for your model’s API calls (a sample schema is sketched after these steps).
    • config.yaml (runtime) lists the software and dependencies you can add to your runtime environment to support your model’s specific needs.
  2. To get started with your first deployment, we provide app.py and input_schema.py. Run this command to download the files:

    inferless scaffold --demo 
    
  3. Now that the files are downloaded, initialize the model using this command:

    inferless init --name <modelname>
    
  4. Once the model is initialized, deploy it using this command:

     inferless deploy --gpu T4 
    
  5. Hurray! You’ve successfully deployed your first model with us
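
For reference, a minimal input_schema.py for this demo has the shape below. This is an illustrative sketch: the parameter name ("prompt") and the example value are assumptions, so match them to whatever your app.py actually reads.

    INPUT_SCHEMA = {
        "prompt": {
            "datatype": "STRING",      # the type of this input parameter
            "required": True,          # requests without it will be rejected
            "shape": [1],              # a single string value
            "example": ["a photo of an astronaut riding a horse on mars"]
        }
    }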

Runtime

Runtime in Inferless refers to the environment and configuration in which your model runs. It includes:

  1. System packages
  2. Python packages
  3. Custom shell commands

Create and deploy with runtime

  1. Create a new file called inferless_runtime_config.yaml with the data below:

    build:
      cuda_version: "12.1.1"
      python_packages:
        - "accelerate==0.33.0"
        - "torch==2.4.0"
        - "transformers==4.44.0"
        - "diffusers==0.30.0"
    
  2. Run the command below to create the runtime:

    inferless runtime create --name <runtime_name> --path ./inferless_runtime_config.yaml
    
  3. Deploy with the runtime:

    inferless deploy --gpu T4 --region <region_name> --runtime <runtime_name>
    

Volumes

Volumes in Inferless are NFS-like writable storage spaces that can be connected to multiple replicas simultaneously. They serve several key purposes:

  1. Storing model parameters
  2. Archiving datasets (similar to centralized storage)
  3. Setting up shared caches for collaborative tasks
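
For example, to populate a volume with model parameters you might first download the weights to a local directory and then upload that directory with the cp command in the steps below. A minimal sketch using huggingface_hub (an assumption; any local copy of the weights works):

    from huggingface_hub import snapshot_download

    # Download the model repository to a local folder; this folder
    # becomes the --source for `inferless volume cp` later on
    snapshot_download(
        repo_id="stabilityai/stable-diffusion-2-1",  # illustrative model choice
        local_dir="./model-weights",
    )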

Creating and uploading weights

  1. To create a new volume, run this command:

    inferless volume create --name <volume_name>
    
  2. A new volume will be created and you will be shown the infer_path where you can store the weights. Keep this path handy.

  3. Once the volume is created, upload the weights using this command, passing the infer_path as the destination:

    inferless volume cp --source <source_path> --destination <remote_path (Infer path)>
    
  4. Your files will be copied to the server and your volume is ready to be used.

  5. To use these weights, specify in app.py the mount path from which the weights can be accessed (e.g. /var/nfs-mount/<volume_name>):

    from diffusers import StableDiffusionPipeline
    import torch
    from io import BytesIO
    import base64

    # Mount path of the volume that holds the model weights
    MODEL_WEIGHTS_DIR = "/var/nfs-mount/my_volume"

    class InferlessPythonModel:
        def initialize(self):
            # Load the pipeline from the volume instead of downloading it
            self.pipe = StableDiffusionPipeline.from_pretrained(
                MODEL_WEIGHTS_DIR,
                use_safetensors=True,
                torch_dtype=torch.float16,
                device_map='auto',
                local_files_only=True,
            )
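
        # A minimal infer/finalize sketch (illustrative, not the exact demo
        # code): it assumes a "prompt" input and returns the image as base64,
        # using the BytesIO/base64 imports above. Adapt it to your input_schema.py.
        def infer(self, inputs):
            image = self.pipe(inputs["prompt"]).images[0]
            buffer = BytesIO()
            image.save(buffer, format="PNG")
            return {"generated_image_base64": base64.b64encode(buffer.getvalue()).decode("utf-8")}

        def finalize(self):
            # Drop the pipeline reference so resources can be reclaimed
            self.pipe = None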
    
  6. Run this command to create a new model using the above model weights:

    inferless init --name <modelname>
    
  7. Once the model is initialized, deploy it using this command (make sure the mount path defined inside app.py matches the one passed here):

    inferless deploy --gpu T4 --region <region_name> --volume <volume_name> --volume-mount-path <volume_mount_path>
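
Once deployed, you can call the model over HTTP. Below is a minimal sketch using Python’s requests library; the endpoint URL, the workspace token, and the Triton-style request body are all assumptions here, so copy the exact values and input format from your model page in the Inferless console.

    import requests

    # Both values are placeholders; copy the real ones from the console
    INFERENCE_URL = "<inference_url>"
    AUTH_TOKEN = "<workspace_token>"

    payload = {
        "inputs": [
            {
                "name": "prompt",      # must match input_schema.py
                "shape": [1],
                "data": ["a photo of an astronaut riding a horse on mars"],
                "datatype": "BYTES",   # assumed; the console shows the exact type
            }
        ]
    }

    response = requests.post(
        INFERENCE_URL,
        headers={"Authorization": f"Bearer {AUTH_TOKEN}", "Content-Type": "application/json"},
        json=payload,
    )
    print(response.json())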