Getting started

  1. Log in to the Inferless.com console and copy your CLI keys from the Keys section

  2. Now install the Inferless CLI package using this command:

     pip install inferless-cli
    
  3. Log in to the Inferless CLI using this command and paste your CLI keys when prompted:

    inferless login
    

Sample model deployment

  1. To deploy a model with Inferless you typically need three files:

    • app.py is a Python file that sets up and runs your model on the Inferless platform. It typically contains a class with three main functions: initialize (loads the model), infer (handles each request), and finalize (cleans up resources).
    • input_schema.py is a Python file that specifies the input parameters for your model’s API calls (a sample schema is sketched after these steps).
    • config.yaml (runtime) lists the software and dependencies you can add to your runtime environment to support your model’s specific needs.
  2. To get started with your first deployment, we provide app.py and input_schema.py. Run this command to download the files:

    inferless scaffold --demo 
    
  3. Now that the files are downloaded, initialize the model using this command:

    inferless init --name <modelname>
    
  4. Once the model is initialized, deploy it using this command:

     inferless deploy --gpu T4 
    
  5. Hurray! You’ve successfully deployed your first model with us
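
For reference, a minimal input_schema.py for this demo has the shape below. This is an illustrative sketch: the parameter name ("prompt") and the example value are assumptions, so match them to whatever your app.py actually reads.

    INPUT_SCHEMA = {
        "prompt": {
            "datatype": "STRING",      # the type of this input parameter
            "required": True,          # requests without it will be rejected
            "shape": [1],              # a single string value
            "example": ["a photo of an astronaut riding a horse on mars"]
        }
    }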

Runtime

Runtime in Inferless refers to the environment and configuration in which your model runs. It includes:

  1. System packages
  2. Python packages
  3. Custom shell commands

Create and deploy with runtime

  1. Create a new file called inferless_runtime_config.yaml with the data below:

    build:
      cuda_version: "12.1.1"
      python_packages:
        - "accelerate==0.33.0"
        - "torch==2.4.0"
        - "transformers==4.44.0"
        - "diffusers==0.30.0"
    
  2. Run the command below to create the runtime:

    inferless runtime create --name <runtime_name> --path ./inferless_runtime_config.yaml
    
  3. Deploy with the runtime:

    inferless deploy --gpu T4 --region <region_name> --runtime <runtime_name>
    

Volumes

Volumes in Inferless are NFS-like writable storage spaces that can be connected to multiple replicas simultaneously. They serve several key purposes:

  1. Storing model parameters
  2. Archiving datasets (similar to centralized storage)
  3. Setting up shared caches for collaborative tasks
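
For example, to populate a volume with model parameters you might first download the weights to a local directory and then upload that directory with the cp command in the steps below. A minimal sketch using huggingface_hub (an assumption; any local copy of the weights works):

    from huggingface_hub import snapshot_download

    # Download the model repository to a local folder; this folder
    # becomes the --source for `inferless volume cp` later on
    snapshot_download(
        repo_id="stabilityai/stable-diffusion-2-1",  # illustrative model choice
        local_dir="./model-weights",
    )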

Creating and uploading weights

  1. To create a new volume, run this command:

    inferless volume create --name <volume_name>
    
  2. A new volume will be created and you will be shown the infer_path where you can store the weights. Keep this path handy.

  3. Once the volume is created, upload the weights using this command, passing the infer_path as the destination:

    inferless volume cp --source <source_path> --destination <remote_path (Infer path)>
    
  4. Your files will be copied to the server and your volume is ready to be used.

  5. To use these weights, specify in app.py the mount path from which the weights can be accessed (e.g. /var/nfs-mount/<volume_name>):

    from diffusers import StableDiffusionPipeline
    import torch
    from io import BytesIO
    import base64

    # Mount path of the volume that holds the model weights
    MODEL_WEIGHTS_DIR = "/var/nfs-mount/my_volume"

    class InferlessPythonModel:
        def initialize(self):
            # Load the pipeline from the volume instead of downloading it
            self.pipe = StableDiffusionPipeline.from_pretrained(
                MODEL_WEIGHTS_DIR,
                use_safetensors=True,
                torch_dtype=torch.float16,
                device_map='auto',
                local_files_only=True,
            )
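
        # A minimal infer/finalize sketch (illustrative, not the exact demo
        # code): it assumes a "prompt" input and returns the image as base64,
        # using the BytesIO/base64 imports above. Adapt it to your input_schema.py.
        def infer(self, inputs):
            image = self.pipe(inputs["prompt"]).images[0]
            buffer = BytesIO()
            image.save(buffer, format="PNG")
            return {"generated_image_base64": base64.b64encode(buffer.getvalue()).decode("utf-8")}

        def finalize(self):
            # Drop the pipeline reference so resources can be reclaimed
            self.pipe = None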
    
  6. Run this command to create a new model using the above model weights:

    inferless init --name <modelname>
    
  7. Once the model is initialized, deploy it using this command (make sure the mount path defined inside app.py matches the one passed here):

    inferless deploy --gpu T4 --region <region_name> --volume <volume_name> --volume-mount-path <volume_mount_path>
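
Once deployed, you can call the model over HTTP. Below is a minimal sketch using Python’s requests library; the endpoint URL, the workspace token, and the Triton-style request body are all assumptions here, so copy the exact values and input format from your model page in the Inferless console.

    import requests

    # Both values are placeholders; copy the real ones from the console
    INFERENCE_URL = "<inference_url>"
    AUTH_TOKEN = "<workspace_token>"

    payload = {
        "inputs": [
            {
                "name": "prompt",      # must match input_schema.py
                "shape": [1],
                "data": ["a photo of an astronaut riding a horse on mars"],
                "datatype": "BYTES",   # assumed; the console shows the exact type
            }
        ]
    }

    response = requests.post(
        INFERENCE_URL,
        headers={"Authorization": f"Bearer {AUTH_TOKEN}", "Content-Type": "application/json"},
        json=payload,
    )
    print(response.json())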