The simplest way to deploy ML models in production

Install the Inferless CLI package

pip install --upgrade inferless-cli

Log in to the Inferless workspace

inferless login

A new window will open.

Copy the CLI command shown in that window.

After logging in, you will see a confirmation message.

Import a Model

You need a Python file in the root folder with the class below as the entry point

from diffusers import StableDiffusionPipeline
import torch
from io import BytesIO
import base64

class InferlessPythonModel:
    def initialize(self):
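        # Assumed arguments: swap in the checkpoint and dtype your model needs.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1",  # assumption: any Stable Diffusion checkpoint
            torch_dtype=torch.float16,
        ).to("cuda")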

    def infer(self, inputs):
        prompt = inputs["prompt"]
        image = self.pipe(prompt).images[0]
        # Serialize the PIL image into an in-memory JPEG buffer.
        buff = BytesIO()
        image.save(buff, format="JPEG")
        img_str = base64.b64encode(buff.getvalue()).decode()
        return { "generated_image_base64" : img_str }
    def finalize(self):
        self.pipe = None
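
To see how the three hooks fit together before involving a GPU, the lifecycle can be exercised with a stand-in class. This is a minimal sketch: `FakeImageModel` and `run_once` are hypothetical names, the stub returns fixed bytes instead of a real JPEG, and the call order (initialize, then infer, then finalize) is an assumption about how the platform drives the class rather than a documented guarantee.

```python
import base64


class FakeImageModel:
    """Hypothetical stand-in with the same three hooks as InferlessPythonModel."""

    def initialize(self):
        # A real model would load weights here; the stub just stores fixed bytes.
        self.payload = b"not-a-real-jpeg"

    def infer(self, inputs):
        # Read the prompt the same way the real class does.
        prompt = inputs["prompt"]
        data = self.payload + b":" + prompt.encode("utf-8")
        return {"generated_image_base64": base64.b64encode(data).decode()}

    def finalize(self):
        # Release whatever initialize() allocated.
        self.payload = None


def run_once(model_cls, inputs):
    """Drive one request through initialize -> infer -> finalize (assumed order)."""
    model = model_cls()
    model.initialize()
    try:
        return model.infer(inputs)
    finally:
        model.finalize()


result = run_once(FakeImageModel, {"prompt": "There is a fine house in the forest"})
decoded = base64.b64decode(result["generated_image_base64"])
print(decoded)
```

With the real model, a client would recover the image the same way: decode the `generated_image_base64` value with `base64.b64decode` and write the bytes to a `.jpg` file.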

Then create the input schema:

{
    "prompt": {
        'datatype': 'STRING',
        'required': True,
        'shape': [1],
        'example': ["There is a fine house in the forest"]
    }
}
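
The schema fields suggest how a request might be checked: each key declares a datatype and whether it is required. The sketch below is an illustration only. `validate_inputs` is a hypothetical helper, the names `SCHEMA` and `PY_TYPES` are assumptions, it checks presence and type but not shape, and the validation Inferless actually performs may differ.

```python
SCHEMA = {
    "prompt": {
        "datatype": "STRING",
        "required": True,
        "shape": [1],
        "example": ["There is a fine house in the forest"],
    }
}

# Assumed mapping from schema datatype names to Python types
# (only STRING appears in this guide).
PY_TYPES = {"STRING": str}


def validate_inputs(schema, inputs):
    """Return a list of human-readable problems; an empty list means the inputs pass."""
    problems = []
    for key, spec in schema.items():
        if key not in inputs:
            if spec.get("required", False):
                problems.append(f"missing required input: {key}")
            continue
        expected = PY_TYPES.get(spec["datatype"])
        if expected is not None and not isinstance(inputs[key], expected):
            problems.append(f"{key} should be {spec['datatype']}")
    return problems


ok = validate_inputs(SCHEMA, {"prompt": "a red bicycle"})
missing = validate_inputs(SCHEMA, {})
wrong_type = validate_inputs(SCHEMA, {"prompt": 42})
print(ok, missing, wrong_type)
```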

Once you have this, run the command below to start a new model import

inferless init

You will see the following prompt

Once init is complete, you will see the following files created

├── inferless-config.yaml
├── inferless.yaml
# Deprecated
├── input.json
└── output.json
  • inferless-runtime-config.yaml: This file lists the software packages and Python packages required for model inference.

  • inferless.yaml: This file holds the configuration required for the deployment. You can update it to suit your requirements.

Run a Model Locally

inferless run 

Deploy a Model to Inferless

Run the command below to push the model to production

inferless deploy 

After deployment, you will see the following output

In the UI, the Progress section will show:

Getting the logs

inferless log -i  

Redeploy the Model with updated code

inferless model redeploy 

All Options in Inferless CLI

Optional Settings:

Using Runtimes with CLI

During model init, if you have a custom requirements.txt file, you can use it to create the config.yaml automatically

Creating using requirements.txt

Generated file

# Runtime YAML file
build:
  # cuda_version: we currently support 12.1.1 and 11.8.0.
  cuda_version: 12.1.1
  python_packages:
  - huggingface-hub==0.11.0
  - transformers==4.36.1
  - diffusers==0.24.0
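
The conversion from requirements.txt to a runtime file can be pictured as follows. This is a rough sketch of the idea, not the CLI's implementation: `requirements_to_runtime_yaml` is a hypothetical helper, and the `build` and `python_packages` keys are assumptions about the runtime file layout.

```python
def requirements_to_runtime_yaml(requirements_text, cuda_version="12.1.1"):
    """Render pinned requirements as a runtime YAML string (layout is assumed)."""
    packages = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            packages.append(line)

    lines = ["build:", f"  cuda_version: {cuda_version}", "  python_packages:"]
    lines += [f"  - {pkg}" for pkg in packages]
    return "\n".join(lines) + "\n"


requirements = """
# pinned versions used in this guide
huggingface-hub==0.11.0
transformers==4.36.1
diffusers==0.24.0
"""
runtime_yaml = requirements_to_runtime_yaml(requirements)
print(runtime_yaml)
```

The sketch builds the text by hand to stay dependency-free; a YAML library would be the safer choice for anything beyond simple pinned package lines.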

If you don’t have the requirements file in the same repo, you can build the config.yaml by following the runtime documentation

Push the runtime

inferless runtime upload

The CLI will offer to update the config automatically; otherwise, you can update inferless.yaml manually

Using an existing runtime:

inferless runtime select --id 

Creating a Volume

inferless volume create

List all the volumes

Use this command to get the id of the volume

inferless volume list 

Using an existing volume

inferless volume select --id 

Copy data from your machine to a Volume

Copy a file

inferless volume cp -s /path/to/local/file -d infer://region-1/<volume-name>/<folder>/file  

Copy an entire folder

inferless volume cp -r -s /path/to/local/folder -d infer://region-1/<volume-name>/<folder> 

List the data in Volume

inferless volume ls -i <volume-id> -p /path 

Copy data from a Volume to your local machine

inferless volume cp -s infer://region-1/<volume-name>/<folder> -d /path/to/local/file 

Delete the data in the Volume

inferless volume rm -p infer://region-1/<volume-name>/<folder> 
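
When scripting several transfers, it can help to build the `infer://` paths and argument lists in one place. The helpers below are hypothetical (`volume_uri`, `cp_args`); they only assemble the flags that appear in the commands above (`-s`, `-d`, `-r`) and do not invoke the CLI. The region name `region-1` is taken from the examples, and the volume and folder names are made up.

```python
import shlex


def volume_uri(volume_name, *parts, region="region-1"):
    """Build an infer:// path like infer://region-1/<volume-name>/<folder>."""
    segments = [region, volume_name, *parts]
    return "infer://" + "/".join(s.strip("/") for s in segments)


def cp_args(source, destination, recursive=False):
    """Return the argument list for `inferless volume cp` (flags as used in this guide)."""
    args = ["inferless", "volume", "cp"]
    if recursive:
        args.append("-r")
    args += ["-s", source, "-d", destination]
    return args


# Hypothetical volume and folder names, for illustration only.
dest = volume_uri("my-volume", "weights")
upload = cp_args("/path/to/local/folder", dest, recursive=True)
download = cp_args(dest, "/path/to/local/folder")
print(shlex.join(upload))
```

Passing one of these lists to `subprocess.run(upload, check=True)` would execute the copy, assuming the CLI is installed and you are logged in.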

Deprecated - Input / Output JSON

  • input.json: This file holds the key for the input parameter. Whenever you change the name of the key in your model code, update this file accordingly.

  • output.json: This file holds the name of the output key that the infer function returns.

Update input.json and output.json to match your model.

// input.json
{
  "inputs": [
    {
      "data": [
        "Image of horse near beach"
      ],
      "name": "prompt",
      "shape": [
        1
      ],
      "datatype": "BYTES"
    }
  ]
}
// output.json

{
  "outputs": [
    {
      "data": [
        "<base64-encoded image>"
      ],
      "name": "generated_image_base64",
      "shape": [
        1
      ],
      "datatype": "BYTES"
    }
  ]
}