Deploy an ML Model with Inferless

A quick guide to help you get started.
There are several ways to import your model; for this example, we will use Hugging Face. By the end of this tutorial, you will be able to deploy a Hugging Face model on Inferless.

Prerequisite: Note the Model Name, Type, and Framework

  • Navigate to the Hugging Face page of the model you want to import into Inferless.
  • Take note of the "Model Name" (you can also use the copy button), Task Type, Model Framework, and Model Type. These will be required in the next steps.
The fields to copy or note are highlighted in RED.

Method A: Deploy Model using Inferless platform

Navigate to your desired workspace in Inferless and click the "Add a Model" button at the top right. An import wizard will open.
Click on Add Model
  1. Select the framework that your model was trained on
     1. Choose your model's training framework. You noted this from Hugging Face in the previous step.
     2. If your desired framework is not available, we have provided detailed how-to guides for your reference.
Choose the training framework used for your model
  2. Choose the source of your model.
     1. Since we are using a model from Hugging Face in this example, select Hugging Face as the method of uploading.
     2. To proceed with the upload, you will need to connect your Hugging Face account and a private GitHub account. This step is mandatory: your Hugging Face credentials let us import your private repositories, while GitHub is used to create the imported repository.
     3. For more information: Inferless works by copying the model from Hugging Face and creating a new repository in GitHub. The model repository is then loaded into Inferless.
     4. If you are using Inferless for the first time and have not added any providers before, click the "Add Provider" (+) button.
Click on Add provider to add your HF and Github accounts
This opens a new tab with "My Integrations". Click Connect on the accounts you need there.
Click connect account and follow the process.
Once you have connected them, return to this page and click the refresh button. You will then see the added accounts.
Click on the refresh button to view the connection details
  3. Enter the model details
Enter the details as noted
     1. Model Details: Add your model name (the name you wish to give your model), choose the model type (e.g. Transformer), choose the task type (e.g. Text generation), and paste the name of the HF model noted from the Hugging Face portal.
     2. Input Schema / Sample Input & Output: Enter the schema defining the input, or enter a sample payload.
     3. Once you click Next, you will see a "Please Wait" screen while we validate your model's requirements. If validation succeeds, the next step opens; otherwise, a list of validation issues is shown.
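As an illustration of the Sample Input & Output fields, a payload for a text-generation model might look like the following. The field names here are hypothetical and depend entirely on your model's interface; the point is simply that both values must be valid JSON:

```python
import json

# Hypothetical sample payload for a text-generation model; adjust the
# field names and values to match your own model's interface.
sample_input = {"prompt": "Once upon a time"}
sample_output = {"generated_text": "Once upon a time, there was a house in the forest."}

# The wizard accepts JSON, so make sure both serialize cleanly.
print(json.dumps(sample_input))
print(json.dumps(sample_output))
```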
  4. Configure the runtime details and machine configuration.
     1. After successful validation, you will be asked to choose your desired runtime and machine configuration.
     2. We suggest using ONNX for the most optimal results. If we are unable to convert to ONNX, we will load the model with your native framework.
     3. If you would like to keep the same framework as your input model, you can select it from the dropdown.
     4. Choose the minimum and maximum replicas you need for your model:
        • Min replica: the number of inference workers to keep running at all times.
        • Max replica: the maximum number of inference workers allowed at any point in time.
     5. If you would like to set up an automatic rebuild for your model, enable it.
     6. You will need to set up a webhook for this option. Click here for more details.
Set runtime and configuration
  5. Review your model details
     1. Once you click "Continue," you can review the details added for the model.
     2. If you would like to make any changes, you can go back and edit them.
     3. Once you have reviewed everything, click "Submit" to start the model import.
Review all the details carefully before proceeding
  6. Call the model APIs
     1. Once you click Finish, the model import process starts. This takes some time; meanwhile, you are redirected to your workspace -> "In Progress/Failed", where you can view the status.
     2. In case of any errors, or if you wish to see the build logs, click "View Logs" in the menu (the 3-dot menu next to the model name).
     3. Post-upload, the model will be available under "My Models".
     4. You can then select the model and go to -> API -> Inference Endpoint details, where you can call and use your model.
Use the curl command to call the API
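The same call can be made from Python instead of curl. The sketch below only builds the request; the URL and API key are placeholders you must copy from your model's API page, and the body shape mirrors the schema fields (name, shape, datatype) noted during import — confirm the exact format on the Inference Endpoint details page:

```python
import json

# Placeholder values: copy the real endpoint URL and workspace API key
# from the model's API -> Inference Endpoint details page.
ENDPOINT_URL = "https://<your-inference-endpoint>"
API_KEY = "<your-workspace-api-key>"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Request body shaped like the input schema defined during import.
payload = {
    "inputs": [
        {
            "name": "prompt",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["There is a fine house in the forest"],
        }
    ]
}

body = json.dumps(payload)
print(body)
# To actually send the request, e.g. with the `requests` package:
# requests.post(ENDPOINT_URL, headers=headers, data=body)
```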
  7. API key details in the Workspace
     In case you need help with API keys:
     1. Click "Settings", available at the top, next to your Workspace Name.
     2. Click "Workspace API keys".
     3. You can view the details of your key or generate a new one.

Here is a sample video of the whole process for a 7 GB Stable Diffusion model: Click to view

Method B: Deploy Model using Inferless CLI

Inferless also allows you to deploy your model using the Inferless CLI.
  1. Installation: Open your terminal or command prompt and run the following command:
pip install inferless-cli
  2. User Login
     Users must log in to their workspace by setting the access token. Execute the following command in your terminal and follow the provided URL:
inferless login
  3. Create the model file
     Create the file in your root folder, following the structure requirements available here.
from diffusers import StableDiffusionPipeline
import torch
from io import BytesIO
import base64

class InferlessPythonModel:
    def initialize(self):
        # Load the pipeline once per worker; replace the model id with your own.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1",  # example model id
            torch_dtype=torch.float16,
        ).to("cuda")

    def infer(self, inputs):
        prompt = inputs["prompt"]
        image = self.pipe(prompt).images[0]
        buff = BytesIO()
        image.save(buff, format="JPEG")
        img_str = base64.b64encode(buff.getvalue()).decode()
        return {"generated_image_base64": img_str}

    def finalize(self):
        # Release the pipeline so the worker can shut down cleanly.
        self.pipe = None
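Because infer returns the image as a base64 string, the caller has to decode it back to raw bytes. A minimal round-trip sketch (the JPEG bytes below are a stand-in for a real generated image):

```python
import base64

# Stand-in for the real JPEG bytes returned in "generated_image_base64".
fake_image_bytes = b"\xff\xd8\xff\xe0fake-jpeg-data"
response = {"generated_image_base64": base64.b64encode(fake_image_bytes).decode()}

# Decode the field back to raw bytes and write it to disk.
image_bytes = base64.b64decode(response["generated_image_base64"])
with open("generated.jpg", "wb") as f:
    f.write(image_bytes)
```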
  4. Create the input schema file in the root folder:
"prompt": {
'datatype': 'STRING',
'required': True,
'shape': [1],
'example': ["There is a fine house in the forest"]
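Before deploying, you can sanity-check a payload against the schema locally. The validate helper below is my own illustration, not part of the Inferless CLI, and the schema is re-declared so the sketch runs standalone:

```python
# Hypothetical helper, not part of the Inferless CLI: checks a payload
# against the schema dictionary.
INPUT_SCHEMA = {
    "prompt": {
        "datatype": "STRING",
        "required": True,
        "shape": [1],
        "example": ["There is a fine house in the forest"],
    }
}

def validate(payload):
    for name, spec in INPUT_SCHEMA.items():
        # Required fields must be present.
        if spec.get("required") and name not in payload:
            return False
        # A shape of [1] means exactly one value is expected.
        if name in payload and spec.get("shape") == [1] and len(payload[name]) != 1:
            return False
    return True

print(validate({"prompt": ["A red barn at dusk"]}))  # expected True
print(validate({}))                                  # expected False
```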
  5. Initialize the model
     Run the following command to initialize your model:
inferless init
  6. Deploy the Model
     Execute the following command to deploy your model. Once deployed, you can track the build logs on the Inferless platform:
inferless deploy