Deploy and Run ComfyUI as an API on Inferless
Welcome to an immersive tutorial that guides you through leveraging the power of ComfyUI’s API capabilities and deploying your workflows on Inferless. This resource is designed to help you create and deploy custom workflows, extending ComfyUI’s API functionality. You’ll learn how to interact with ComfyUI and deploy on Inferless.
Introduction
ComfyUI is an open-source graphical user interface for image-generation models that generates images from text prompts using a node-based approach. It allows users to create complex image-generation workflows and offers extensive customization options, making it a powerful tool for AI-driven creativity.
In this tutorial, we will show you how to use your custom workflow with the ComfyUI API on Inferless.
Overview of the Solution
Our solution revolves around a few key files that work together to set up and run ComfyUI on Inferless, alongside an NFS (Network File System) volume for persistent storage.
- `build.sh`: This shell script automates the setup of the ComfyUI environment on the NFS volume. It leverages the ComfyUI command-line interface (CLI) to install and configure ComfyUI, ensuring the necessary model weights are downloaded into the specified workspace directory.
- `app.py`: This Python script contains the `InferlessPythonModel` class, which Inferless uses to manage the application lifecycle (a minimal sketch of this class follows the list).
  - The `initialize` function within this class triggers the `build.sh` script to set up the environment.
  - The `infer` function is responsible for processing incoming user requests by interacting with the ComfyUI server to generate images, which it then returns to the user. Additionally, this function handles loading user workflows, updating them with user prompts, and managing the lifecycle of the ComfyUI server.
- `comfy_utils.py`: This utility script has helper functions that streamline our interaction with ComfyUI.
- `inferless-runtime-config.yaml`: This YAML file configures the runtime environment for our ComfyUI application on Inferless.
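To make that structure concrete, here is a minimal sketch of what such a class can look like. The `initialize` and `infer` methods follow the description above; `_run_workflow`, the `finalize` cleanup hook, and the exact return key are illustrative placeholders for the real logic in `app.py` and `comfy_utils.py`:

```python
import base64
import subprocess


class InferlessPythonModel:
    def initialize(self):
        # Run build.sh once so ComfyUI and the model weights land on the NFS volume.
        subprocess.run(["bash", "build.sh"], check=True)

    def infer(self, inputs):
        # Load the requested workflow, patch in the user's prompt, have the
        # ComfyUI server execute it, and return the image as a base64 string.
        prompt = inputs["prompt"]
        image_path = self._run_workflow(prompt)
        with open(image_path, "rb") as image_file:
            return {"generated_image_base64": base64.b64encode(image_file.read()).decode()}

    def _run_workflow(self, prompt):
        # Placeholder: the real code uses the comfy_utils.py helpers to talk to
        # the ComfyUI server and returns the path of the generated image.
        raise NotImplementedError

    def finalize(self):
        # Clean-up hook; stop the ComfyUI server here if needed.
        pass
```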
Architecture overview
Our solution utilizes a streamlined request-response model consisting of the following steps:
- User Request: Users submit requests to the Inferless endpoint, specifying both the desired workflow and a prompt. Inferless then directs these requests to our ComfyUI server.
- ComfyUI Processing: Upon receiving the request, the specified workflow is executed by the ComfyUI server, which processes the prompt to generate the image (a sketch of this interaction follows this list). Once the image is ready, we retrieve the result.
- Response Delivery: The generated image is encoded in base64 format and returned to the user.
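To make the processing step concrete, here is a rough sketch of how a helper (such as those in `comfy_utils.py`) might submit a workflow to a locally running ComfyUI server and poll for the result. It uses ComfyUI's standard HTTP endpoints (`/prompt` and `/history`), but the helper itself is illustrative rather than the exact code in this repository:

```python
import time
import requests

COMFY_URL = "http://127.0.0.1:8188"  # default ComfyUI server address


def run_workflow(workflow: dict, timeout_s: int = 300) -> dict:
    """Queue an API-format workflow on ComfyUI and wait for its outputs."""
    # Queue the workflow; ComfyUI returns a prompt_id we can poll on.
    prompt_id = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}).json()["prompt_id"]

    # Poll the history endpoint until the prompt shows up as completed.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        history = requests.get(f"{COMFY_URL}/history/{prompt_id}").json()
        if prompt_id in history:
            return history[prompt_id]["outputs"]
        time.sleep(1)
    raise TimeoutError("ComfyUI did not finish the workflow in time")


# Usage: load your API-format workflow JSON into a dict, update the prompt text,
# then call: outputs = run_workflow(workflow)
```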
Deploy your ComfyUI application
Deploying your ComfyUI application on Inferless involves a series of straightforward steps that leverage the platform’s serverless capabilities. Here are the steps for deployment:
- Begin by creating an NFS volume on Inferless. This volume will serve as the persistent storage for your ComfyUI files, workflows, and generated images. Note the mount path (e.g., `/var/nfs-mount/YOUR_VOLUME_MOUNT_PATH`), as you’ll need to pass it in the `NFS_VOLUME` environment variable.
- Ensure your `build.sh`, `app.py`, `comfy_utils.py`, and any custom workflow JSON files are ready. These files should be uploaded to a GitHub repository for easy access during deployment.
- Log into your Inferless account and click on `Add a custom model`. Then follow these steps:
  - Select GitHub from the model provider list, then select the GitHub repository URL and branch.
  - Choose the type of machine, and specify the minimum and maximum number of replicas for deploying your ComfyUI application.
- Upload the Custom Runtime and choose the NFS Volume that we created earlier. Configure Secrets and set environment variables such as Inference Timeout, Container Concurrency, and Scale Down Timeout.
- Now pass the NFS Volume path and Hugging Face access token as environment variables: `NFS_VOLUME` as key with YOUR_VOLUME_MOUNT_PATH as the value, and `HF_ACCESS_TOKEN` as key with YOUR_HF_ACCESS_TOKEN as the value (a quick check of these variables is shown after this list).
- Click on Deploy to start the deployment process.
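If you want to confirm those environment variables are visible from inside the container, a minimal check like the one below (run from `initialize` in `app.py`, for example) can help. The variable names match the ones set above; everything else is illustrative:

```python
import os

# Read the variables configured during deployment; build.sh and app.py rely on
# them to locate the NFS workspace and to authenticate Hugging Face downloads.
nfs_volume = os.environ["NFS_VOLUME"]      # e.g. /var/nfs-mount/YOUR_VOLUME_MOUNT_PATH
hf_token = os.environ["HF_ACCESS_TOKEN"]   # your Hugging Face access token

print(f"ComfyUI workspace: {nfs_volume}")
print(f"Hugging Face token configured: {bool(hf_token)}")
```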
Deploying Your Model with Inferless CLI
Inferless allows you to deploy your model using the Inferless CLI. Follow these steps to deploy with the CLI.
Clone the repository of the model
Let’s begin by cloning the model repository from GitHub with `git clone`.
Deploy the Model
To deploy the model using the Inferless CLI, run the deploy command with your GPU and runtime options, for example `inferless deploy --gpu A100 --runtime inferless-runtime-config.yaml`.
Explanation of the Command:
- `--gpu A100`: Specifies the GPU type for deployment. Available options include `A10`, `A100`, and `T4`.
- `--runtime inferless-runtime-config.yaml`: Defines the runtime configuration file. If not specified, the default Inferless runtime is used.
Adding your ComfyUI workflow with an Example
Let’s take the ComfyUI workflow for Flux as an example and import this workflow into Inferless.
- First, download the `workflow.json` for the Flux workflow and convert it into a format compatible with the ComfyUI API.
- Identifying Required Models: For the Flux workflow, we need to download the FLUX model and any other models required for this workflow. We’ll add this to our `build.sh` script (a hedged sketch of the download step follows this list). You can add any other models in a similar way.
- Updating Input Schema: If your Flux workflow requires additional user inputs, you will need to update the `input_schema.py` file accordingly. For instance, if your workflow requires both a prompt and a negative prompt, update the file to handle these inputs (see the schema sketch after this list).
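Since `build.sh` handles the weight downloads, here is a rough illustration of the same step done from Python with `huggingface_hub`. The repository ID, file name, and ComfyUI directory below are assumptions based on the public FLUX.1-schnell release and ComfyUI’s standard model layout, so adjust them to the Flux variant and paths your workflow actually uses:

```python
import os
from huggingface_hub import hf_hub_download

# Assumed locations: the NFS volume path set during deployment and ComfyUI's
# standard models/ directory layout. Adjust to your own workspace structure.
workspace = os.environ["NFS_VOLUME"]
token = os.environ["HF_ACCESS_TOKEN"]

# Example: fetch the FLUX.1-schnell UNet weights into ComfyUI's model folder.
hf_hub_download(
    repo_id="black-forest-labs/FLUX.1-schnell",   # assumed Flux variant
    filename="flux1-schnell.safetensors",
    local_dir=os.path.join(workspace, "ComfyUI", "models", "unet"),
    token=token,
)
```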
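And here is what a prompt plus negative-prompt entry in `input_schema.py` might look like. The field layout follows the pattern used in Inferless’s example schemas, so verify it against the `input_schema.py` already in your repository:

```python
# input_schema.py -- a sketch assuming the field layout used in Inferless examples.
INPUT_SCHEMA = {
    "prompt": {
        "datatype": "STRING",
        "required": True,
        "shape": [1],
        "example": ["a castle on a floating island, golden hour"],
    },
    "negative_prompt": {
        "datatype": "STRING",
        "required": False,
        "shape": [1],
        "example": ["blurry, low quality"],
    },
}
```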
Running ComfyUI on Inferless
Once your ComfyUI workflow is set up and deployed on Inferless, you can run it effortlessly through a simple API call.
- API Endpoint: Inferless provides a unique URL for your deployed model, which will be used for making requests.
- Authentication: Include an authorization token in the request headers. Inferless uses bearer token authentication for secure access.
- Input Format: Format your input as a JSON object, specifying the parameters defined in your `input_schema.py`.
Here’s a Python example of how to make a request to your deployed ComfyUI model on Inferless:
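A request sketch along those lines is shown below. The endpoint URL, API key, input names, and the exact response shape are placeholders and assumptions; replace them with the values from your Inferless dashboard and the fields defined in your own `input_schema.py` and `app.py`:

```python
import base64
import requests

# Placeholders: copy the real endpoint URL and API key from your Inferless dashboard.
URL = "https://<region>.inferless.com/api/v1/<your-model>/infer"
HEADERS = {
    "Authorization": "Bearer <YOUR_INFERLESS_API_KEY>",
    "Content-Type": "application/json",
}

# Input names and datatypes must mirror your input_schema.py; these are examples.
payload = {
    "inputs": [
        {
            "name": "prompt",
            "shape": [1],
            "data": ["a castle on a floating island, golden hour"],
            "datatype": "BYTES",
        }
    ]
}

response = requests.post(URL, headers=HEADERS, json=payload)
response.raise_for_status()

# app.py returns the generated image base64-encoded; the exact output key
# depends on what your infer function returns.
image_b64 = response.json()["outputs"][0]["data"][0]
with open("generated_image.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
print("Saved generated_image.png")
```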
In this example, the endpoint URL and bearer token come from your Inferless model dashboard, the `prompt` input maps to the field declared in `input_schema.py`, and the base64 string in the response is decoded and saved as a PNG.
Choosing Inferless for Deployment
Deploying your ComfyUI application with Inferless offers compelling advantages, making your development journey smoother and more cost-effective. Here’s why Inferless is the go-to choice:
- Ease of Use: Forget the complexities of infrastructure management. With Inferless, you simply bring your model, and within minutes, you have a working endpoint. Deployment is hassle-free, without the need for in-depth knowledge of scaling or infrastructure maintenance.
- Cold-start Times: Inferless’s unique load balancing ensures faster cold starts. Expect around 10.59 seconds to process each query, significantly faster than many traditional platforms.
- Cost Efficiency: Inferless optimizes resource utilization, translating to lower operational costs. Here’s a simplified cost comparison:
Scenario 1
You are looking to deploy a ComfyUI application for processing 100 queries per day.
Parameters:
- Total number of queries: 100 daily.
- Inference Time: All models are hypothetically deployed on A100 80GB, taking 10.59 seconds of processing time and a cold start overhead of 6.88 seconds.
- Scale Down Timeout: Uniformly 60 seconds across all platforms, except Hugging Face, which requires a minimum of 15 minutes. This is assumed to happen 100 times a day.
Key Computations:
- Inference Duration: Processing 100 queries, each taking 10.59 seconds. Total: 100 x 10.59 = 1059 seconds (approximately 0.29 hours)
- Idle Timeout Duration: Post-processing idle time before scaling down: (60 seconds - 10.59 seconds) x 100 = 4941 seconds (approximately 1.37 hours)
- Cold Start Overhead: Total: 100 x 6.88 = 688 seconds (approximately 0.19 hours)
Total Billable Hours with Inferless: 0.29 (inference duration) + 1.37 (idle time) + 0.19 (cold start overhead) = 1.85 hours
Scenario 2
You are looking to deploy a ComfyUI application for processing 1000 queries per day.
Key Computations:
- Inference Duration: Processing 1000 queries, each taking 10.59 seconds. Total: 1000 x 10.59 = 10590 seconds (approximately 2.94 hours)
- Idle Timeout Duration: Post-processing idle time before scaling down, still assuming 100 scale-down events per day: (60 seconds - 10.59 seconds) x 100 = 4941 seconds (approximately 1.37 hours)
- Cold Start Overhead: Assuming 100 cold starts per day. Total: 100 x 6.88 = 688 seconds (approximately 0.19 hours)
Total Billable Hours with Inferless: 2.94 (inference duration) + 1.37 (idle time) + 0.19 (cold start overhead) = 4.5 hours
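As a sanity check on the arithmetic above, here is the same billing calculation expressed in a few lines of Python, using only the numbers quoted in the two scenarios:

```python
# Reproduce the billable-hours and cost figures from the two scenarios above.
INFERENCE_S = 10.59        # seconds of processing per query
COLD_START_S = 6.88        # cold start overhead per event (seconds)
IDLE_S = 60 - INFERENCE_S  # idle time before scale-down per event (seconds)
EVENTS_PER_DAY = 100       # scale-down / cold-start events assumed per day
RATE_PER_HOUR = 1.22       # hourly rate used in the comparison

for queries in (100, 1000):
    # Round each component to two decimals, as done above, before summing.
    inference_h = round(queries * INFERENCE_S / 3600, 2)
    idle_h = round(EVENTS_PER_DAY * IDLE_S / 3600, 2)
    cold_h = round(EVENTS_PER_DAY * COLD_START_S / 3600, 2)
    billable_h = inference_h + idle_h + cold_h
    print(f"{queries} queries/day: {billable_h:.2f} h -> ${billable_h * RATE_PER_HOUR:.2f}")
# Prints roughly: 1.85 h -> $2.26 and 4.50 h -> $5.49
```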
Pricing Comparison for all the Scenarios
| Scenarios | On-Demand Cost | Inferless Cost |
| --- | --- | --- |
| 100 requests/day | $28.8 (24 hours billed at $1.22/hour) | $2.26 (1.85 hours billed at $1.22/hour) |
| 1000 requests/day | $28.8 (24 hours billed at $1.22/hour) | $5.49 (4.5 hours billed at $1.22/hour) |
By opting for Inferless, you can achieve up to 80.94% cost savings.
Please note that we have utilized the A100 (80 GB) GPU for model benchmarking purposes, while for pricing comparison, we referenced the A10G GPU price from both platforms. This is due to the unavailability of the A100 GPU in SageMaker.
Also, the above analysis is based on a smaller-scale scenario for demonstration purposes. Should the scale increase tenfold, traditional cloud services might require maintaining 2-4 GPUs constantly active to manage peak loads efficiently. In contrast, Inferless, with its dynamic scaling capabilities, adeptly adjusts to fluctuating demand without the need for continuously running hardware.
Conclusion
By following this approach, you can easily integrate your ComfyUI workflows into other applications or scripts, leveraging the power of Inferless for efficient and scalable AI image generation.