Configuring Concurrent Requests
Learn how to process multiple requests concurrently on the same replica.
Inferless allows a single replica to process multiple requests concurrently, which can improve your model's throughput and let you handle simultaneous traffic without scaling out. In this guide, we walk through the steps to configure your model to handle concurrent requests.
There are two ways to configure concurrent requests in Inferless:
- **Sequential Processing with Queue**
- **Batch Processing with Queue**
Sequential Processing with Queue
This is the simplest way to process multiple requests with the same replica. In this method, the requests are queued and processed sequentially by the same replica. This is useful when your tasks take little time to process.
To configure this, go to Model Import -> Settings.
Set the Container Concurrency to the desired number and click Update. You can set any value between 1 and 100.
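To build intuition for what this setting does, here is a minimal sketch of sequential processing with a queue. This is purely illustrative (Inferless handles the queueing internally); `infer` stands in for your model's inference function, and a single worker drains the queue one request at a time:

```python
import queue
import threading

def run_sequential(requests, infer):
    """Process (request_id, payload) pairs one at a time from a queue.

    Illustrative only: a single worker thread drains the queue in order,
    mimicking how one replica handles queued requests sequentially.
    """
    q = queue.Queue()
    results = {}
    for req_id, payload in requests:
        q.put((req_id, payload))

    def worker():
        while True:
            try:
                req_id, payload = q.get_nowait()
            except queue.Empty:
                return
            results[req_id] = infer(payload)  # one request at a time

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return results
```

Because a single worker serves the queue, total latency grows linearly with queue depth, which is why this mode suits short-running tasks.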
Batch Processing with Queue
This method processes the requests in batches by the same replica. This is useful when you have tasks that take longer to process and want to process multiple requests simultaneously.
Step 1: Preparing the model to handle concurrent requests
Define BATCH_SIZE and BATCH_WINDOW in input_schema.py:
input_schema.py
BATCH_SIZE = 4
BATCH_WINDOW = 5000 # milliseconds
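For context, the batching knobs above typically sit alongside the model's input schema in the same file. The sketch below shows one plausible layout; the `INPUT_SCHEMA` field names and values here are illustrative assumptions, not taken from this guide:

```python
# input_schema.py -- sketch combining the batching knobs with a
# hypothetical INPUT_SCHEMA entry (field names are illustrative).
BATCH_SIZE = 4       # maximum number of requests grouped into one batch
BATCH_WINDOW = 5000  # milliseconds to wait while filling a batch

INPUT_SCHEMA = {
    "prompt": {
        "datatype": "STRING",
        "required": True,
        "shape": [1],
        "example": ["Summarize this article"],
    }
}
```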
More on batching can be found here
Step 2: Configuring the model to handle concurrent requests
Go to Model Import -> Settings
Set the Container Concurrency to the desired number, e.g. 4, and click Update. You can set any value between 1 and 100.
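To illustrate how BATCH_SIZE and BATCH_WINDOW interact, here is a minimal sketch of the collection logic (not Inferless's actual scheduler): a batch is flushed as soon as BATCH_SIZE requests have accumulated, or when the BATCH_WINDOW elapses with a partial batch waiting.

```python
BATCH_SIZE = 4
BATCH_WINDOW = 5000  # milliseconds

def collect_batch(incoming, now_ms, start_ms):
    """Return (batch, remaining) given queued requests and elapsed time.

    Illustrative sketch: flush a full batch immediately, flush a partial
    batch once the window expires, otherwise keep waiting.
    """
    if len(incoming) >= BATCH_SIZE:
        return incoming[:BATCH_SIZE], incoming[BATCH_SIZE:]
    if now_ms - start_ms >= BATCH_WINDOW:
        return incoming[:], []   # window expired: flush whatever arrived
    return [], incoming          # keep waiting for more requests
```

The trade-off this encodes: a larger BATCH_SIZE improves GPU utilization for long-running tasks, while the BATCH_WINDOW bounds how long an early request can wait for the batch to fill.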