This guide explains how to process multiple requests concurrently on the same replica.
Inferless lets a single replica process multiple requests concurrently. This can improve the throughput of your model by letting it handle several requests at once. In this guide, we’ll walk you through the steps to configure your model to handle concurrent requests.
There are two ways to configure concurrent requests in Inferless:
This is the simplest way to process multiple requests with the same replica. In this method, the requests are processed sequentially by the same replica. This is useful when you have tasks that take less time to process.
To configure this, go to Model Import -> Settings, set the Container Concurrency to ‘desired_number’, and click on Update. You can set any value between 1 and 100.
This method processes the requests in batches by the same replica. This is useful when you have tasks that take longer to process and want to process multiple requests simultaneously.
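As a rough illustration of the batching idea, here is a minimal Python sketch of how pending requests could be grouped and handed to a model one batch at a time. It is not Inferless's implementation or API: `make_batches`, `run_in_batches`, and `fake_model` are hypothetical names used only for this example, and the batch size of 2 is an arbitrary assumption.

```python
from typing import Callable, List, Sequence


def make_batches(requests: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split pending requests into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(requests[i:i + batch_size])
        for i in range(0, len(requests), batch_size)
    ]


def run_in_batches(
    requests: Sequence[str],
    batch_size: int,
    model_fn: Callable[[List[str]], List[str]],
) -> List[str]:
    """Call model_fn once per batch and return the results in request order."""
    results: List[str] = []
    for batch in make_batches(requests, batch_size):
        outputs = model_fn(batch)
        if len(outputs) != len(batch):
            raise ValueError("model_fn must return one output per input")
        results.extend(outputs)
    return results


def fake_model(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a model: handles a whole batch in one call."""
    return [text.upper() for text in batch]


if __name__ == "__main__":
    pending = ["a", "b", "c", "d", "e"]
    print(make_batches(pending, 2))
    print(run_in_batches(pending, 2, fake_model))
```

In this sketch the model is invoked three times for five requests instead of five times, which is where batching can save time when each call carries a fixed overhead.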