Model Management APIs
Model Settings - Update APIs
This endpoint updates the settings of a model. You can configure the minimum and maximum number of replicas, the timeout, and the concurrency settings.
POST /rest/model/settings/update
curl --location 'https://api.inferless.com/rest/model/settings/update/' \
--header 'Authorization: <workspace-token>' \
--header 'Content-Type: application/json' \
--data '{
  "model_id": "<model-id>",
  "data": {
    "min_replica": 0,
    "max_replica": 2,
    "scale_down_delay": 30,
    "inference_time": 120,
    "is_dedicated": false,
    "machine_type": "T4",
    "container_concurrency": 10,
    "is_input_output_enabled": false
  }
}'
{
  "status": "success",
  "details": "Model updated successfully"
}
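A caller will usually want to check the `status` field of the response before assuming the update took effect. The following is a minimal Python sketch of that check, based only on the sample response above. The function name `is_update_successful` is hypothetical, and because the format of a failure response is not documented here, the sketch simply treats any status other than "success" as a failure.

```python
import json


def is_update_successful(response_text):
    """Return True when the response body reports status 'success'.

    Assumption: any other status value (or a missing status field)
    means the update did not succeed; the error format is not documented here.
    """
    payload = json.loads(response_text)
    return payload.get("status") == "success"
```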
Authorizations
- Authorization (header): Your workspace API token. You can find it in Workspace Settings.
Body
- model_id: The ID of the model whose settings you want to update.
- data: The settings you want to update for the model.
  - min_replica: The minimum number of replicas for the model.
  - max_replica: The maximum number of replicas for the model.
  - scale_down_delay: The delay in seconds before scaling down the model.
  - inference_time: The maximum time in seconds for the model to process an inference request.
  - is_dedicated: Whether the model uses a dedicated machine or a shared machine.
  - machine_type: The machine type for the model.
  - container_concurrency: The number of concurrent requests the model can handle.
  - is_input_output_enabled: Whether the model supports input and output tracking.
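The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl example: a POST with the workspace token in the `Authorization` header and a JSON body holding `model_id` and `data`. The helper names `build_update_request` and `update_settings` and the `KNOWN_SETTINGS` guard are illustrative assumptions, not part of the Inferless API. Whether the endpoint accepts a partial `data` object is not stated here, so pass the full set of settings if in doubt.

```python
import json
import urllib.request

API_URL = "https://api.inferless.com/rest/model/settings/update/"

# Setting names taken from the request body documented above.
KNOWN_SETTINGS = {
    "min_replica", "max_replica", "scale_down_delay", "inference_time",
    "is_dedicated", "machine_type", "container_concurrency",
    "is_input_output_enabled",
}


def build_update_request(token, model_id, settings):
    """Build (but do not send) a POST request for the settings update endpoint."""
    unknown = set(settings) - KNOWN_SETTINGS
    if unknown:
        # Client-side guard only (an assumption); the API's own validation
        # rules are not documented here.
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    body = json.dumps({"model_id": model_id, "data": settings}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def update_settings(token, model_id, settings, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_update_request(token, model_id, settings)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `update_settings("<workspace-token>", "<model-id>", {"min_replica": 0, "max_replica": 2})` would send those two settings. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.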