CLI
inferless model
Use the inferless model commands to manage your models.
Commands
list
: List all models in the current workspace.
delete
: Delete a model from the system.
rebuild
: Deploy the new code and runtime for the model.
info
: Get model details (min replicas, max replicas, current replicas, status).
activate
: Activate a model. This restores the min and max replicas to their original values.
deactivate
: Deactivate a model. This scales the min and max replicas to 0.
patch
: Patch the model configuration.
Example
The command below displays all the models in the workspace with their details.
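As a minimal sketch, the invocation can be composed from the command group named at the top of this page and the list subcommand. The exact form is an assumption derived from those two names. The snippet only prints the command so it can be reviewed before it is run.

```shell
# Assumed invocation: the `inferless model` group followed by the `list` subcommand.
CMD="inferless model list"

# Print the command for review instead of executing it.
echo "$CMD"
```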
The command below rebuilds a model.
Options:
--model-id
: Model ID.
--runtime-path (optional)
: Path to a runtime file, which is created as a new version of your current runtime.
--runtime-version
: New runtime version.
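A rebuild call using the options above might look like the following sketch. The model ID and the runtime file path are hypothetical placeholders, and the way the options are combined is an assumption. The snippet prints the composed command rather than running it.

```shell
# Hypothetical placeholder values; substitute your own.
MODEL_ID="your-model-id"
RUNTIME_PATH="path/to/runtime-file"

# Compose the rebuild command from the --model-id and --runtime-path options.
CMD="inferless model rebuild --model-id $MODEL_ID --runtime-path $RUNTIME_PATH"

# Print the command for review instead of executing it.
echo "$CMD"
```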
For a local rebuild
The command below deletes a model.
The command below displays the details of a specific model.
When prompted to select the model you want details for, type its name.
Output: the model's ‘Name’, ‘ID’, and ‘URL’.
Options:
--model-id <id>
: Model ID.
--help
: Show this message and exit.
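To skip the interactive selection, the model ID can presumably be passed directly with --model-id. The sketch below assumes that form and uses a hypothetical placeholder ID. It prints the command rather than running it.

```shell
# Hypothetical placeholder; substitute the ID of your own model.
MODEL_ID="your-model-id"

# Compose the info command with the --model-id option listed above.
CMD="inferless model info --model-id $MODEL_ID"

# Print the command for review instead of executing it.
echo "$CMD"
```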
The command below activates a model.
The command below deactivates a model.
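The following sketch shows how the two commands might be invoked. It assumes that activate and deactivate accept the same --model-id option as info and rebuild, which this page does not confirm. The model ID is a hypothetical placeholder, and the commands are printed rather than run.

```shell
# Hypothetical placeholder; substitute the ID of your own model.
MODEL_ID="your-model-id"

# Assumption: both subcommands take --model-id, like info and rebuild.
ACTIVATE_CMD="inferless model activate --model-id $MODEL_ID"
DEACTIVATE_CMD="inferless model deactivate --model-id $MODEL_ID"

# Print both commands for review instead of executing them.
echo "$ACTIVATE_CMD"
echo "$DEACTIVATE_CMD"
```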
The patch command updates the model configuration.
Usage:
Options:
--model-id TEXT
: Model ID.
--gpu TEXT
: Denotes the machine type (A10/A100/T4). [required]
--fractional
: Use fractional machine type (default: dedicated).
--volume TEXT
: Volume name.
--mount-path TEXT
: Mount path for the volume.
--env TEXT
: Key=value pairs for model environment variables.
--inference-timeout INTEGER
: Inference timeout in seconds. [default: 180]
--scale-down-timeout INTEGER
: Scale down timeout in seconds. [default: 600]
--container-concurrency INTEGER
: Container concurrency level. [default: 1]
--secret TEXT
: Secret names to attach to the deployment (--secret secret-name).
--runtimeversion TEXT
: Runtime version (default: latest).
--max-replica INTEGER
: Maximum number of replicas. [default: 1]
--min-replica INTEGER
: Minimum number of replicas. [default: 0]
--help
: Show this message and exit.
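As an illustration of how these options combine, the sketch below builds a patch command that sets the machine type, the replica range, the inference timeout, and one environment variable. The model ID and the LOG_LEVEL variable are hypothetical placeholders, and the combination of flags is an assumption based only on the option list above. The command is printed rather than run.

```shell
# Hypothetical placeholder; substitute the ID of your own model.
MODEL_ID="your-model-id"

# --gpu is marked [required] in the option list, so it is always included.
CMD="inferless model patch --model-id $MODEL_ID --gpu T4"

# Scale between 0 and 2 replicas (the listed defaults are 0 and 1).
CMD="$CMD --min-replica 0 --max-replica 2"

# Keep the listed default inference timeout of 180 seconds, stated explicitly.
CMD="$CMD --inference-timeout 180"

# LOG_LEVEL=debug is a hypothetical key=value pair for --env.
CMD="$CMD --env LOG_LEVEL=debug"

# Print the command for review instead of executing it.
echo "$CMD"
```

Building the command in steps keeps each option group on its own line, which makes it easier to drop or adjust a single setting before running the command.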