This command deploys the model to the Inferless server. Before running it, run inferless init so that an inferless.yaml file exists.

Options:

  • --gpu TEXT: Denotes the machine type (A10/A100/T4). [required]
  • --region TEXT: Inferless region. Defaults to Inferless default region.
  • --beta: Deploys the model with v2 endpoints.
  • --fractional: Use fractional machine type (default: dedicated).
  • --runtime TEXT: Runtime name or file location. If not provided, the default Inferless runtime is used.
  • --volume TEXT: Volume name.
  • --volume_mount_path TEXT: Volume mount path.
  • --env TEXT: Key=value pairs for model environment variables.
  • --inference-timeout INTEGER: Inference timeout in seconds. [default: 180]
  • --scale-down-timeout INTEGER: Scale down timeout in seconds. [default: 600]
  • --container-concurrency INTEGER: Container concurrency level. [default: 1]
  • --secret TEXT: Secret names to attach to the deployment.
  • --runtimeversion TEXT: Runtime version (default: latest version of runtime).
  • --max-replica INTEGER: Maximum number of replicas. [default: 1]
  • --min-replica INTEGER: Minimum number of replicas. [default: 0]
  • -c, --config TEXT: Inferless config file path to override from inferless.yaml [default: inferless.yaml]
  • -t, --runtime-type TEXT: Type of runtime to deploy [fastapi, triton]. [default: triton]
  • --help: Show this message and exit.

Usage:

$ inferless deploy [OPTIONS]
Once deployed, the model import ID is shown in the terminal. You can check the model's progress in the Dashboard.

Example:

$ inferless deploy --gpu T4 --runtime ./inferless-runtime-config.yaml
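
The options listed above can be combined in a single call. The following is a minimal dry-run sketch, not an official script: build_deploy_cmd is a hypothetical helper name, and the GPU type, replica counts, and timeouts are illustrative values only. It prints the command for review rather than running it, and uses only flags from the option list above.

```shell
#!/bin/sh
# Dry-run sketch: assemble an `inferless deploy` command from flags in the option list.
# build_deploy_cmd is a hypothetical helper, not part of the Inferless CLI.
set -eu

build_deploy_cmd() {
  gpu="$1"          # machine type, e.g. A10, A100, or T4
  min_replica="$2"  # minimum number of replicas
  max_replica="$3"  # maximum number of replicas
  # The timeouts below repeat the documented defaults explicitly.
  echo "inferless deploy --gpu $gpu --runtime ./inferless-runtime-config.yaml --min-replica $min_replica --max-replica $max_replica --inference-timeout 180 --scale-down-timeout 600"
}

# Print the command so it can be reviewed before anything is deployed.
build_deploy_cmd A10 0 2

# To deploy for real, run the printed command in the directory that contains inferless.yaml.
```

Printing the command first keeps the sketch safe to run on a machine without the Inferless CLI installed.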
To redeploy the model with new code:

$ inferless model rebuild --model-id <model_id> -l