inferless init and have the inferless.yaml before running this command.
Options:
- --gpu TEXT: Denotes the machine type (A10/A100/T4). [required]
- --region TEXT: Inferless region to deploy to. Defaults to the Inferless default region.
- --beta: Deploys the model with v2 endpoints.
- --fractional: Use fractional machine type (default: dedicated).
- --runtime TEXT: Runtime name or file location. If not provided, the default Inferless runtime is used.
- --volume TEXT: Volume name.
- --volume_mount_path TEXT: Volume mount path.
- --env TEXT: Key=value pairs for model environment variables.
- --inference-timeout INTEGER: Inference timeout in seconds. [default: 180]
- --scale-down-timeout INTEGER: Scale down timeout in seconds. [default: 600]
- --container-concurrency INTEGER: Container concurrency level. [default: 1]
- --secret TEXT: Secret names to attach to the deployment.
- --runtimeversion TEXT: Runtime version (default: latest version of runtime).
- --max-replica INTEGER: Maximum number of replicas. [default: 1]
- --min-replica INTEGER: Minimum number of replicas. [default: 0]
- -c, --config TEXT: Path to the Inferless config file to use in place of inferless.yaml. [default: inferless.yaml]
- -t, --runtime-type TEXT: Type of runtime to deploy (fastapi or triton). [default: triton]
- --help: Show this message and exit.
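The options above map onto command-line flags. As a minimal sketch, the hypothetical Python helper below assembles an argument list for a deployment from a subset of those options. It rests on several assumptions not stated in this section: that the subcommand is named "inferless deploy", that --env and --secret are passed once per value, and that A10, A100 and T4 are the only accepted --gpu values. The replica-range check is a local sanity check added for illustration, not documented CLI behavior.

```python
import shlex

# Assumption: only the machine types listed for --gpu are accepted.
VALID_GPUS = {"A10", "A100", "T4"}
VALID_RUNTIME_TYPES = {"fastapi", "triton"}


def build_deploy_args(gpu, *, fractional=False, env=None, secrets=None,
                      min_replica=0, max_replica=1, runtime_type="triton",
                      config="inferless.yaml"):
    """Hypothetical helper: build argv for a deploy call.

    The subcommand name "deploy" is an assumption; flag names are copied
    from the option list. Defaults mirror the documented defaults.
    """
    if gpu not in VALID_GPUS:
        raise ValueError(f"unsupported --gpu value: {gpu}")
    if runtime_type not in VALID_RUNTIME_TYPES:
        raise ValueError(f"unsupported --runtime-type value: {runtime_type}")
    # Local sanity check (not CLI behavior): replica bounds must be ordered.
    if not 0 <= min_replica <= max_replica:
        raise ValueError("expected 0 <= --min-replica <= --max-replica")

    args = ["inferless", "deploy", "--gpu", gpu]
    if fractional:
        args.append("--fractional")  # default is a dedicated machine
    # Assumption: one --env flag per Key=value pair.
    for key, value in (env or {}).items():
        args += ["--env", f"{key}={value}"]
    # Assumption: one --secret flag per secret name.
    for name in secrets or []:
        args += ["--secret", name]
    args += [
        "--min-replica", str(min_replica),
        "--max-replica", str(max_replica),
        "--runtime-type", runtime_type,
        "--config", config,
    ]
    return args


if __name__ == "__main__":
    args = build_deploy_args("T4", env={"LOG_LEVEL": "info"}, max_replica=2)
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(args))
```

Building a list rather than a single string keeps each value a separate argument, so the result can be passed straight to subprocess.run(args, check=True) without shell quoting issues.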