inferless init and have the inferless.yaml before running this command.
Options:
- `--gpu TEXT`: Machine type (A10/A100/T4). [required]
- `--region TEXT`: Inferless region. Defaults to the Inferless default region.
- `--beta`: Deploys the model with v2 endpoints.
- `--fractional`: Use a fractional machine type (default: dedicated).
- `--runtime TEXT`: Runtime name or file location. If not provided, the default Inferless runtime is used.
- `--volume TEXT`: Volume name.
- `--volume_mount_path TEXT`: Volume mount path.
- `--env TEXT`: Key=value pairs for model environment variables.
- `--inference-timeout INTEGER`: Inference timeout in seconds. [default: 180]
- `--scale-down-timeout INTEGER`: Scale-down timeout in seconds. [default: 600]
- `--container-concurrency INTEGER`: Container concurrency level. [default: 1]
- `--secret TEXT`: Secret names to attach to the deployment.
- `--runtimeversion TEXT`: Runtime version (default: latest version of the runtime).
- `--max-replica INTEGER`: Maximum number of replicas. [default: 1]
- `--min-replica INTEGER`: Minimum number of replicas. [default: 0]
- `-c, --config TEXT`: Path to an Inferless config file that overrides inferless.yaml. [default: inferless.yaml]
- `-t, --runtime-type TEXT`: Type of runtime to deploy (fastapi or triton). [default: triton]
- `--help`: Show this message and exit.
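
As a rough illustration of how these options combine, the following Python sketch assembles an argument list from a few of the flags listed above and checks the documented value sets before anything is run. It is not part of the Inferless CLI: the helper `build_command` is hypothetical, the subcommand name is passed in as a parameter because this section does not state it (`"deploy"` in the usage line is an assumption), and the `min_replica <= max_replica` check is an assumed sanity rule rather than documented behavior.

```python
import shlex

GPU_TYPES = {"A10", "A100", "T4"}        # machine types listed for --gpu
RUNTIME_TYPES = {"fastapi", "triton"}    # values listed for --runtime-type


def build_command(subcommand, gpu, *, fractional=False, runtime_type="triton",
                  min_replica=0, max_replica=1, inference_timeout=180,
                  config="inferless.yaml"):
    """Hypothetical helper: return an argv list using the documented flags."""
    if gpu not in GPU_TYPES:
        raise ValueError(f"--gpu must be one of {sorted(GPU_TYPES)}, got {gpu!r}")
    if runtime_type not in RUNTIME_TYPES:
        raise ValueError(f"--runtime-type must be one of {sorted(RUNTIME_TYPES)}")
    if min_replica > max_replica:
        # Assumed sanity check, not documented CLI behavior.
        raise ValueError("min_replica must not exceed max_replica")

    args = [
        "inferless", subcommand,
        "--gpu", gpu,
        "--runtime-type", runtime_type,
        "--min-replica", str(min_replica),
        "--max-replica", str(max_replica),
        "--inference-timeout", str(inference_timeout),
        "--config", config,
    ]
    if fractional:
        args.append("--fractional")  # boolean flag, takes no value
    return args


# "deploy" is an assumed subcommand name; substitute the command this page documents.
cmd = build_command("deploy", "A10", fractional=True, max_replica=2)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once `inferless.yaml` exists in the working directory; building a list rather than a single string avoids shell-quoting problems with values such as file paths.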