DeepSeek-R1-Qwen3-8B is a distilled model that transfers the chain-of-thought reasoning skills of DeepSeek-R1-0528 into the lighter Qwen 3 backbone, delivering state-of-the-art math, code and logic performance while remaining inexpensive to host.
In model.py, using the inferless Python client and Pydantic, you can define structured schemas for input and output directly in your code, eliminating the need for an external schema file.
The @inferless.request decorator helps you define structured input schemas. Each field is annotated with a Python type such as str, float, or int; these type annotations specify what type of data the field should contain. The default value serves as the example input for testing with the infer function.
The @inferless.response decorator helps you define structured output schemas. The infer function takes a RequestObjects instance as input and returns a ResponseObjects instance as output, ensuring the results adhere to a defined structure.
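A minimal sketch of what the two schemas can look like is shown below; the field names (prompt, temperature, max_new_tokens, generated_result) and default values are illustrative, not required by Inferless.

```python
import inferless
from typing import Optional
from pydantic import BaseModel, Field

@inferless.request
class RequestObjects(BaseModel):
    # The default acts as the example input when testing the infer function.
    prompt: str = Field(default="Explain the Pythagorean theorem step by step.")
    temperature: Optional[float] = 0.6
    max_new_tokens: Optional[int] = 512

@inferless.response
class ResponseObjects(BaseModel):
    generated_result: str = Field(default="Test output")
```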
def initialize: In this function, you will initialize your model and define any variables that you want to use during inference.
def infer: This function gets called for every request that you send. Here you can define all the steps that are required for the inference.
def finalize: This function cleans up all the allocated memory.
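Putting these pieces together, a minimal model.py sketch could look like the following. It assumes the schemas defined above live in the same file and loads the deepseek-ai/DeepSeek-R1-0528-Qwen3-8B checkpoint with the Hugging Face transformers pipeline; swap in vLLM or another backend if you prefer.

```python
import torch
from transformers import pipeline

# RequestObjects and ResponseObjects from the schema sketch above are assumed
# to be defined earlier in this same file.

class InferlessPythonModel:
    def initialize(self):
        # Load the model once per container; it is reused for every request.
        self.generator = pipeline(
            "text-generation",
            model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",  # assumed checkpoint id
            torch_dtype=torch.bfloat16,
            device_map="auto",
        )

    def infer(self, request: RequestObjects) -> ResponseObjects:
        # Generate text using the parameters carried by the request schema.
        output = self.generator(
            request.prompt,
            max_new_tokens=request.max_new_tokens,
            temperature=request.temperature,
            do_sample=True,
            return_full_text=False,
        )
        return ResponseObjects(generated_result=output[0]["generated_text"])

    def finalize(self):
        # Drop the reference so GPU memory can be reclaimed.
        self.generator = None
```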
You can use the inferless remote-run (installation guide here) command to test your model or any custom Python script in a remote GPU environment directly from your local machine. Make sure that you use Python 3.10 for a seamless experience.
Import the inferless library and initialize Cls(gpu="A100"). The available GPU options are T4, A10, and A100. Decorate your initialize and infer functions with @app.load and @app.infer, respectively. Define a local entry point function (e.g., my_local_entry) decorated with @inferless.local_entry_point. Within this function, instantiate your model class, convert any incoming parameters into a RequestObjects object, and invoke the model's infer method.
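A skeleton of how this wiring can look, assuming the schemas and loading/generation code from the sketches above are defined in (or importable into) the same file:

```python
import inferless

app = inferless.Cls(gpu="A100")  # available options: T4, A10, A100

class InferlessPythonModel:
    @app.load
    def initialize(self):
        # Same model-loading logic as in the model.py sketch above.
        ...

    @app.infer
    def infer(self, request):
        # Same generation logic as in the model.py sketch above;
        # must return a ResponseObjects instance.
        ...

@inferless.local_entry_point
def my_local_entry(dynamic_params):
    # Convert the raw CLI parameters into the structured request schema,
    # then call the model the same way the platform would.
    request_objects = RequestObjects(**dynamic_params)
    model_instance = InferlessPythonModel()
    return model_instance.infer(request_objects)
```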
Once your app.py and your inferless-runtime-config.yaml are ready, run:
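For example (the --prompt argument is illustrative and should match whatever parameters your entry point accepts; the -c flag for the runtime config is assumed here, so check inferless remote-run --help for the exact spelling in your CLI version):

```bash
inferless remote-run app.py -c inferless-runtime-config.yaml --prompt "Prove that the square root of 2 is irrational."
```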
You can pass additional arguments (--temperature, --max_new_tokens, etc.) as long as your code expects them in the inputs dictionary.
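For instance, assuming your RequestObjects schema declares temperature and max_new_tokens fields:

```bash
inferless remote-run app.py -c inferless-runtime-config.yaml \
  --prompt "Explain chain-of-thought prompting." --temperature 0.6 --max_new_tokens 256
```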
If you want to exclude certain files or directories from being uploaded, use the --exclude or -e flag.
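A sketch of this, under the assumption that the flag accepts a file or directory path to leave out of the upload:

```bash
inferless remote-run app.py -c inferless-runtime-config.yaml -e checkpoints/ --prompt "Hello"
```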
In your Inferless workspace, click on the Add a custom model button that you see on the top right. An import wizard will open up.
--gpu A100: Specifies the GPU type for deployment. Available options include A10, A100, and T4.
--runtime inferless-runtime-config.yaml: Defines the runtime configuration file. If not specified, the default Inferless runtime is used.
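Putting the two flags together, a CLI deployment command can look like this (assuming the inferless deploy subcommand of the Inferless CLI):

```bash
inferless deploy --gpu A100 --runtime inferless-runtime-config.yaml
```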