A vision-language OCR model fine-tuned from Qwen 2.5-VL-3B that turns documents and images into structured Markdown, including tables, LaTeX equations, checkboxes, and tagged watermarks, ready for downstream LLM workflows.
Beyond plain text, the model captures visual content inside semantic tags such as `<img>` elements, producing outputs that are ready for downstream LLM or RAG pipelines.
Under the hood, Nanonets-OCR-s is fine-tuned from the Qwen 2.5-VL-3B-Instruct backbone, inheriting that model’s strong multimodal reasoning and layout-aware capabilities. This choice gives the OCR system a compact size that still fits on a single consumer GPU while reaching state-of-the-art accuracy on complex documents. Community posts and the official announcement highlight that the entire 3B stack is released under the Apache-2.0 license, making it free to self-host, fine-tune, or embed in commercial workflows.
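Before wiring the model into Inferless, it helps to see how it runs with plain Hugging Face `transformers`. The snippet below is a minimal sketch, assuming the `nanonets/Nanonets-OCR-s` checkpoint on the Hub and a recent `transformers` release; the prompt text and generation settings are illustrative rather than the official recipe.

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "nanonets/Nanonets-OCR-s"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image_path = "sample_document.png"  # illustrative local file
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": f"file://{image_path}"},
        {"type": "text", "text": "Extract the text from this document as structured Markdown."},
    ]},
]

# Build the chat prompt, then batch the text and image through the processor.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text], images=[Image.open(image_path)], padding=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
# Strip the prompt tokens and decode only the newly generated Markdown.
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```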
In the `model.py` file, using the `inferless` Python client and Pydantic, you can define structured schemas directly in your code for input and output, eliminating the need for external files.
The `@inferless.request` decorator defines the input schema, with each field annotated using standard Python types such as `str`, `float`, `int`, `bool`, etc. These type annotations specify what type of data each field should contain. The `default` value serves as the example input for testing with the `infer` function.
The `@inferless.response` decorator helps you define structured output schemas. The `infer` function takes `RequestObjects` as input and returns a `ResponseObjects` instance as output, ensuring the results adhere to a defined structure.
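As a concrete illustration, a minimal schema pair might look like the following; the field names and defaults are placeholders rather than the tutorial's exact schema.

```python
import inferless
from pydantic import BaseModel, Field

# Input schema: each typed field becomes a parameter the endpoint accepts.
@inferless.request
class RequestObjects(BaseModel):
    image_url: str = Field(default="https://example.com/sample-invoice.png")
    max_new_tokens: int = Field(default=4096)

# Output schema: the infer function must return an instance of this class.
@inferless.response
class ResponseObjects(BaseModel):
    generated_result: str = Field(default="Test output")
```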
The `model.py` script also defines the class functions:
- `def initialize`: In this function, you will initialize your model and define any variables that you want to use during inference.
- `def infer`: This function gets called for every request that you send. Here you can define all the steps that are required for the inference.
- `def finalize`: This function cleans up all the allocated memory.
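A skeletal `model.py` tying the three functions together might look like this. It reuses the schema classes sketched above; the loading details mirror the `transformers` snippet earlier, and `_run_ocr` is a hypothetical helper standing in for the generate/decode steps.

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

class InferlessPythonModel:  # class name follows the Inferless convention
    def initialize(self):
        # Runs once per replica: load the weights and keep them on self.
        model_id = "nanonets/Nanonets-OCR-s"
        self.model = AutoModelForImageTextToText.from_pretrained(
            model_id, torch_dtype="auto", device_map="auto"
        )
        self.processor = AutoProcessor.from_pretrained(model_id)

    def infer(self, request: RequestObjects) -> ResponseObjects:
        # Runs on every request: fetch the image, run OCR, wrap the result.
        markdown = self._run_ocr(request.image_url)  # hypothetical helper
        return ResponseObjects(generated_result=markdown)

    def finalize(self):
        # Runs on shutdown: release the allocated memory.
        self.model = None
        self.processor = None
```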
You can use the `inferless remote-run` command (installation guide here) to test your model or any custom Python script in a remote GPU environment directly from your local machine. Make sure that you use Python 3.10 for a seamless experience.
To enable remote runs, make the following changes in your code (a condensed `app.py` sketch follows this list):
- Import the `inferless` library and initialize `Cls(gpu="A10")`. The available GPU options are `T4`, `A10`, and `A100`.
- Wrap the `initialize` and `infer` functions with the `@app.load` and `@app.infer` decorators respectively.
- Create a function (`my_local_entry`) decorated with `@inferless.local_entry_point`. Within this function, instantiate your model class, convert any incoming parameters into a `RequestObjects` object, and invoke the model’s `infer` method.
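A condensed sketch of such an `app.py`, reusing the schema and model classes from the earlier sketches (the elided bodies would hold the loading and OCR logic):

```python
import inferless

app = inferless.Cls(gpu="A10")

class InferlessPythonModel:
    @app.load
    def initialize(self):
        ...  # load Nanonets-OCR-s, as in the model.py skeleton above

    @app.infer
    def infer(self, request: RequestObjects) -> ResponseObjects:
        ...  # run OCR and return a ResponseObjects instance

@inferless.local_entry_point
def my_local_entry(dynamic_params):
    # Convert the incoming parameters into the request schema and call infer.
    request_objects = RequestObjects(**dynamic_params)
    model_instance = InferlessPythonModel()
    return model_instance.infer(request_objects)
```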
From your local terminal, navigate to the folder containing your `app.py` and your `inferless-runtime-config.yaml` and run:
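For example (a sketch assuming the documented `remote-run` syntax; the `--image_url` flag is an illustrative parameter matching the request schema): `inferless remote-run app.py -c inferless-runtime-config.yaml --image_url "https://example.com/page.png"`.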
You can pass any custom parameters this way (`--confidence_threshold`, etc.) as long as your code expects them in the `inputs` dictionary.
If you want to exclude certain files or directories from being uploaded, use the `--exclude` or `-e` flag.
In your Inferless workspace, click on the `Add a custom model` button that you see on the top right. An import wizard will open up.
Alternatively, you can deploy from the CLI with the following options:
- `--gpu A10`: Specifies the GPU type for deployment. Available options include `A10`, `A100`, and `T4`.
- `--runtime inferless-runtime-config.yaml`: Defines the runtime configuration file. If not specified, the default Inferless runtime is used.
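Putting the flags together, a complete deployment command might look like `inferless deploy --gpu A10 --runtime inferless-runtime-config.yaml` (a sketch assuming the standard `inferless deploy` CLI, run from the directory containing your model code).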