# Inferless

## Docs

- [AWS PrivateLink - Inferless](https://docs.inferless.com/api-reference/aws-privatelink.md)
- [Build Logs](https://docs.inferless.com/api-reference/debugging-model/build-logs.md)
- [Call Logs](https://docs.inferless.com/api-reference/debugging-model/call-logs.md)
- [Debugging your model with Logs](https://docs.inferless.com/api-reference/debugging-model/debugging-your-model-with-logs.md)
- [Configuring the Model Settings](https://docs.inferless.com/api-reference/model-endpoint/configuring-the-model-settings.md): In Inferless, configuring the Scale Down, Inference Timeout, and Container Concurrency settings is essential for optimizing performance and cost. This page gives an overview of what each setting does and how to adjust it.
- [Inferless Python Client](https://docs.inferless.com/api-reference/model-endpoint/inferless-python-client.md)
- [Model Endpoint](https://docs.inferless.com/api-reference/model-endpoint/model-endpoint.md)
- [Setting up Webhooks](https://docs.inferless.com/api-reference/model-endpoint/setting-up-webhooks.md)
- [Test your model endpoint](https://docs.inferless.com/api-reference/model-endpoint/test-your-model-endpoint.md): Use a sample input to test your model before deployment.
- [Get Logs - API](https://docs.inferless.com/api-reference/model-management-apis/model-logs-get.md): This endpoint gets the logs of a model.
- [Model Settings - Update APIs](https://docs.inferless.com/api-reference/model-management-apis/model-settings-update.md): This endpoint updates the settings of a model. You can configure Min/Max Replicas, Timeout, and Concurrency settings.
- [Version Management](https://docs.inferless.com/api-reference/version-management.md)
- [30th April 2025: Better Playground, Docker support and more](https://docs.inferless.com/changelog/April-2025/30th-April.md)
- [28th February 2025: Enhanced one-click model deploy & faster CLI experience](https://docs.inferless.com/changelog/February-2025/28th-February.md)
- [9th January 2025: Better Logs & Stability Fixes](https://docs.inferless.com/changelog/January-2025/9th-january.md)
- [30th June 2025: Runtime Updates, Websockets and more](https://docs.inferless.com/changelog/June-2025/30th-June.md)
- [31st March 2025: New Dashboard UI, CLI Enhancements and Simplified Explore Models](https://docs.inferless.com/changelog/March-2025/31st-March.md)
- [31st May 2025: Runtime Flexibility, Faster Remote Run, and Hugging Face Improvements](https://docs.inferless.com/changelog/May-2025/31st-May.md)
- [15th April 2024: Runtime Flexibility, Build Efficiency, and Autoscaling Improvements](https://docs.inferless.com/changelog/april-2024/16th-april.md)
- [8th April 2024: Workflow Optimization, Infrastructure Enhancements, and Runtime Updates](https://docs.inferless.com/changelog/april-2024/8th-april.md)
- [18th December: Advanced Monitoring, Better Custom Runtime, and Enhanced Integration Stability](https://docs.inferless.com/changelog/december-2023/18th-december.md)
- [22nd December: Enhanced Metrics, Improved Logging, and Advanced Model Support](https://docs.inferless.com/changelog/december-2023/22nd-december.md)
- [4th December: UI Enhancements, Stable Builds, and Better Error Handling](https://docs.inferless.com/changelog/december-2023/4th-december.md)
- [9th December 2024: CLI v2.0: Faster and Smoother Experience](https://docs.inferless.com/changelog/december-2024/9th-december.md)
- [12th February 2024 - Enhanced Monitoring, Docker Flexibility, and One-click Model Deploy](https://docs.inferless.com/changelog/february-2024/12th-february.md)
- [26th February 2024 - Better Exception Handling, Dynamic Batching Support and more](https://docs.inferless.com/changelog/february-2024/26th-february.md)
- [12th January 2024 - Enhanced Volume Management, Docker Integration, and Improved Billing Processes](https://docs.inferless.com/changelog/january-2024/12th-january.md)
- [January 29, 2024 - Removal of I/O JSON, Webhook Support for Docker and Improved Runtime Management](https://docs.inferless.com/changelog/january-2024/29th-january.md)
- [5th January - Faster Cold-starts, Security Upgrades, and Integration Efficiency](https://docs.inferless.com/changelog/january-2024/5th-january.md)
- [July 16th Update - Inferless AI Chatbot, CLI Improvements, and 30% faster build times](https://docs.inferless.com/changelog/july-2024/16th-july.md)
- [June 10th Update - Streaming APIs and Flexible Logging Options](https://docs.inferless.com/changelog/june-2024/10th-june.md)
- [June 21st Update - Enhanced CLI Commands and Model Management APIs](https://docs.inferless.com/changelog/june-2024/21st-june.md)
- [11th March 2024: Better Monitoring Tools and Enhanced User Control](https://docs.inferless.com/changelog/march-2024/11th-march.md)
- [28th March 2024: Reducing model import time, better error handling](https://docs.inferless.com/changelog/march-2024/28-march.md)
- [May 27th Update - Enhanced Runtime Management, AutoFix Suggestions, and Improved Infrastructure Stability](https://docs.inferless.com/changelog/may-2024/27th-may.md)
- [6th May 2024: Enhanced Serverless Speeds, Model Build Efficiency, and Runtime Improvements](https://docs.inferless.com/changelog/may-2024/6th-may.md)
- [10th November: Gitlab Integration, Secrets Manager & Better Billing](https://docs.inferless.com/changelog/november-2023/10th-november.md)
- [17th November: Enhanced Error Handling and User Interface Improvements](https://docs.inferless.com/changelog/november-2023/17th-november.md)
- [27th November: User Interface Enhancements and Reliability Improvements](https://docs.inferless.com/changelog/november-2023/27th-november.md)
- [3rd November: Better Logs & Efficient Autoscaling](https://docs.inferless.com/changelog/november-2023/3rd-november.md)
- [14th November 2024: Better Hugging Face Model Imports, Infrastructure Stability and Volume improvements](https://docs.inferless.com/changelog/november-2024/14th-november.md)
- [20th November 2024: Enhanced Performance Tracking, New Runtime UI and more](https://docs.inferless.com/changelog/november-2024/20th-november.md)
- [20th October: Better error handling, Git fixes](https://docs.inferless.com/changelog/october-2023/20th-october.md)
- [7th October 2024: Enhanced Model Imports, Build Tracking, and Real-Time Logs](https://docs.inferless.com/changelog/october-2024/7th-october.md)
- [Changelog](https://docs.inferless.com/changelog/overview.md)
- [30th September 2024: Remote Run, Infrastructure Stability, and Observability Improvements](https://docs.inferless.com/changelog/september-2024/30th-september.md)
- [AI Agents CheatSheet](https://docs.inferless.com/cheatsheet/ai-agent-cheatsheet.md): A comprehensive cheatsheet covering the types of AI agents, their use cases, popular frameworks, LLMs, deployment options, essential tools, optimization techniques, common challenges, and ethical considerations.
- [Code LLMs CheatSheet](https://docs.inferless.com/cheatsheet/code-cheatsheet.md): A comprehensive cheatsheet covering open-source code generation models, inference libraries, datasets, deployment strategies, and ethical considerations for developers and organizations.
- [3D Generative Models CheatSheet](https://docs.inferless.com/cheatsheet/itt3d-cheatsheet.md): A comprehensive guide to open-source 3D generative models, datasets, toolkits, and resources for development, deployment, and evaluation.
- [Text-to-Image Generation CheatSheet](https://docs.inferless.com/cheatsheet/text-to-image-cheatsheet.md): A comprehensive cheatsheet covering open-source text-to-image generation models, inference libraries, datasets, use cases, deployment strategies, training resources, evaluation methods, and ethical considerations for developers and organizations.
- [Text-To-Speech (TTS) Cheatsheet](https://docs.inferless.com/cheatsheet/tts-cheatsheet.md): A comprehensive cheatsheet providing an overview of the top open-source TTS models, inference libraries, training resources, and more to help you get started with or enhance your TTS projects.
- [Vision-Language Models CheatSheet](https://docs.inferless.com/cheatsheet/vision-llm-cheatsheet.md): An all-in-one cheatsheet for vision-language models, including open-source models, inference toolkits, datasets, use cases, deployment strategies, optimization techniques, and ethical considerations for developers and organizations.
- [Bring custom packages](https://docs.inferless.com/concepts/building-custom-images.md): Custom software and dependencies in your Runtime
- [Handling Input / Output with Inferless](https://docs.inferless.com/concepts/configuring-the-input-output-schema.md)
- [Dynamic Batching](https://docs.inferless.com/concepts/dynamic-batching.md)
- [Managing Secrets on Inferless](https://docs.inferless.com/concepts/managing-secrets-on-inferless.md)
- [Overview](https://docs.inferless.com/concepts/overview.md)
- [Configuring Concurrent Requests](https://docs.inferless.com/concepts/processing-concurrent-requests.md): This guide explains how the same replica can process multiple requests concurrently.
- [Remote Run: Run your code remotely](https://docs.inferless.com/concepts/remote-run.md)
- [Automatic Build on Inferless](https://docs.inferless.com/concepts/setting-up-automatic-builds.md)
- [Streaming with SSE Events](https://docs.inferless.com/concepts/streaming-with-sse.md)
- [Working with Files on Inferless](https://docs.inferless.com/concepts/working-with-files.md)
- [Working with NFS - My Volumes](https://docs.inferless.com/concepts/working-with-nfs-volumes.md)
- [Deploy and Run ComfyUI as an API on Inferless](https://docs.inferless.com/cookbook/comfyui-api-inferless.md): Welcome to an immersive tutorial that guides you through leveraging the power of ComfyUI's API capabilities and deploying your workflows on Inferless. This resource is designed to help you create and deploy custom workflows, extending ComfyUI's API functionality. You'll learn how to interact with Co…
- [Build a Serverless Code Debugging Agent with Inferless](https://docs.inferless.com/cookbook/debugger-agent.md): In this tutorial, you’ll build a serverless Code Debugging Agent on Inferless that ingests Python or JavaScript code and returns a **deep analysis + a fully corrected version**, production-ready in minutes on Inferless.
- [Build a Google Maps Agent using MCP & Inferless](https://docs.inferless.com/cookbook/google-map-agent-using-mcp.md): In this tutorial, you’ll build a serverless conversational agent that leverages Google Maps data via the Model Context Protocol (MCP), Inferless, Ollama, and LangChain.
- [Build an Open-NotebookLM with Inferless](https://docs.inferless.com/cookbook/open-notebooklm.md): In this tutorial, you’ll build a serverless Open-NotebookLM that turns any research paper or article into a lively, two-host audio podcast using Inferless.
- [Build a Serverless Product Hunt Thread Summarizer](https://docs.inferless.com/cookbook/product-hunt-thread-summarizer.md): In this tutorial, we'll build a serverless Product Hunt thread summarizer using Large Language Models (LLMs). You'll learn how to scrape, process, and summarize Product Hunt threads into concise summaries that highlight key insights. By creating this application, you'll help users save ti…
- [Build a Serverless PDF Q&A Application in 10 Minutes](https://docs.inferless.com/cookbook/qna-serverless-pdf-application.md): Welcome to a hands-on tutorial designed to walk you through the creation of a PDF Q&A application, leveraging cutting-edge serverless technologies. In just 10 minutes, you'll have a working app capable of delivering precise answers from PDF documents, enriched with contextual understanding.
- [Build a Serverless Customer Service Voicebot](https://docs.inferless.com/cookbook/serverless-customer-service-bot.md): Welcome to an engaging tutorial designed to walk you through creating a customer support voicebot where users can voice their queries and receive solutions. You'll learn to integrate speech recognition, large language, and text-to-speech models to develop a responsive and efficient voice-based custo…
- [Create a Serverless Logo Generator Application](https://docs.inferless.com/cookbook/serverless-logo-generator.md): In this hands-on tutorial, you'll learn to build a serverless [Logo Generator application](https://github.com/inferless/Logo-Generator/tree/main) capable of creating unique logos based on text descriptions. Leveraging the power of diffusion models using the diffuser library, this application will al…
- [Build a Serverless Book Audio Summary Generator](https://docs.inferless.com/cookbook/serverless-speech-book-summary.md): Welcome to this tutorial, where we create a book summarizer using an LLM and TTS. You'll learn how to use a large language model (LLM) with a text-to-speech model to process PDF books, extract key ideas, quotes, and actionable items, and convert them into engaging audio summaries. This application aim…
- [Build a Serverless Voice Conversational Chatbot](https://docs.inferless.com/cookbook/serverless-voice-chatbot.md): Welcome to an immersive tutorial crafted to guide you through the development of a voice conversational chatbot application, leveraging state-of-the-art serverless technologies. Throughout this tutorial, you'll gain insights into seamlessly integrating multiple models within Inferless to construct a…
- [Deploy Serverless Containers](https://docs.inferless.com/getting-started/deploy-containers.md): Our mission is to make deployment of AI models simple and efficient. To accelerate this, we provide a simple interface to run your custom model without worrying about infrastructure.
- [Deploy an ML Model with Inferless](https://docs.inferless.com/getting-started/deploy-ml.md): There are several ways to import your model, but for this example we will use Hugging Face. By the end of this tutorial, you will be able to deploy a Hugging Face model on Inferless.
- [Deploy the DeepSeek-R1-Distill-Qwen-32B using Inferless](https://docs.inferless.com/how-to-guides/deploy-DeepSeek-R1-Distill-Qwen-32B.md): DeepSeek-R1-Distill-Qwen-32B is a distilled variant within the DeepSeek-R1 series. The dataset used for training is meticulously curated from the DeepSeek-R1 model, with Qwen2.5-32B serving as the foundational base model. This model has undergone supervised fine-tuning to achieve enhanced performanc…
- [Deploy Qwen2-VL-7B-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-Qwen2-VL-7B-Instruct.md): Qwen2-VL-7B-Instruct is a 7-billion-parameter multimodal language model developed by Alibaba Cloud's Qwen team, designed for instruction-based tasks with advanced visual and multilingual capabilities.
- [Deploy Qwen2.5-Coder-32B-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-Qwen2.5-Coder-32B-Instruct.md): Qwen2.5-Coder-32B-Instruct is a 32.5-billion-parameter code-specific language model developed by Alibaba Cloud's Qwen team, designed for instruction-based tasks with support for function calling and a context length of up to 131,072 tokens.
- [Deploy Llama-3.1-8B-Instruct GGUF using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-Llama-3.1-8B-Instruct-GGUF-using-inferless.md): Llama-3.1-8B-Instruct GGUF is a quantized version of Meta's state-of-the-art Llama-3.1 series of large language models. This guide will take you through the deployment process of the GGUF model on the Inferless platform.
- [Deploy Llama-3.1-8B-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-Llama-3.1-8B-Instruct-using-inferless.md): Llama-3.1-8B-Instruct is a new state-of-the-art model from Meta's Llama-3.1 series of large language models. This guide covers deploying the Llama-3.1-8B-Instruct model on the Inferless platform.
- [Deploy Llama-3.2-11B-Vision-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-Llama-3.2-11B-Vision-Instruct-using-inferless.md): The Llama 3.2 11B Vision Instruct model is part of Meta's latest series of large language models, introducing significant advancements in multimodal AI capabilities and accepting both text and image inputs.
- [Deploy Ministral-8B-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-Ministral-8B-Instruct.md): Ministral-8B-Instruct is a high-performance, instruction-tuned language model with 8 billion parameters and a 128k-token context window, designed for versatile applications in natural language processing.
- [Deploy Qwen's QwQ-32B-Preview using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-Qwen-QwQ-32B-preview.md): QwQ-32B-Preview is an experimental research model developed by the Qwen Team, featuring 32.5 billion parameters and a 32,768-token context window, designed to advance AI reasoning capabilities.
- [Deploy a CodeLlama-Python-34B Model using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-codellama-python-34b-model-using-inferless.md): In this tutorial, we'll show the deployment process of a quantized GPTQ model using vLLM. We are deploying a 4-bit GPTQ-quantized version of the CodeLlama-Python-34B model.
- [Deploy Qwen2-72B-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-qwen2-72b-using-inferless.md): Qwen2-72B-Instruct is part of the Qwen2 series of large language models, which ranges from 0.5 to 72 billion parameters. This guide covers deploying the 72B instruction-tuned model on the Inferless platform.
- [Deploy Qwen2.5-Omni-7B using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-qwen2.5-omni-7b.md): Qwen2.5-Omni-7B is a 7B multimodal language model developed by Alibaba Cloud's Qwen team, designed for real-time, end-to-end processing of text, image, audio, and video inputs, with text and speech generation capabilities.
- [Deploy the Qwen3-8B using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-qwen3-8b.md): Qwen3-8B is a language model from Alibaba Cloud's Qwen3 series that delivers strong reasoning, multilingual, and agent-friendly performance while remaining inexpensive to host.
- [Deploy Whisper-large-v3-turbo using Inferless](https://docs.inferless.com/how-to-guides/deploy-a-whisper-large-v3-turbo.md): Whisper-large-v3-turbo is a fast and efficient automatic speech recognition model with 809 million parameters, optimized for transcription and translation.
- [Deploy CodeLlama 70B using Inferless](https://docs.inferless.com/how-to-guides/deploy-codellama-70b-using-inferless.md): This tutorial demonstrates deploying a quantized CodeLlama 70B model using vLLM. We will be deploying a 4-bit GPTQ-quantized version of the CodeLlama-Python-70B model.
- [Deploy Deci 7B using Inferless](https://docs.inferless.com/how-to-guides/deploy-deci-7b-using-inferless.md): DeciLM-7B is a text generation model with 7.04 billion parameters that led the 7B base language models at the time of its release.
- [Deploy the DeepSeek-R1-Qwen3-8B using Inferless](https://docs.inferless.com/how-to-guides/deploy-deepseek-qwen3-8b.md): DeepSeek-R1-Qwen3-8B is a distilled model that transfers the chain-of-thought reasoning skills of DeepSeek-R1-0528 into the lighter Qwen3 backbone, delivering state-of-the-art math, code, and logic performance while remaining inexpensive to host.
- [Deploy FLUX.1-schnell using Inferless](https://docs.inferless.com/how-to-guides/deploy-flux-schnell-using-inferless.md): Black Forest Labs has released FLUX.1-schnell, part of the FLUX.1 suite of text-to-image models that set a new state of the art in image detail, prompt adherence, style diversity, and scene complexity. FLUX.1-schnell is the fastest model in the suite, tailored for local development and personal use.
- [Deploy the Gemma-3-27B-it using Inferless](https://docs.inferless.com/how-to-guides/deploy-gemma-27b-it.md): Gemma-3-27B-it is a 27-billion-parameter multimodal language model developed by the Gemma team. This model excels in instruction-based tasks, offering superior visual and multilingual capabilities.
- [Deploy Gemma-7B using vLLM on Inferless](https://docs.inferless.com/how-to-guides/deploy-gemma-7b-using-vllm-on-inferless.md): Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
- [Deploy Google PaliGemma-3B using Inferless](https://docs.inferless.com/how-to-guides/deploy-google-paligemma-3b-using-inferless.md): PaliGemma is a cutting-edge open vision-language model (VLM) developed by Google. It is designed to understand and generate detailed insights from both images and text, making it a powerful tool for tasks such as image captioning, visual question answering, object detection, and object segmentation.
- [Deploy Meta-Llama-3-8B using Inferless](https://docs.inferless.com/how-to-guides/deploy-llama-3-using-inferless.md): Llama 3 is an auto-regressive language model, leveraging a refined transformer architecture. It incorporates supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to ensure alignment with human preferences.
- [Deploy Meditron using Inferless](https://docs.inferless.com/how-to-guides/deploy-meditron-using-inferless.md)
- [Deploy the Mistral-Small-3.1-24B-Instruct-2503 using Inferless](https://docs.inferless.com/how-to-guides/deploy-mistral-3.1-24b-instruct.md): Mistral-Small-3.1-24B-Instruct-2503 is a 24-billion-parameter language model fine-tuned for instruction-following tasks and equipped with state-of-the-art vision understanding. Optimized for efficient inference and function calling, it is ideal for fast-response conversational agents, local inferenc…
- [Deploy the Mistral-7B-Instruct-v0.3 using Inferless](https://docs.inferless.com/how-to-guides/deploy-mistral-7b-instruct-v0.3.md): Mistral-7B-Instruct-v0.3 is a 7.3-billion-parameter language model fine-tuned for instruction-following tasks. This model supports function calling and is optimized for efficient inference, making it suitable for a wide range of applications.
- [Deploy Mixtral-8x7B for 52 Tokens/Sec on a Single GPU](https://docs.inferless.com/how-to-guides/deploy-mixtral-8x7b-for-52-tokens-sec-on-a-single-gpu.md): Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference.
- [Deploy Mixtral-8x7B using Inferless](https://docs.inferless.com/how-to-guides/deploy-mixtral-8x7b-using-inferless.md): Mixtral 8x7B, a sparse mixture-of-experts (SMoE) model with open weights, outperforms Llama 2 70B on benchmarks. It excels as the strongest open-weight model, displaying superior cost/performance.
- [Deploy Musicgen Stereo Melody Large Model using Inferless](https://docs.inferless.com/how-to-guides/deploy-musicgen-melody-large.md): Meta releases [MusicGen](https://audiocraft.metademolab.com/musicgen.html), a text-to-music model that converts text descriptions or audio prompts into high-quality music samples.
- [Deploy the Nanonets-OCR-s model using Inferless](https://docs.inferless.com/how-to-guides/deploy-nanonets-ocr-s.md): A vision-language OCR model fine-tuned from Qwen2.5-VL-3B that turns documents and images into structured Markdown, including tables, LaTeX equations, checkboxes, and tagged watermarks, ready for downstream LLM workflows.
- [Deploy OpenAI's GPT-OSS 20B model using Inferless](https://docs.inferless.com/how-to-guides/deploy-openai-gpt-oss-20b.md): An open-weight, 21B-parameter language model optimized for chain-of-thought reasoning, tool use, and agentic workflows with structured outputs.
- [Deploy OpenHermes using Inferless](https://docs.inferless.com/how-to-guides/deploy-openhermes-using-inferless.md): OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune and a continuation of the OpenHermes 2 model, trained on additional code datasets.
- [Deploy OpenLLM-leaderboard topper Smaug-72B using Inferless](https://docs.inferless.com/how-to-guides/deploy-openllm-leaderboard-topper-smaug-72b-using-inferless.md): This tutorial demonstrates deploying a quantized Smaug-72B model using vLLM. We will be deploying a 4-bit GPTQ-quantized version of this model.
- [Deploy Phi-3-mini-128k-instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-phi-3-128k.md): Phi-3-mini-128k-instruct is a 3.8-billion-parameter lightweight state-of-the-art model fine-tuned for instruction-following tasks, leveraging advanced techniques and comprehensive datasets to deliver high performance in natural language understanding and generation.
- [Deploy the Phi-4 using Inferless](https://docs.inferless.com/how-to-guides/deploy-phi-4.md): Phi-4 is Microsoft's latest 14-billion-parameter small language model (SLM). This model is part of the Phi family, which aims to balance model size and performance, showcasing that smaller models can achieve state-of-the-art results.
- [Deploy the Phi-4-Multimodal-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-phi-4-multimodal-instruct.md): Phi-4-Multimodal-Instruct is a 5.6-billion-parameter multimodal language model from Microsoft that integrates text, vision, and audio processing. This model excels in instruction-based tasks, offering advanced reasoning and cross-modal capabilities.
- [Deploy Quantized version of SOLAR 10.7B-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-quantized-version-of-solar-10-7b-instruct-using-inferless.md): SOLAR-10.7B, an advanced large language model (LLM) with 10.7 billion parameters, demonstrates superior performance in various natural language processing (NLP) tasks.
- [Deploy the Qwen2.5-VL-7B-Instruct using Inferless](https://docs.inferless.com/how-to-guides/deploy-qwen2.5-vl-7b.md): Qwen2.5-VL-7B-Instruct is a 7-billion-parameter multimodal language model developed by Alibaba Cloud's Qwen team. This model excels in instruction-based tasks, offering advanced visual and multilingual capabilities.
- [Deploy Stable Cascade using Inferless](https://docs.inferless.com/how-to-guides/deploy-stable-cascade-using-inferless.md): Stable Cascade distinguishes itself by operating within a significantly smaller latent space, offering faster inference and cost-effective training.
- [Deploy Stable Diffusion 3 using Inferless](https://docs.inferless.com/how-to-guides/deploy-stable-diffusion-3-using-inferless.md): Stability AI has released Stable Diffusion 3, an advanced text-to-image generation model with significant improvements over its predecessors. This new version features a range of models from 800M to 8B parameters, providing users with scalable options to suit their needs.
- [Deploy Stable Diffusion XL Turbo using Inferless](https://docs.inferless.com/how-to-guides/deploy-stable-diffusion-xl-turbo-using-inferless.md): Stability AI unveiled SDXL Turbo, a technology that facilitates high-quality image generation in just one step, utilizing an advanced distillation technique known as Adversarial Diffusion Distillation.
- [Deploy Stable Video Diffusion using Inferless](https://docs.inferless.com/how-to-guides/deploy-stable-video-diffusion-using-inferless.md): Stability AI released Stable Video Diffusion, a latent diffusion model for high-resolution video generation from text and images.
- [Deploy Starling 7B using Inferless](https://docs.inferless.com/how-to-guides/deploy-starling-7b-using-inferless.md): Starling 7B is an LLM trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench.
- [Deploy Llama-3-TenyxChat-70B using Inferless](https://docs.inferless.com/how-to-guides/deploy-tenyx-llama-3-using-inferless.md): Llama-3-TenyxChat-70B is a model fine-tuned through Direct Preference Optimization (DPO). It leverages Tenyx's advanced fine-tuning technology and the open-source AI feedback dataset UltraFeedback for its training.
- [Deploy TenyxChat 7B using Inferless](https://docs.inferless.com/how-to-guides/deploy-tenyxchat-7b-using-inferless.md): TenyxChat-7B-v1 is trained using the Direct Preference Optimization (DPO) framework on the open-source AI feedback dataset UltraFeedback.
- [Deploy TenyxChat-8x7B-v1 using Inferless](https://docs.inferless.com/how-to-guides/deploy-tenyxchat-8x7b-v1-using-inferless.md): TenyxChat-8x7B-v1 is trained using the Direct Preference Optimization (DPO) framework on the open-source AI feedback dataset UltraFeedback.
- [How to Stream Speech with Parler-TTS using Inferless](https://docs.inferless.com/how-to-guides/deploy-text-to-speech-streaming.md): This tutorial demonstrates how to implement real-time text-to-speech (TTS) streaming using the parler_tts_mini model and the Parler-TTS library.
- [Deploy Google TimesFM using Inferless](https://docs.inferless.com/how-to-guides/deploy-timesfm-using-inferless.md): TimesFM is a cutting-edge time series forecasting model developed by Google. It is designed to understand and generate detailed forecasts from temporal data, making it a powerful tool for tasks such as demand forecasting, anomaly detection, and trend analysis.
- [Deploy the Voxtral-Mini-3B model using Inferless](https://docs.inferless.com/how-to-guides/deploy-voxtral-3b-mini.md): An audio-language model fine-tuned for transcription, summarization, Q&A, and voice-triggered function calls, deployable compactly on consumer GPUs with rich structured outputs.
- [Deploy Whisper Large V3 using Inferless](https://docs.inferless.com/how-to-guides/deploy-whisper-large-v3-using-inferless.md): OpenAI releases Whisper-large-v3, a pre-trained model for automatic speech recognition (ASR) and speech translation.
- [Deploy the YOLO11m model using Inferless](https://docs.inferless.com/how-to-guides/deploy-yolo11m-detect.md): YOLO11m is a medium-sized variant of the YOLO11 family, designed to balance accuracy and computational efficiency for object detection tasks.
- [How to Finetune and Inference Llama-3](https://docs.inferless.com/how-to-guides/how-to-finetune--and-inference-llama3.md): Llama 3 is an auto-regressive language model, leveraging a refined transformer architecture. The Llama 3 models were trained on 8x more data, over 15 trillion tokens. It has a context length of 8K tokens and increases the vocabulary size of the tokenizer to 128,256 (from 32K tokens in the previous…
- [How to Finetune, Quantize and Inference Phi-2](https://docs.inferless.com/how-to-guides/how-to-finetune--quantize-and-inference-phi-2.md): Phi-2 is a Transformer with 2.7 billion parameters that showcased nearly state-of-the-art performance among models with fewer than 13 billion parameters.
- [Model Alerts using AWS SNS](https://docs.inferless.com/integrations/aws-sns/aws-sns.md): You can set up alerts by integrating Inferless with SNS, which sends notifications about critical events related to model health.
- [Cloud Buckets - S3 / GCS](https://docs.inferless.com/integrations/cloud-buckets/cloud-buckets---s3--gcs.md)
- [Demo GCS](https://docs.inferless.com/integrations/cloud-buckets/demo-gcs.md)
- [Demo S3](https://docs.inferless.com/integrations/cloud-buckets/demo-s3.md)
- [Docker](https://docs.inferless.com/integrations/docker.md): Bring your own Docker container images (these might have higher cold starts).
- [File Import from System](https://docs.inferless.com/integrations/file-import-system/file-import-from-system.md): You can upload your model file directly from your system or from a public downloadable link.
- [Import from system](https://docs.inferless.com/integrations/file-import-system/import-from-system.md)
- [Git (Custom Code)](https://docs.inferless.com/integrations/git-custom-code/git--custom-code.md)
- [GitHub - Demo](https://docs.inferless.com/integrations/git-custom-code/github---demo.md)
- [GitLab - Demo](https://docs.inferless.com/integrations/git-custom-code/gitlab---demo.md)
- [Hugging Face](https://docs.inferless.com/integrations/hugging-face.md)
- [Introduction](https://docs.inferless.com/introduction/introduction.md)
- [Automatic Build via webhooks](https://docs.inferless.com/model-import/automatic-build/automatic-build-via-webhooks.md)
- [AWS SageMaker](https://docs.inferless.com/model-import/automatic-build/aws-sagemaker.md): How to enable webhooks in AWS to activate the auto-rebuild/CI-CD function in Inferless
- [Docker](https://docs.inferless.com/model-import/automatic-build/docker.md)
- [GitHub](https://docs.inferless.com/model-import/automatic-build/github.md)
- [Google Vertex AI](https://docs.inferless.com/model-import/automatic-build/google-vertex-ai.md)
- [Hugging Face](https://docs.inferless.com/model-import/automatic-build/hugging-face.md)
- [Bring custom packages](https://docs.inferless.com/model-import/bring-custom-packages.md): Custom software and dependencies in your Runtime
- [CLI import](https://docs.inferless.com/model-import/cli-import.md)
- [Configuring the Inference Service](https://docs.inferless.com/model-import/configuring-the-inference-service.md): This guide explains how to configure the Inference Service using the CLI and the Platform
- [File structure requirements](https://docs.inferless.com/model-import/file-structure-req/file-structure-requirements.md)
- [Import using CLI](https://docs.inferless.com/model-import/file-structure-req/import-using-cli.md): Deploying models using the CLI requires files in a particular format. This guide explains all the required files.
- [Import using Docker](https://docs.inferless.com/model-import/file-structure-req/import-using-docker.md): Bring your own image
- [Importing from Cloud Buckets](https://docs.inferless.com/model-import/file-structure-req/importing-from-cloud-buckets.md)
- [Importing from GitHub](https://docs.inferless.com/model-import/file-structure-req/importing-from-github.md)
- [Importing your file from your System](https://docs.inferless.com/model-import/file-structure-req/importing-your-file-from-your-system.md)
- [Input / Output Schema](https://docs.inferless.com/model-import/input-output-schema.md)
- [My Secrets](https://docs.inferless.com/model-import/my-secrets.md)
- [My Volumes](https://docs.inferless.com/model-import/my-volumes.md)
- [Input / Output Schema](https://docs.inferless.com/references/api/inferless-input-schema.md)
- [InferlessPythonModel Class](https://docs.inferless.com/references/api/inferless-python.md)
- [inferless deploy](https://docs.inferless.com/references/cli/inferless-deploy.md)
- [inferless export](https://docs.inferless.com/references/cli/inferless-export.md)
- [inferless init](https://docs.inferless.com/references/cli/inferless-init.md)
- [inferless integration](https://docs.inferless.com/references/cli/inferless-integration.md)
- [inferless login](https://docs.inferless.com/references/cli/inferless-login.md)
- [inferless model](https://docs.inferless.com/references/cli/inferless-model.md)
- [inferless remote run](https://docs.inferless.com/references/cli/inferless-remote-run.md)
- [inferless run](https://docs.inferless.com/references/cli/inferless-run.md)
- [inferless runtime](https://docs.inferless.com/references/cli/inferless-runtime.md)
- [inferless secrets](https://docs.inferless.com/references/cli/inferless-secrets.md)
- [inferless token](https://docs.inferless.com/references/cli/inferless-token.md)
- [inferless volumes](https://docs.inferless.com/references/cli/inferless-volume.md)
- [inferless workspace](https://docs.inferless.com/references/cli/inferless-workspace.md)
- [References](https://docs.inferless.com/references/overview.md)

## OpenAPI Specs

- [openapi](https://docs.inferless.com/api-reference/openapi.json)

## Optional

- [Blog](https://www.inferless.com/blog)