Key Components of the Application

  1. LLM: Qwen/Qwen2.5-Coder-7B-Instruct, an instruction-tuned code model designed for debugging and code generation.
  2. Prompt Engineering: Defines strict, language-specific system and user prompts that drive a consistent, structured Markdown report plus corrected code.

Crafting Your Application

The request flow is simple:
  1. User sends the code_content along with the code_language.
  2. Prompt selection based on code_language maps to Python or JavaScript system/user prompts.
  3. Chat template builds the final input to the model.
  4. Generation produces a structured Markdown analysis with: Analysis Summary, Critical Issues, Warnings, Suggestions, Best Practices Applied, and Complete Corrected Code.
  5. Response returns generated_text containing the full report and the improved code block.
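The request body for this flow can be sketched as follows; the field names mirror the RequestObjects schema defined in app.py below (the endpoint URL and auth headers are deployment-specific and omitted here):

```python
import json

# Hypothetical request payload; fields match the RequestObjects schema in app.py.
payload = {
    "code_content": "def hello(arg1,arg2):",
    "code_language": "python",
    "temperature": 0.1,
    "top_p": 0.9,
    "max_new_tokens": 4096,
    "do_sample": True,
}

# Serialize to JSON, as an HTTP client would before POSTing to the endpoint.
body = json.dumps(payload)
print(body)
```

Only `code_content` and `code_language` change the analysis itself; the remaining fields tune generation and fall back to the defaults shown in RequestObjects when omitted.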

Core Development Steps

1. Build the Complete Pipeline

We will create two scripts: app.py, which contains the Inferless class and its functions, and prompts.py, which holds the prompts required for the code analysis.
  1. app.py:
from transformers import set_seed, AutoModelForCausalLM, AutoTokenizer
import torch
import random
import numpy as np
import inferless
from typing import Optional
from pydantic import BaseModel, Field
from prompts import PY_USER_PROMPT, PY_SYSTEM_PROMPT, JS_USER_PROMPT, JS_SYSTEM_PROMPT

@inferless.request
class RequestObjects(BaseModel):
    code_content: str = Field(default="def hello(arg1,arg2):")
    code_language: str = Field(default="python")
    temperature: Optional[float] = 0.1
    top_p: Optional[float] = 0.9
    max_new_tokens: Optional[int] = 4096
    do_sample: Optional[bool] = True

@inferless.response
class ResponseObjects(BaseModel):
    generated_text: str = Field(default="Generated text will appear here")

class InferlessPythonModel:
    def set_seed(self,SEED):
      random.seed(SEED)
      np.random.seed(SEED)
      torch.manual_seed(SEED)
      torch.cuda.manual_seed_all(SEED)
      set_seed(SEED)
      
    def initialize(self):
      SEED = 12896654
      self.set_seed(SEED)
      
      model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
      self.model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
      self.tokenizer = AutoTokenizer.from_pretrained(model_name)
      
      self.LANG_PROMPTS = {
          "python": (PY_SYSTEM_PROMPT, PY_USER_PROMPT),
          "javascript": (JS_SYSTEM_PROMPT, JS_USER_PROMPT),
      }
    def infer(self, inputs: RequestObjects) -> ResponseObjects:
      # Fail with a clear message instead of a bare KeyError on unsupported languages.
      lang = inputs.code_language.lower()
      if lang not in self.LANG_PROMPTS:
          raise ValueError(f"Unsupported code_language: {inputs.code_language!r}; expected one of {list(self.LANG_PROMPTS)}")
      SYSTEM_PROMPT, USER_PROMPT = self.LANG_PROMPTS[lang]
      
      messages = [ {"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": USER_PROMPT.format(code_content=inputs.code_content)}
                 ]
      
      text = self.tokenizer.apply_chat_template(
          messages,
          tokenize=False,
          add_generation_prompt=True
      )
      model_inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)
      generated_ids = self.model.generate(
          **model_inputs,
          max_new_tokens=inputs.max_new_tokens,
          temperature=inputs.temperature,
          do_sample=inputs.do_sample,
          top_p=inputs.top_p
      )
        
      generated_ids = [
          output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
      ]
      response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
      return ResponseObjects(generated_text=response)

    def finalize(self):
      self.model = None
  2. prompts.py:
PY_USER_PROMPT = """Please analyze the following Python code for bugs, potential issues, and improvements:
```python
{code_content}
```
Provide a comprehensive analysis including:
1. All critical bugs and warnings
2. Performance optimizations
3. Python best practices that should be applied
4. A complete corrected version implementing all fixes and best practices

Remember to include both the 🏆 **Best Practices Applied** section explaining the best practices implemented, and the ✅ **Complete Corrected Code** section with the fully improved code."""

PY_SYSTEM_PROMPT = """You are an expert Python debugger and code analyst. Your task is to analyze code snippets, identify potential bugs, performance issues, and provide clear explanations with actionable fixes.

ANALYSIS FRAMEWORK:

1. Code Analysis: Examine syntax, logic, performance, and best practices
2. Bug Identification: Find actual bugs, potential runtime errors, and logical flaws
3. Fix Suggestions: Provide concrete solutions with explanations
4. Best Practices Implementation: Apply Python best practices and design patterns
5. Complete Solution: Provide the fully corrected and improved code

OUTPUT FORMAT:

Always respond in structured Markdown with these sections:

- 🔍 **Analysis Summary** (brief overview)
- 🚨 **Critical Issues** (bugs that will cause failures)
- ⚠️ **Warnings** (potential problems, performance issues)  
- 💡 **Suggestions** (improvements, best practices)
- 🏆 **Best Practices Applied** (Python best practices implementation)
- ✅ **Complete Corrected Code** (fully fixed and improved version)

ANALYSIS DEPTH:
- Identify performance bottlenecks
- Spot security vulnerabilities
- Verify error handling
- Assess code readability and maintainability

BEST PRACTICES TO APPLY:
- **SOLID Principles**: Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion
- **Design Patterns**: Factory, Observer, Strategy, etc. where applicable
- **Code Organization**: Proper class structure, method organization, separation of concerns
- **Error Handling**: Specific exceptions, proper logging, graceful degradation
- **Performance**: Efficient algorithms, proper data structures, memory management
- **Security**: Input validation, safe file operations, SQL injection prevention
- **Testing**: Code structure that supports unit testing
- **Documentation**: Clear docstrings, type hints, inline comments
- **Pythonic Code**: List comprehensions, context managers, generators, decorators
- **Configuration**: Externalized configuration, environment variables

COMPLETE CORRECTED CODE REQUIREMENTS:
- Fix ALL identified issues
- Implement ALL suggested improvements
- Apply relevant Python best practices from above
- Add proper imports and type hints
- Include comprehensive error handling and logging
- Follow PEP 8 style guidelines and naming conventions
- Add comprehensive docstrings for all methods
- Structure code for maintainability and testability
- Ensure code is production-ready and scalable

HARD CONSTRAINTS (These override everything that follows. If a conflict arises, obey these constraints first; if you cannot comply, explain why instead of violating them.)
- NEVER modify an existing function named `initialize`.
- NEVER add a function named `__infer__` if `initialize` already exists.

Be thorough but concise. Focus on actionable insights that help developers write better code.
"""

JS_USER_PROMPT = """Please analyze the following JavaScript code for bugs, potential issues, and improvements:
```javascript
{code_content}
```
Provide a comprehensive analysis including:
1. All critical bugs and warnings
2. Performance optimizations
3. JavaScript best practices that should be applied
4. A complete corrected version implementing all fixes and best practices

Remember to include both the 🏆 Best Practices Applied section explaining the best practices implemented, and the ✅ Complete Corrected Code section with the fully improved code."""

JS_SYSTEM_PROMPT = """You are an expert JavaScript debugger and code analyst. Your task is to analyze code snippets, identify potential bugs, performance issues, and provide clear explanations with actionable fixes.

ANALYSIS FRAMEWORK:
1. Code Analysis: Examine syntax, logic, performance, and best practices
2. Bug Identification: Find actual bugs, potential runtime errors, and logical flaws
3. Fix Suggestions: Provide concrete solutions with explanations
4. Best Practices Implementation: Apply JavaScript best practices and design patterns
5. Complete Solution: Provide the fully corrected and improved code

OUTPUT FORMAT:
Always respond in structured Markdown with these sections:
- 🔍 **Analysis Summary** (brief overview)
- 🚨 **Critical Issues** (bugs that will cause failures)
- ⚠️ **Warnings** (potential problems, performance issues)  
- 💡 **Suggestions** (improvements, best practices)
- 🏆 **Best Practices Applied** (JavaScript best practices implementation)
- ✅ **Complete Corrected Code** (fully fixed and improved version)

ANALYSIS DEPTH:
- Identify performance bottlenecks and memory leaks
- Spot security vulnerabilities (XSS, injection attacks)
- Verify error handling and async patterns
- Assess code readability and maintainability
- Check DOM manipulation efficiency
- Validate modern JavaScript usage

BEST PRACTICES TO APPLY:
- **SOLID Principles**: Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion
- **Design Patterns**: Factory, Observer, Strategy, Module, Singleton where applicable
- **Modern JavaScript**: ES6+ features, destructuring, arrow functions, template literals, modules
- **Code Organization**: Proper module structure, separation of concerns, clean architecture
- **Error Handling**: Try-catch blocks, Promise rejection handling, specific error types
- **Performance**: Efficient algorithms, DOM optimization, debouncing/throttling, lazy loading
- **Security**: Input sanitization, XSS prevention, secure API calls, HTTPS enforcement
- **Async Programming**: Proper Promise chains, async/await usage, error propagation
- **Testing**: Code structure supporting unit testing, pure functions, dependency injection
- **Documentation**: Clear JSDoc comments, meaningful naming, inline explanations
- **Memory Management**: Event listener cleanup, avoiding closures leaks, proper cleanup
- **Accessibility**: ARIA attributes, semantic HTML, keyboard navigation
- **Browser Compatibility**: Feature detection, polyfills, progressive enhancement
- **Type Safety**: JSDoc type annotations, input validation, runtime type checking

COMPLETE CORRECTED CODE REQUIREMENTS:
- Fix ALL identified issues
- Implement ALL suggested improvements
- Apply relevant JavaScript best practices from above
- Add proper imports/exports and ES6 modules
- Include comprehensive error handling and logging
- Follow consistent naming conventions (camelCase, PascalCase)
- Add comprehensive JSDoc documentation for all functions
- Structure code for maintainability and testability
- Ensure code is production-ready and scalable
- Use modern JavaScript features appropriately
- Implement proper async/await patterns with error handling
- Add input validation and sanitization
- Include proper event listener management and cleanup
- Optimize for performance and memory usage
- Add accessibility considerations for UI code
- Ensure cross-browser compatibility

HARD CONSTRAINTS (These override everything that follows. If a conflict arises, obey these constraints first; if you cannot comply, explain why instead of violating them.)
- NEVER modify an existing function named `initialize`.
- NEVER add a function named `__infer__` if `initialize` already exists.

Be thorough but concise. Focus on actionable insights that help developers write better, more secure, and more maintainable JavaScript code."""
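With prompts.py in place, the prompt-selection step in app.py reduces to a dictionary lookup plus str.format. A minimal standalone sketch of that step, using abbreviated stand-in strings rather than the full prompts above:

```python
# Abbreviated stand-ins for the four constants defined in prompts.py.
PY_SYSTEM_PROMPT = "You are an expert Python debugger and code analyst."
PY_USER_PROMPT = "Please analyze the following Python code:\n{code_content}"
JS_SYSTEM_PROMPT = "You are an expert JavaScript debugger and code analyst."
JS_USER_PROMPT = "Please analyze the following JavaScript code:\n{code_content}"

# Same language-to-prompt mapping that initialize() builds in app.py.
LANG_PROMPTS = {
    "python": (PY_SYSTEM_PROMPT, PY_USER_PROMPT),
    "javascript": (JS_SYSTEM_PROMPT, JS_USER_PROMPT),
}

def build_messages(code_content: str, code_language: str) -> list:
    """Select the prompt pair for a language and build the chat messages."""
    system_prompt, user_prompt = LANG_PROMPTS[code_language.lower()]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt.format(code_content=code_content)},
    ]

# Case-insensitive lookup: "Python" resolves the same pair as "python".
messages = build_messages("def hello(arg1,arg2):", "Python")
print(messages[0]["role"], "|", messages[1]["content"])
```

The resulting messages list is exactly what infer() passes to apply_chat_template, which then renders the model-specific chat markup before generation.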

Setting up the Environment

Here’s how to set up the build-time and run-time dependencies for your application. Add the following to your inferless-runtime-config.yaml:
build:
  cuda_version: "12.1.1"
  python_packages:
    - torch==2.7.0
    - accelerate==1.8.1
    - huggingface-hub==0.34.3
    - pydantic==2.11.7
    - inferless==0.2.15
    - transformers==4.55.0
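If you mirror this environment locally, a quick sanity check that the pinned packages resolve can be run with the standard library (package names taken from the build config above; this only reports what is installed, it does not install anything):

```python
from importlib.metadata import PackageNotFoundError, version

# Packages pinned in the build config above.
pins = ["torch", "accelerate", "huggingface-hub", "pydantic", "inferless", "transformers"]

# Record the installed version of each package, or mark it missing.
report = {}
for pkg in pins:
    try:
        report[pkg] = version(pkg)
    except PackageNotFoundError:
        report[pkg] = "not installed"

for pkg, ver in report.items():
    print(f"{pkg}: {ver}")
```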

Deploying Your Model with Inferless CLI

Inferless allows you to deploy your model using the Inferless CLI. Follow these steps:

Clone the repository of the model

Let’s begin by cloning the model repository:
git clone https://github.com/inferless/code-debugging-agent.git

Deploy the Model

To deploy the model using Inferless CLI, execute the following command:
inferless deploy --gpu A100 --runtime inferless-runtime-config.yaml
Explanation of the Command:
  • --gpu A100: Specifies the GPU type for deployment. Available options include A10, A100, and T4.
  • --runtime inferless-runtime-config.yaml: Defines the runtime configuration file. If not specified, the default Inferless runtime is used.

Demo of the Code Debugging Agent

Alternative Deployment Method

Inferless also supports a user-friendly UI for model deployment, catering to users at all skill levels. Refer to Inferless’s documentation for guidance on UI-based deployment.

Choosing Inferless for Deployment

Deploying your Code Debugging Agent with Inferless offers compelling advantages, making your development journey smoother and more cost-effective. Here’s why Inferless is the go-to choice:
  1. Ease of Use: Forget the complexities of infrastructure management. With Inferless, you simply bring your model, and within minutes, you have a working endpoint. Deployment is hassle-free, without the need for in-depth knowledge of scaling or infrastructure maintenance.
  2. Cold-start Times: Inferless’s unique load balancing ensures faster cold-starts.
  3. Cost Efficiency: Inferless optimizes resource utilization, translating to lower operational costs. Here’s a simplified cost comparison:

Scenario

You are looking to deploy a Code Debugging Agent for processing 100 queries.
Parameters:
  • Total number of queries: 100 daily.
  • Inference Time: All models are hypothetically deployed on an A100 80GB, taking 34.62 seconds to process a request, with a cold start overhead of 17.3 seconds.
  • Scale Down Timeout: Uniformly 60 seconds across all platforms, except Hugging Face, which requires a minimum of 15 minutes. This is assumed to happen 100 times a day.
Key Computations:
  1. Inference Duration:
    Processing 100 queries and each takes 34.62 seconds
    Total: 100 x 34.62 = 3462 seconds (or approximately 0.96 hours)
  2. Idle Timeout Duration:
    Post-processing idle time before scaling down: (60 seconds - 34.62 seconds) x 100 = 2538 seconds (or 0.705 hours approximately)
  3. Cold Start Overhead:
    Total: 100 x 17.3 = 1730 seconds (or 0.48 hours approximately)
Total Billable Hours with Inferless: 0.96 (inference duration) + 0.705 (idle time) + 0.48 (cold start overhead) = 2.14 hours
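The billable-hours arithmetic above can be sanity-checked in a few lines of Python (all figures taken from this scenario):

```python
queries = 100          # requests per day
inference_s = 34.62    # seconds to process one request
cold_start_s = 17.3    # cold start overhead per occurrence
scale_down_s = 60      # scale-down timeout window in seconds

# Convert each component from seconds to hours.
inference_h = queries * inference_s / 3600
idle_h = queries * (scale_down_s - inference_s) / 3600
cold_h = queries * cold_start_s / 3600
total_h = inference_h + idle_h + cold_h

print(f"inference: {inference_h:.2f} h, idle: {idle_h:.3f} h, cold start: {cold_h:.2f} h")
print(f"total billable: {total_h:.3f} h")
```

Summing the individually rounded components (0.96 + 0.705 + 0.48) gives the 2.14-hour figure quoted above; the exact total differs only by rounding.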
Scenario            On-Demand Cost                           Serverless Cost
100 requests/day    $28.8 (24 hours billed at $1.22/hour)    $2.61 (2.14 hours billed at $1.22/hour)
By opting for Inferless, you can achieve up to 90.9% cost savings.
Please note that we have utilized the A100 (80 GB) GPU for model benchmarking purposes, while for pricing comparison, we referenced the A10G GPU price from both platforms. This is due to the unavailability of the A100 GPU in SageMaker. Also, the above analysis is based on a smaller-scale scenario for demonstration purposes. Should the scale increase tenfold, traditional cloud services might require maintaining 2-4 GPUs constantly active to manage peak loads efficiently. In contrast, Inferless, with its dynamic scaling capabilities, adeptly adjusts to fluctuating demand without the need for continuously running hardware.

Conclusion

With this walkthrough, you’re ready to ship a serverless Code Debugging Agent on Inferless that ingests Python or JavaScript and returns a deep, structured analysis plus a fully corrected version. You wired up Qwen2.5-Coder-7B-Instruct with strict system/user prompts, built a clear app.py/prompts.py pipeline, and deployed it with the Inferless CLI. From here, make it yours: add new language profiles and tune the prompts for your codebase.