
smolagents Self-Hosted


smolagents is Hugging Face’s minimalist agent framework – a few hundred lines of core code that produce a surprisingly capable code-writing agent. Its pitch: agents should write code to use tools rather than emitting JSON-formatted function calls. On our dedicated GPU hosting it pairs well with a self-hosted coding model.


Philosophy

Traditional agent frameworks ask the LLM to emit JSON tool calls. smolagents instead has the LLM emit Python code that calls tools as functions. The argument: LLMs are better at writing code than at producing perfectly formatted tool JSON, and code naturally expresses composition and control flow – loops, conditionals, and feeding one tool's output into another.
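The difference is easy to see in plain Python, with no smolagents involved. In this toy contrast, the `add` tool and both "model outputs" are invented for illustration:

```python
import json

# A toy tool the agent can use.
def add(a: int, b: int) -> int:
    return a + b

# JSON-style tool calling: the model must emit exactly this structure,
# and the runtime dispatches one call at a time.
json_call = '{"tool": "add", "arguments": {"a": 2, "b": 3}}'
parsed = json.loads(json_call)
result_json = {"add": add}[parsed["tool"]](**parsed["arguments"])

# Code-style tool calling (the smolagents approach): the model emits
# Python, so composition and control flow come for free.
generated_code = "result = add(add(1, 2), 3) if add(1, 1) == 2 else 0"
namespace = {"add": add}
exec(generated_code, namespace)
result_code = namespace["result"]

print(result_json, result_code)  # 5 6
```

Nesting and branching in the second form would take several round-trips in the first – that is the whole pitch.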

Setup

pip install "smolagents[litellm]"

from smolagents import CodeAgent, LiteLLMModel

# Point LiteLLM at any OpenAI-compatible endpoint (vLLM, TGI, llama.cpp, ...).
# The "openai/" prefix selects LiteLLM's OpenAI-compatible route; the rest
# must match the model name your inference server registered.
model = LiteLLMModel(
    model_id="openai/qwen-coder-32b",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",  # self-hosted servers usually ignore the key
)
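For orientation, this is roughly the JSON body that ends up at your server's `/v1/chat/completions` route – a sketch of the OpenAI chat-completions schema, not of LiteLLM's internals, and the prompt is invented:

```python
import json

# Approximate shape of the request POSTed to
# http://localhost:8000/v1/chat/completions. Field names follow the
# OpenAI chat-completions schema; the "openai/" routing prefix is
# stripped before the request is sent.
payload = {
    "model": "qwen-coder-32b",
    "messages": [
        {"role": "user", "content": "Write a function that reverses a string."}
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)
```

If the agent stalls, replaying a body like this with `curl` against `api_base` is the quickest way to check whether the problem is the server or the agent.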

Example

from smolagents import tool, CodeAgent

@tool
def search(query: str) -> str:
    """Search the web for information.

    Args:
        query: The search query string.
    """
    return perform_search(query)  # wire this up to your search backend

@tool
def read_file(path: str) -> str:
    """Read a file from the local filesystem.

    Args:
        path: Path of the file to read.
    """
    with open(path) as f:
        return f.read()

agent = CodeAgent(tools=[search, read_file], model=model)
result = agent.run("Find the latest GPU prices and save the summary to summary.txt")

The agent writes Python code that calls search and read_file as ordinary functions instead of emitting JSON tool calls. Note that @tool requires type hints and a docstring documenting each argument – smolagents uses them to describe the tool to the model.

Sandboxing

Code agents execute LLM-generated Python, and running that in your production environment is risky. smolagents supports sandboxed execution via remote E2B sandboxes or Docker containers; its default local executor restricts imports but is not a true sandbox. Always use real isolation – never run agent-generated code directly in your main process.
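smolagents' real executors are more involved, but the core idea of process isolation can be sketched in a few lines. Here `run_untrusted` is a name invented for this example – a toy runner that executes generated code in a separate interpreter with a hard timeout, not smolagents' actual executor:

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Execute generated code in a separate Python interpreter, capturing
    stdout. A crash or infinite loop kills only the child process – but
    this is NOT a real sandbox: the child still sees your filesystem and
    network. Use E2B or a container for actual isolation."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout,  # raises subprocess.TimeoutExpired on a hang
        )
        return proc.stdout
    finally:
        os.unlink(path)

out = run_untrusted("print(2 + 2)")
print(out.strip())  # 4
```

Even this crude version buys you a timeout and crash containment; the sandboxed executors add filesystem and network isolation on top.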

Code-First Agent Hosting

smolagents + Qwen Coder on UK dedicated GPUs with sandboxing enabled.

Browse GPU Servers

See also: Qwen Coder 32B and Open Interpreter.
