
smolagents Self-Hosted


smolagents is Hugging Face’s minimalist agent framework – a few hundred lines of core code that produce a surprisingly capable code-writing agent. Its pitch: agents should write code to use tools rather than emitting JSON-formatted function calls. On our dedicated GPU hosting it pairs well with a self-hosted coding model.


Philosophy

Traditional agent frameworks ask the LLM to emit JSON tool calls. smolagents instead has the LLM emit Python code that calls tools as functions. The argument: LLMs are better at writing code than at producing perfectly formatted tool JSON, and code naturally expresses composition and control flow – loops, conditionals, and feeding one tool's output into another.
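The difference is easy to see in plain Python, with no smolagents involved. In this toy contrast, the `add` tool and both "model outputs" are invented for illustration:

```python
import json

# A toy tool the agent can use.
def add(a: int, b: int) -> int:
    return a + b

# JSON-style tool calling: the model must emit exactly this structure,
# and the runtime dispatches one call at a time.
json_call = '{"tool": "add", "arguments": {"a": 2, "b": 3}}'
parsed = json.loads(json_call)
result_json = {"add": add}[parsed["tool"]](**parsed["arguments"])

# Code-style tool calling (the smolagents approach): the model emits
# Python, so composition and control flow come for free.
generated_code = "result = add(add(1, 2), 3) if add(1, 1) == 2 else 0"
namespace = {"add": add}
exec(generated_code, namespace)
result_code = namespace["result"]

print(result_json, result_code)  # 5 6
```

Nesting and branching in the second form would take several round-trips in the first – that is the whole pitch.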

Setup

pip install "smolagents[litellm]"

from smolagents import CodeAgent, LiteLLMModel

# Point LiteLLM at any OpenAI-compatible endpoint (vLLM, TGI, llama.cpp, ...).
# The "openai/" prefix selects LiteLLM's OpenAI-compatible route; the rest
# must match the model name your inference server registered.
model = LiteLLMModel(
    model_id="openai/qwen-coder-32b",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",  # self-hosted servers usually ignore the key
)
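For orientation, this is roughly the JSON body that ends up at your server's `/v1/chat/completions` route – a sketch of the OpenAI chat-completions schema, not of LiteLLM's internals, and the prompt is invented:

```python
import json

# Approximate shape of the request POSTed to
# http://localhost:8000/v1/chat/completions. Field names follow the
# OpenAI chat-completions schema; the "openai/" routing prefix is
# stripped before the request is sent.
payload = {
    "model": "qwen-coder-32b",
    "messages": [
        {"role": "user", "content": "Write a function that reverses a string."}
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)
```

If the agent stalls, replaying a body like this with `curl` against `api_base` is the quickest way to check whether the problem is the server or the agent.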

Example

from smolagents import tool, CodeAgent

@tool
def search(query: str) -> str:
    """Search the web for information.

    Args:
        query: The search query string.
    """
    return perform_search(query)  # wire this up to your search backend

@tool
def read_file(path: str) -> str:
    """Read a file from the local filesystem.

    Args:
        path: Path of the file to read.
    """
    with open(path) as f:
        return f.read()

agent = CodeAgent(tools=[search, read_file], model=model)
result = agent.run("Find the latest GPU prices and save the summary to summary.txt")

The agent writes Python code that calls search and read_file as ordinary functions instead of emitting JSON tool calls. Note that @tool requires type hints and a docstring documenting each argument – smolagents uses them to describe the tool to the model.

Sandboxing

Code agents execute LLM-generated Python, and running that in your production environment is risky. smolagents supports sandboxed execution via remote E2B sandboxes or Docker containers; its default local executor restricts imports but is not a true sandbox. Always use real isolation – never run agent-generated code directly in your main process.
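smolagents' real executors are more involved, but the core idea of process isolation can be sketched in a few lines. Here `run_untrusted` is a name invented for this example – a toy runner that executes generated code in a separate interpreter with a hard timeout, not smolagents' actual executor:

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Execute generated code in a separate Python interpreter, capturing
    stdout. A crash or infinite loop kills only the child process – but
    this is NOT a real sandbox: the child still sees your filesystem and
    network. Use E2B or a container for actual isolation."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout,  # raises subprocess.TimeoutExpired on a hang
        )
        return proc.stdout
    finally:
        os.unlink(path)

out = run_untrusted("print(2 + 2)")
print(out.strip())  # 4
```

Even this crude version buys you a timeout and crash containment; the sandboxed executors add filesystem and network isolation on top.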

Code-First Agent Hosting

smolagents + Qwen Coder on UK dedicated GPUs with sandboxing enabled.

Browse GPU Servers

See also: Qwen Coder 32B and Open Interpreter.
