smolagents is Hugging Face’s minimalist agent framework – a few hundred lines of code that produces a surprisingly capable code-writing agent. Its pitch: agents should write code to use tools rather than calling JSON-formatted functions. On our dedicated GPU hosting it pairs well with a self-hosted coding model.
Philosophy
Traditional agent frameworks ask the LLM to emit JSON tool calls. smolagents has the LLM emit Python code that calls tools as functions. The argument: LLMs are better at code than at perfectly-formatted tool JSON, and code enables composition and control flow naturally.
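A toy illustration of the contrast (not smolagents' internals; the stub tool and the emitted snippet are invented for the demo): a JSON tool call encodes exactly one invocation per round trip, while model-emitted Python can compose tools and use control flow in a single step.

```python
import json

def search(query: str) -> str:
    # Stub tool standing in for a real search backend.
    return f"result for {query}"

# JSON-style: one rigid, schema-bound call per model turn.
json_call = '{"tool": "search", "arguments": {"query": "GPU prices"}}'
call = json.loads(json_call)

# Code-style: the model emits Python that loops, filters, and combines
# tool results in one turn.
emitted_code = """
results = [search(q) for q in ["A100 price", "H100 price"]]
summary = "; ".join(r for r in results if r)
"""
namespace = {"search": search}
exec(emitted_code, namespace)
print(namespace["summary"])  # result for A100 price; result for H100 price
```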
Setup
pip install smolagents

from smolagents import CodeAgent, LiteLLMModel

# Point LiteLLM at a local OpenAI-compatible server (e.g. vLLM).
model = LiteLLMModel(
    model_id="openai/qwen-coder-32b",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",  # local servers typically ignore the key
)
Example
from smolagents import tool, CodeAgent

@tool
def search(query: str) -> str:
    """Search the web for information.

    Args:
        query: The search query.
    """
    return perform_search(query)  # perform_search: your search backend

@tool
def read_file(path: str) -> str:
    """Read a file from the local filesystem.

    Args:
        path: Path of the file to read.
    """
    with open(path) as f:
        return f.read()

agent = CodeAgent(tools=[search, read_file], model=model)
result = agent.run("Find the latest GPU prices and save the summary to summary.txt")
The agent writes Python code calling search and read_file directly rather than producing JSON.
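For intuition, here is one plausible program the agent could emit for the prompt above. This is an invented sketch: the real output varies by model, and the stub tools below stand in for the @tool functions the agent actually receives.

```python
import os
import tempfile

# Stub tools so the sketch runs standalone; in the agent these are the
# registered @tool functions, and the code below is model-emitted.
def search(query: str) -> str:
    return "dummy search result"

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

# One plausible model-emitted program for the task:
path = os.path.join(tempfile.gettempdir(), "summary.txt")
summary = "GPU price summary: " + search("latest GPU prices")
with open(path, "w") as f:
    f.write(summary)
print(read_file(path))  # the agent can verify its own write with another tool
```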
Sandboxing
Code agents execute LLM-generated Python. Running this in your production environment is risky. smolagents supports sandboxed execution via E2B or local subprocess isolation. Always use one of these – never run agent-generated code directly in your main process.
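As a minimal sketch of the subprocess-isolation idea (prefer smolagents' own sandboxed executors in practice; `run_isolated` is a hypothetical helper, and note that a bare subprocess limits crashes and runtime, not filesystem or network access):

```python
import subprocess
import sys

def run_isolated(code: str, timeout: float = 5.0) -> str:
    """Run untrusted Python in a separate process with a timeout.

    Illustrative only: this contains crashes, infinite loops, and state
    pollution, but does NOT restrict filesystem or network access the
    way a real sandbox (e.g. E2B) does.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout

print(run_isolated("print(2 + 2)"))
```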
Code-First Agent Hosting
smolagents + Qwen Coder on UK dedicated GPUs with sandboxing enabled.
Browse GPU Servers. See Qwen Coder 32B and Open Interpreter.