AutoGen is Microsoft’s framework for building multi-agent systems – for example, a user agent, an executor agent, and a critic agent coordinating to solve a task. By default it targets the OpenAI API, but it works equally well with any OpenAI-compatible endpoint. On our dedicated GPU hosting you can run full agent workflows against your own LLM with no per-token cost.
Setup
pip install -U "autogen-agentchat" "autogen-ext[openai]"
Run vLLM locally with your chosen model. See self-hosted OpenAI-compatible API.
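As a sketch, a vLLM server for Llama 3.3 70B might be launched like this (the model path, port, and tensor-parallel size are assumptions – adjust for your hardware):

```shell
# Serve the model behind an OpenAI-compatible API on port 8000.
# --served-model-name sets the short model id clients will use.
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --port 8000 \
  --tensor-parallel-size 4 \
  --served-model-name llama-3.3-70b
```

`--served-model-name` makes the endpoint answer to the same model id used in the AutoGen config below, so the two stay in sync.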
Config
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="llama-3.3-70b",
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "family": "unknown",
    },
)
The model_info dict tells AutoGen which capabilities the model supports. Function calling and JSON output only work if the underlying LLM actually handles them – Llama 3.3 and Qwen 2.5 both do.
Example
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

assistant = AssistantAgent("assistant", model_client=model_client)
executor = CodeExecutorAgent("executor", code_executor=LocalCommandLineCodeExecutor())
# Stop once the assistant replies with "TERMINATE"
team = RoundRobinGroupChat(
    [assistant, executor],
    termination_condition=TextMentionTermination("TERMINATE"),
)
await team.run(task="Analyse sales.csv and produce a summary")
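To watch the agents' turns as they happen instead of waiting for run() to finish, the same task can be streamed to the terminal:

```python
from autogen_agentchat.ui import Console

# run_stream yields messages as each agent produces them;
# Console renders them to stdout as they arrive.
await Console(team.run_stream(task="Analyse sales.csv and produce a summary"))
```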
Models
| Agent Type | Recommended Self-Hosted Model |
|---|---|
| General assistant | Llama 3.3 70B |
| Code executor agent | Qwen Coder 32B |
| Reasoning agent | R1 Distill 32B |
| Low-latency router | Llama 3 8B |
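Nothing requires every agent to share one model. Each agent takes its own model_client, so a team can mix entries from the table – for example, a second client pointed at a separate vLLM instance serving the coding model (the port and model id here are assumptions):

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Second vLLM instance, assumed to serve Qwen Coder 32B on port 8001
coder_client = OpenAIChatCompletionClient(
    model="qwen-coder-32b",
    base_url="http://localhost:8001/v1",
    api_key="not-needed",
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "family": "unknown",
    },
)
```

Pass coder_client to the code-writing agent and keep the 70B client for the general assistant; each model only needs to be good at its own role.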
Self-Hosted Multi-Agent Hosting
AutoGen on UK dedicated GPUs with a strong LLM behind it.
Browse GPU Servers