
AutoGen Self-Hosted LLM Agent

Microsoft's AutoGen orchestrates multi-agent workflows. Pointed at a self-hosted LLM, it delivers production agent pipelines without per-token fees.

AutoGen is Microsoft's framework for building multi-agent systems: a user agent, an executor agent, and a critic agent coordinating to solve tasks. By default it targets the OpenAI API, but it works equally well with any OpenAI-compatible endpoint. On our dedicated GPU hosting you can run full agent workflows against your own LLM with no per-token cost.


Setup

pip install autogen-agentchat "autogen-ext[openai]"

Run vLLM locally with your chosen model. See self-hosted OpenAI-compatible API.
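Before wiring AutoGen up, it is worth confirming the endpoint responds. A minimal sketch using the openai client installed above; the port (8000) and served model name (llama-3.3-70b) are assumptions matching the config in the next section:

```python
# Quick smoke test against the local vLLM server (assumed at :8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# List the models the server exposes; the id must match the name
# you later pass to AutoGen's model client.
for m in client.models.list():
    print(m.id)

resp = client.chat.completions.create(
    model="llama-3.3-70b",  # assumed served model name
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(resp.choices[0].message.content)
```

If the model id printed here differs from what you expect, use that exact id in the AutoGen config below.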

Config

from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="llama-3.3-70b",
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "family": "unknown",
    },
)

The model_info dict tells AutoGen what the model supports. Function calling and JSON output need the underlying LLM to actually handle them; Llama 3.3 and Qwen 2.5 both do well here.
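Function calling is easiest to verify with a plain Python tool. A minimal sketch reusing the model_client from the config above; the get_weather function and its canned reply are hypothetical, purely for illustration:

```python
from autogen_agentchat.agents import AssistantAgent


async def get_weather(city: str) -> str:
    """Return a canned weather report (hypothetical tool for illustration)."""
    return f"It is 18C and cloudy in {city}."


# AssistantAgent wraps plain Python functions as tools; the model must
# genuinely support function calling for this to work, which is why
# model_info declares it above.
weather_agent = AssistantAgent(
    "weather",
    model_client=model_client,
    tools=[get_weather],
)
```

If the agent answers weather questions with the canned reply, tool calls are round-tripping through your self-hosted model correctly.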

Example

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

assistant = AssistantAgent("assistant", model_client=model_client)
executor = CodeExecutorAgent("executor", code_executor=LocalCommandLineCodeExecutor())

team = RoundRobinGroupChat([assistant, executor], termination_condition=...)
await team.run(task="Analyse sales.csv and produce a summary")
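Putting the pieces together, a complete script looks roughly like this. The termination condition shown (stop on the word "TERMINATE" or after 20 messages) is one reasonable choice, not the only one:

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    model_client = OpenAIChatCompletionClient(
        model="llama-3.3-70b",
        base_url="http://localhost:8000/v1",
        api_key="not-needed",
        model_info={"vision": False, "function_calling": True,
                    "json_output": True, "family": "unknown"},
    )
    assistant = AssistantAgent("assistant", model_client=model_client)
    executor = CodeExecutorAgent(
        "executor", code_executor=LocalCommandLineCodeExecutor())

    # Stop when the assistant says TERMINATE, or after 20 messages as a safety net.
    termination = TextMentionTermination("TERMINATE") | MaxMessageTermination(20)

    team = RoundRobinGroupChat([assistant, executor],
                               termination_condition=termination)
    result = await team.run(task="Analyse sales.csv and produce a summary")
    print(result.messages[-1].content)


asyncio.run(main())
```

The executor runs model-generated code on the host shell, so run this in a container or sandbox rather than directly on a production box.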

Models

| Agent Type          | Recommended Self-Hosted Model |
|---------------------|-------------------------------|
| General assistant   | Llama 3.3 70B                 |
| Code executor agent | Qwen Coder 32B                |
| Reasoning agent     | R1 Distill 32B                |
| Low-latency router  | Llama 3 8B                    |
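If you serve more than one of these models (for example, two vLLM instances on separate ports), each agent can get its own client. A sketch under that assumption; the ports and served model names here are placeholders, not fixed values:

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Shared capability declaration for both clients.
info = {"vision": False, "function_calling": True,
        "json_output": True, "family": "unknown"}

# General assistant model (assumed vLLM instance on port 8000).
general_client = OpenAIChatCompletionClient(
    model="llama-3.3-70b", base_url="http://localhost:8000/v1",
    api_key="not-needed", model_info=info)

# Coding model for the executor's partner agent (assumed port 8001).
coder_client = OpenAIChatCompletionClient(
    model="qwen-coder-32b", base_url="http://localhost:8001/v1",
    api_key="not-needed", model_info=info)
```

Pass the appropriate client to each AssistantAgent so every role runs on the model best suited to it.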

Self-Hosted Multi-Agent Hosting

AutoGen on UK dedicated GPUs with a strong LLM behind it.

Browse GPU Servers

See CrewAI and LangGraph.
