LiteLLM is the open-source router that abstracts your application from specific LLM providers. Point your code at LiteLLM; it routes to self-hosted vLLM or hosted APIs and handles fallback, retries, and rate limiting. The right primitive for hybrid AI architectures.
LiteLLM exposes a single OpenAI-compatible endpoint. Your app calls it; LiteLLM routes by model name to self-hosted vLLM (default), Anthropic Claude (fallback), or OpenAI (escalation), and handles retries, rate limiting, cost tracking, and fallback rules. ~5 minutes to set up; transformative for a hybrid AI architecture.
Why a router
- Single API surface: app code knows one endpoint, not multiple SDKs
- Centralised fallback logic: routing rules in one config, not scattered through codebase
- Built-in retry: handles transient failures + rate limits
- Cost tracking: aggregate cost across providers
- Easy migration: swap providers by changing config, not code
Config
```yaml
# litellm_config.yaml
model_list:
  - model_name: production-llm
    litellm_params:
      model: openai/Meta-Llama-3.1-8B-Instruct
      api_base: http://your-vllm:8000/v1
      api_key: dummy
  - model_name: production-llm-fallback
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  fallbacks:
    - production-llm: [production-llm-fallback]
  context_window_fallbacks:
    - production-llm: [production-llm-fallback]
  num_retries: 2
  timeout: 30

litellm_settings:
  drop_params: true
  set_verbose: false
```
Run with `litellm --config litellm_config.yaml --port 4000`. Your app talks to LiteLLM on port 4000.
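From the app side, the proxy looks exactly like the OpenAI API. A minimal stdlib sketch, assuming the proxy from the config above is running on `localhost:4000` and using the `production-llm` model name from `model_list` (the helper names here are illustrative, not part of LiteLLM):

```python
import json
import urllib.request

PROXY = "http://localhost:4000"  # assumption: LiteLLM proxy from the config above

def build_request(prompt: str, model: str = "production-llm") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the LiteLLM proxy."""
    body = json.dumps({
        "model": model,  # routed by LiteLLM, not a provider-specific model id
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{PROXY}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer sk-anything",  # proxy holds the real provider keys
        },
    )

def ask(prompt: str) -> str:
    """Send the request; LiteLLM handles retry and fallback behind this call."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any OpenAI SDK works the same way: point its base URL at the proxy and keep the rest of the code unchanged. Swapping providers later means editing the YAML, not this code.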
Patterns
- Self-hosted primary + hosted fallback: LiteLLM tries self-hosted; on failure or context-window-exceeded, routes to Claude / GPT-4o
- Confidence-based routing: route to frontier API when self-hosted output confidence is low (custom header)
- Per-tenant routing: free tier → self-hosted; premium → hosted; route by API key
- A/B testing: route 10% to model B for evaluation
- Rate-limit handling: when one provider rate-limits, automatic failover to next
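The A/B pattern can be sketched in config alone: LiteLLM load-balances across deployments that share a `model_name`, and (assuming your LiteLLM version supports per-deployment `weight` with the default simple-shuffle strategy) you can skew the split:

```yaml
model_list:
  - model_name: production-llm
    litellm_params:
      model: openai/Meta-Llama-3.1-8B-Instruct
      api_base: http://your-vllm:8000/v1
      api_key: dummy
      weight: 9   # ~90% of traffic (assumes weight support in simple-shuffle)
  - model_name: production-llm
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
      weight: 1   # ~10% to model B for evaluation
```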
Verdict
For any production AI architecture that goes beyond a single provider, LiteLLM is the right routing primitive. Open source, OpenAI-compatible, well-maintained, fast. ~5 minutes of setup; pays back the first time you need to swap providers or add a fallback.
Bottom line
LiteLLM = the router for hybrid AI. See hybrid decision.