Blue-green deployment keeps two full copies of your LLM API running. Blue is live; green is the new version being validated. The load balancer switches traffic atomically. On our dedicated GPU hosting it needs double the GPU capacity but gives you the strongest rollback story.
Why Blue-Green
Rolling upgrades (replacing instances one at a time) can leave mixed versions serving traffic during cutover. Blue-green keeps both versions completely separate. You validate green in full production-like conditions (shadow traffic, synthetic tests) before flipping a single switch.
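A synthetic test against green can be as small as a couple of HTTP checks run before promotion. The sketch below assumes green serves vLLM's OpenAI-compatible API, with its `/health` and `/v1/models` routes, and that you know the model name you expect to be served. The function names and the injectable `get` parameter are illustrative, not part of any library.

```python
import json
import urllib.request


def http_get(url, timeout=5.0):
    """Return (status_code, body_text) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


def green_is_ready(base_url, expected_model, get=http_get):
    """Check that a replica is healthy and serves the expected model.

    `get` is injectable so the check can be exercised without a live server.
    """
    try:
        status, _ = get(f"{base_url}/health")
        if status != 200:
            return False
        status, body = get(f"{base_url}/v1/models")
        if status != 200:
            return False
        served = {m["id"] for m in json.loads(body).get("data", [])}
    except (OSError, ValueError, KeyError, TypeError):
        # Connection errors, HTTP errors, or malformed JSON all count as "not ready".
        return False
    return expected_model in served
```

Run it against every green replica (for example `green_is_ready("http://green01:8000", "my-model")`, where both values are placeholders) and promote only if all of them return `True`.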
Topology
- Blue environment: 2-4 vLLM replicas on one GPU pool, live traffic
- Green environment: 2-4 vLLM replicas on a second GPU pool, new model version
- Load balancer: nginx or HAProxy with two upstream pools
In a multi-server setup, blue runs on one box and green on another. On a large multi-GPU chassis you can split GPUs between the two environments.
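On a shared chassis, one common way to split GPUs is to give each environment a disjoint set of device indices through the `CUDA_VISIBLE_DEVICES` environment variable. A minimal sketch, assuming an 8-GPU box and an even split; the function names and counts are illustrative.

```python
def split_gpus(total_gpus, blue_count):
    """Partition GPU indices 0..total_gpus-1 into disjoint blue and green sets."""
    if not 0 < blue_count < total_gpus:
        raise ValueError("both environments need at least one GPU")
    indices = list(range(total_gpus))
    return {
        "blue": indices[:blue_count],
        "green": indices[blue_count:],
    }


def cuda_env(gpu_indices):
    """Build the environment variable that restricts a process to these GPUs."""
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_indices)}


pools = split_gpus(total_gpus=8, blue_count=4)
print(cuda_env(pools["blue"]))   # {'CUDA_VISIBLE_DEVICES': '0,1,2,3'}
print(cuda_env(pools["green"]))  # {'CUDA_VISIBLE_DEVICES': '4,5,6,7'}
```

Each environment's replicas are then launched with the matching variable set, so neither pool can touch the other's devices.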
Promoting
# nginx config while blue is live
upstream llm { server blue01:8000; server blue02:8000; }
# After verifying green, replace the block above with:
upstream llm { server green01:8000; server green02:8000; }
# Then, from a shell, check the config and reload:
nginx -t && nginx -s reload
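A single reload switches the pool. nginx reloads gracefully: new connections go to green, while requests already in flight (including long streaming completions) finish on blue, so clients see no dropped connections. Leave blue running for 1-24 hours after cutover so you can revert quickly if green reveals problems.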
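The switch can be scripted so the config is validated before nginx is reloaded. This is a sketch under assumptions: the upstream block lives in its own file (the path `/etc/nginx/conf.d/llm_upstream.conf` is hypothetical) that the main config includes, and the host names are the ones used in the example above. `nginx -t` and `nginx -s reload` are the standard config-test and reload commands.

```python
import os
import subprocess
import tempfile

POOLS = {
    "blue": ["blue01:8000", "blue02:8000"],
    "green": ["green01:8000", "green02:8000"],
}


def render_upstream(color, name="llm"):
    """Render an nginx upstream block for the chosen environment."""
    servers = " ".join(f"server {host};" for host in POOLS[color])
    return f"upstream {name} {{ {servers} }}\n"


def atomic_write(path, text):
    """Write text to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.write(text)
    os.replace(tmp.name, path)


def promote(color, conf_path="/etc/nginx/conf.d/llm_upstream.conf"):
    """Point the upstream at `color`, validate the config, then reload nginx.

    If `nginx -t` fails, the previous file is restored and nginx is not reloaded.
    """
    previous = None
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            previous = f.read()
    atomic_write(conf_path, render_upstream(color))
    check = subprocess.run(["nginx", "-t"], capture_output=True, text=True)
    if check.returncode != 0:
        if previous is not None:
            atomic_write(conf_path, previous)
        else:
            os.remove(conf_path)
        raise RuntimeError(f"nginx -t failed:\n{check.stderr}")
    subprocess.run(["nginx", "-s", "reload"], check=True)
```

With this shape, `promote("green")` performs the cutover and `promote("blue")` is the revert, which is why keeping blue running after the switch makes rollback a one-line operation.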
Cost
You pay for double the GPU capacity during the overlap period. Options to reduce:
- Keep blue for only the cutover window (1-24 hours), then free those GPUs
- Use a single chassis with GPUs split between blue and green – cheaper than two separate servers
- Run a smaller blue during the validation window (sized for your baseline traffic, not peak)
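The overlap is easy to estimate in GPU-hours: green is the extra pool while it is being validated, and blue is the extra pool while it is kept as a standby after cutover. A back-of-the-envelope sketch; the pool sizes and durations below are illustrative assumptions, not measurements.

```python
def overlap_gpu_hours(green_gpus, validation_hours, blue_gpus, standby_hours):
    """Extra GPU-hours consumed by running two environments side by side."""
    return green_gpus * validation_hours + blue_gpus * standby_hours


# 4 GPUs per pool, 6 hours of validation, then blue kept for 24 hours vs 1 hour.
long_standby = overlap_gpu_hours(4, 6, 4, 24)
short_standby = overlap_gpu_hours(4, 6, 4, 1)
print(long_standby, short_standby)  # 120 28
```

Under these assumed numbers, shortening the standby window matters far more than shortening validation, which is the trade-off against how quickly you want to be able to revert.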
Blue-Green Ready GPU Hosting
Multi-server UK dedicated hosting for parallel environments with fixed monthly pricing.
See also: zero-downtime model swap and canary rollout.