For production AI that needs regional redundancy without active-active complexity, the active-passive failover pattern is the right fit. The primary region serves all traffic; the standby region is warm but idle; failover happens via DNS or load balancer when the primary degrades. Cost: ~1.5-2× single-region; ops complexity stays manageable.
Primary in UK; warm standby in EU (or US, depending on data-residency requirements). Vector store + model weights replicated; standby vLLM running but not serving. DNS-based failover (TTL ~60s) or load balancer reconfig (~30s). RTO < 5 minutes; RPO < 1 hour. Test quarterly with a planned failover drill. A reasonable middle ground between single-region and active-active.
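As a rough sanity check on the 5-minute RTO, the sketch below subtracts detection and cutover time from the target to see how long on-call has left to confirm the failure. The 2-3 minute detection window and the ~60s DNS / ~30s LB figures come from this section; the function name is hypothetical, and the DNS path assumes clients honour the TTL.

```python
RTO_TARGET_S = 5 * 60  # "RTO < 5 minutes"

def confirmation_headroom_s(detection_s: int, cutover_s: int, rto_s: int = RTO_TARGET_S) -> int:
    """Seconds left for on-call confirmation before the RTO target is used up (hypothetical helper)."""
    return rto_s - detection_s - cutover_s

# Detection: 2-3 minutes of failed health checks.
# Cutover: DNS TTL ~60s (assumes clients honour it) or LB reconfig ~30s.
for detection_s in (120, 180):
    for path, cutover_s in (("dns", 60), ("lb", 30)):
        headroom = confirmation_headroom_s(detection_s, cutover_s)
        print(f"detect={detection_s}s {path}: {headroom}s to confirm")
```

Under these assumptions, the worst case (3 minutes of detection plus a DNS cutover) leaves only about a minute for human confirmation, which is one more reason to rehearse it in the quarterly drill.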
Pattern
- Primary region: full stack serving production traffic
- Standby region: full stack running but receiving 0% production traffic
- Data replication: vector store, configs, model weights (model weights are static; replicate once)
- Health checks on primary; fail to standby on degradation
- DNS / load balancer reconfig → standby becomes primary
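The role swap in the list above can be modelled as a pair of regions with traffic weights, roughly how a weighted DNS record or load balancer pool would express it. This is a hypothetical model, not a cloud provider API; the region labels `uk` and `eu` are placeholders.

```python
from dataclasses import dataclass

@dataclass
class RegionPair:
    """Hypothetical active-passive pair: exactly one region takes production traffic."""
    primary: str
    standby: str

    def traffic_weights(self) -> dict[str, int]:
        # Primary serves 100% of production traffic; the warm standby receives 0%.
        return {self.primary: 100, self.standby: 0}

    def fail_over(self) -> None:
        # DNS / LB reconfig: standby becomes primary, the old primary becomes standby.
        # Calling it again performs the failback.
        self.primary, self.standby = self.standby, self.primary

pair = RegionPair(primary="uk", standby="eu")
print(pair.traffic_weights())  # {'uk': 100, 'eu': 0}
pair.fail_over()
print(pair.traffic_weights())  # {'eu': 100, 'uk': 0}
```

Keeping the weights all-or-nothing is what separates this from active-active, where both regions would carry a non-zero share.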
Data sync
- Vector store: Qdrant snapshot replication every 1 hour to standby region
- Model weights: pull once to standby; configuration in version control
- Configs / secrets: managed via Vault / cloud secret manager with cross-region replication
- Structured logs: streamed to off-site retention (multi-region by default)
- Eval datasets: in DVC / version control; fetched to standby on demand
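A small freshness check makes the RPO side of hourly snapshot replication concrete. It assumes you record when the snapshot most recently restored on the standby was taken on the primary; the constants and function names are hypothetical, the 5-minute copy/restore time is an assumed figure, and no Qdrant client calls are shown.

```python
from datetime import datetime, timedelta, timezone

SNAPSHOT_INTERVAL = timedelta(hours=1)  # hourly Qdrant snapshot replication
RPO_TARGET = timedelta(hours=1)         # "RPO < 1 hour"

def standby_lag(snapshot_taken_at: datetime, now: datetime) -> timedelta:
    """Age of the newest snapshot restored on the standby; primary writes after it are missing."""
    return now - snapshot_taken_at

def rpo_breached(snapshot_taken_at: datetime, now: datetime, rpo: timedelta = RPO_TARGET) -> bool:
    # Alert when a missed or slow replication run leaves the standby older than the target.
    return standby_lag(snapshot_taken_at, now) >= rpo

def worst_case_lag(copy_and_restore: timedelta, interval: timedelta = SNAPSHOT_INTERVAL) -> timedelta:
    # Just before the next snapshot is restored, the standby still holds the previous one.
    return interval + copy_and_restore

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(rpo_breached(now - timedelta(minutes=45), now))  # False
print(rpo_breached(now - timedelta(minutes=70), now))  # True
print(worst_case_lag(timedelta(minutes=5)))            # 1:05:00 (assumed 5 min copy/restore)
```

The worst-case figure is the one worth noticing: with hourly snapshots, the lag just before a new snapshot lands is a full interval plus copy and restore time, so a strict sub-hour RPO needs either a shorter interval or an alert along the lines of `rpo_breached`.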
Failover
Failover sequence:
- Primary health check fails for 2-3 consecutive minutes
- Automated alert; on-call confirms the failure isn't transient
- DNS update or LB reconfig points traffic at standby
- Standby vLLM (already warm) serves new requests
- RTO target: < 5 minutes
- Vector store: standby is 0-60 minutes behind primary; document RPO impact
- Fail back to the primary once the issue is resolved, after replicating any standby-only data back to it
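The first step of the sequence, treating the primary as down only after sustained failures, might look like the detector below. It assumes one health probe per minute, so three consecutive failures correspond roughly to the 2-3 minute window; the class and its parameters are hypothetical. It only decides when to page: confirmation and the DNS / LB cutover stay with on-call, as in the sequence above.

```python
from dataclasses import dataclass

@dataclass
class ConsecutiveFailureDetector:
    """Pages on-call only after a run of failed health checks on the primary (hypothetical)."""
    failures_needed: int = 3  # assumed 1 probe/minute, so ~2-3 minutes of sustained failure
    consecutive_failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True when on-call should be paged."""
        if healthy:
            # Any success resets the streak, so a single blip never triggers failover.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_needed

detector = ConsecutiveFailureDetector()
probes = [False, False, True, False, False, False]
print([detector.record(p) for p in probes])
# [False, False, False, False, False, True]
```

Resetting on any success trades a slightly slower reaction to flapping primaries for far fewer false failovers.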
Verdict
For production AI requiring regional resilience, active-passive failover is the cost-effective middle ground. ~1.5-2× single-region cost; RTO < 5 minutes; RPO < 1 hour. Test quarterly with planned drills — untested failover is theatre. For latency-sensitive global users, step up to active-active.
Bottom line
Active-passive for resilience. See multi-region.