Table of Contents
AWS Bedrock has matured significantly since 2023. By April 2026 it offers Claude / Anthropic models, Llama, Mistral, Cohere, plus AWS's own Titan / Nova family. Pricing is competitive; integration with AWS-native data is the killer feature. Self-hosted dedicated GPU still wins on cost at scale.
Bedrock wins for: AWS-native shops, frontier-model access (Claude, Nova), bursty workloads, no-ops requirement. Self-hosted wins for: cost-anchored at scale, residency outside AWS regions, custom fine-tunes (multi-LoRA), data not in AWS. Hybrid is common: Bedrock for spiky workloads + Claude fallback; self-hosted for steady-state bulk traffic.
Comparison
| Aspect | AWS Bedrock | Self-hosted dedicated |
|---|---|---|
| Cost at scale (50M+ tokens/mo) | Higher (per-token) | Lower (fixed) |
| Frontier model access | Yes (Claude, Nova) | No (Llama 3.3 70B max) |
| Custom fine-tuning | Limited per-model | Full (multi-LoRA, QLoRA) |
| Data residency | AWS regions | Anywhere |
| Ops burden | None | Real |
| AWS integration | Native (S3, IAM, etc.) | External |
| Burst capacity | Elastic | Capped |
| SOC 2 / HIPAA | Yes | Yes (datacenter-dependent) |
When each
- Bedrock: AWS-native shops, frontier-model access required, bursty workloads, ops capacity is constrained
- Self-hosted: >30M tokens/month sustained, custom fine-tunes, data residency outside AWS, predictable cost requirement
- Hybrid: most production teams — self-hosted bulk + Bedrock for frontier-model fallback or burst
Hybrid
Common 2026 hybrid pattern:
- Self-hosted Llama 3.1 8B / Llama 3.3 70B for ~85-95% of traffic
- Bedrock Claude 3.7 Opus for hardest 5-10% via LiteLLM router
- Bedrock Nova for AWS-native data integrations (Athena, S3 metadata)
- Self-hosted multi-LoRA for per-tenant customisation
Verdict
For 2026 production AI, Bedrock is a strong fit for AWS-native shops with bursty workloads and frontier-model needs. Self-hosted is dramatically cheaper at scale + offers customisation Bedrock can't match. Hybrid is the dominant pattern for production deployments at meaningful scale.
Bottom line
Hybrid is the production default. See Bedrock migration.