For organisations processing documents at scale (invoices, contracts, application forms, regulatory submissions), a self-hosted pipeline of OCR, structure extraction, LLM analysis and structured output costs roughly 100-400× less per page than hosted alternatives. By 2026 the architecture is well established.
Stack: PaddleOCR PP-Structure (OCR + layout), Mistral 7B or Llama 3.1 8B (structured extraction) and Pydantic schema validation, all on a 5060 Ti or 4090. Throughput is ~40 pages/sec end-to-end on a 4090, at ~£0.0001 per page versus ~£0.01-0.04 per page hosted. UK / EU hosting keeps data resident for compliance.
Stack
- Object storage: incoming docs land in an S3-compatible bucket
- Worker queue: workers pull jobs from Redis / RabbitMQ
- OCR: PaddleOCR PP-Structure for text recognition + layout analysis (tables, paragraphs, headings)
- LLM extraction: vLLM serving Mistral 7B, constrained by a Pydantic-derived schema
- Validation: Pydantic validators + business-logic checks
- Output: structured JSON to your database / downstream system
- Audit log: structured JSON record of every doc and its extraction
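The schema and validation layers of the stack can be sketched as follows. This is a minimal illustration assuming Pydantic v2; the `Invoice` and `LineItem` models, their fields, and the `parse_invoice` helper are hypothetical names chosen for the example, not part of any published pipeline. The same model supplies both the JSON Schema that could constrain the LLM's output and the business-logic check applied afterwards.

```python
from datetime import date
from decimal import Decimal
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError, model_validator


class LineItem(BaseModel):
    # Hypothetical line-item shape for an invoice.
    description: str
    quantity: int = Field(gt=0)
    unit_price: Decimal = Field(ge=0)


class Invoice(BaseModel):
    # Hypothetical extraction target; adapt the fields to your documents.
    invoice_number: str = Field(min_length=1)
    issue_date: date
    currency: str = Field(pattern=r"^[A-Z]{3}$")
    line_items: List[LineItem] = Field(min_length=1)
    total: Decimal = Field(ge=0)

    @model_validator(mode="after")
    def total_matches_line_items(self) -> "Invoice":
        # Business-logic check: the stated total must equal the sum of the lines.
        expected = sum(
            (item.quantity * item.unit_price for item in self.line_items),
            Decimal("0"),
        )
        if expected != self.total:
            raise ValueError(f"total {self.total} != line-item sum {expected}")
        return self


def parse_invoice(raw_json: str) -> Optional[Invoice]:
    """Return a validated Invoice, or None if the output should go to review."""
    try:
        return Invoice.model_validate_json(raw_json)
    except ValidationError:
        return None


# JSON Schema derived from the model; this is what a guided decoder
# could be given as the output constraint.
INVOICE_JSON_SCHEMA = Invoice.model_json_schema()
```

Keeping structural rules (types, patterns, non-empty lists) in field definitions and cross-field rules in a model validator means a single `parse_invoice` call decides whether a document is stored or routed to review.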
Workflow
- Doc uploaded to S3; trigger fires
- Worker pulls the job and renders PDF pages to images if needed
- PaddleOCR: detection + recognition + layout analysis
- Combined OCR text + layout passed to LLM with extraction schema
- vLLM guided decoding constrains the LLM to emit JSON that matches the schema
- Pydantic validates the JSON against the schema, then business-logic checks run
- Result stored in database; original kept in S3 with audit reference
- Failure path: the doc is queued for human review with both the OCR text and the LLM output
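The control flow of the workflow above, including the failure path, can be sketched in plain Python. This is an illustrative skeleton only: `process_document`, `PipelineResult`, and the three injected callables (`run_ocr`, `run_llm`, `validate`) are hypothetical names standing in for PaddleOCR, the vLLM call, and the Pydantic checks respectively. Storage, queueing, and the audit log are left out.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    doc_id: str
    status: str                   # "stored" or "needs_review"
    data: Optional[dict] = None   # validated extraction, if any
    ocr_text: str = ""            # kept so a reviewer can see the OCR text
    llm_output: str = ""          # kept so a reviewer can see the raw LLM output
    error: str = ""


def process_document(
    doc_id: str,
    page_images: list,
    run_ocr: Callable[[list], str],
    run_llm: Callable[[str], str],
    validate: Callable[[dict], dict],
) -> PipelineResult:
    """Run OCR -> LLM extraction -> validation; route any failure to review."""
    ocr_text, llm_output = "", ""
    try:
        ocr_text = run_ocr(page_images)          # OCR text + layout
        llm_output = run_llm(ocr_text)           # JSON string from the LLM
        data = validate(json.loads(llm_output))  # schema + business checks
    except Exception as exc:
        # Failure path: keep whatever was produced so far for the reviewer.
        return PipelineResult(
            doc_id, "needs_review", None, ocr_text, llm_output, str(exc)
        )
    return PipelineResult(doc_id, "stored", data, ocr_text, llm_output)
```

Passing the stages in as callables keeps the routing logic testable without a GPU; in a real worker each callable would wrap the corresponding component.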
Scale
- 5060 Ti at 80% utilisation: ~25 pages/sec sustained → ~65M pages/month
- 4090 at 80% utilisation: ~40 pages/sec sustained → ~100M pages/month
- 5090 at 80% utilisation: ~60 pages/sec sustained → ~150M pages/month
- Cost: ~£0.0001 per page vs ~£0.01-0.04 per page for AWS Textract + Bedrock, a ~100-400× saving
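The arithmetic behind these figures can be checked with a few lines of Python. The sketch assumes a 30-day month and treats the quoted pages/sec as the sustained rate, with the 80% utilisation already included. Both are assumptions about how the figures were derived, and the function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumed 30-day month: 2,592,000 seconds


def pages_per_month(pages_per_sec: float) -> float:
    """Monthly volume at a given sustained throughput."""
    return pages_per_sec * SECONDS_PER_MONTH


def saving_factor(self_hosted_per_page: float, hosted_per_page: float) -> float:
    """How many times cheaper the self-hosted cost per page is."""
    return hosted_per_page / self_hosted_per_page


for gpu, rate in {"5060 Ti": 25, "4090": 40, "5090": 60}.items():
    print(f"{gpu}: ~{pages_per_month(rate) / 1e6:.0f}M pages/month")
# 5060 Ti: ~65M pages/month
# 4090: ~104M pages/month
# 5090: ~156M pages/month
```

The 4090 and 5090 results sit slightly above the rounded ~100M and ~150M quoted above, and £0.01-0.04 divided by £0.0001 gives the 100-400× range.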
Verdict
For document processing at production scale, self-hosted PaddleOCR + Mistral 7B + Pydantic is the right architecture in 2026. A £289/mo 4090 delivers ~40 pages/sec, and the cost economics are decisive against hosted alternatives at any meaningful volume.
Bottom line
4090 + PaddleOCR + Mistral = doc processing workhorse. See 5090 benchmark.