What You’ll Build
In 30 minutes, you will have a production OCR API that accepts images, scanned documents, and PDFs, returning extracted text with bounding boxes, confidence scores, layout structure, and table data. Running PaddleOCR on a dedicated GPU server, your API processes 100+ pages per second with support for 80+ languages, handwriting recognition, and structured table extraction — all at zero per-page cost.
Cloud OCR services charge $1.50-$10 per 1,000 pages depending on features. For document-heavy workflows processing invoices, contracts, medical records, or identity documents, costs add up quickly while sensitive data routes through third-party servers. Self-hosted PaddleOCR on GPU delivers faster throughput with complete data sovereignty.
Architecture Overview
PaddleOCR runs a three-stage pipeline: text detection (finding text regions), text recognition (reading characters), and layout analysis (understanding document structure). The API wraps this pipeline behind FastAPI with endpoints for single-page OCR, multi-page PDF processing, and table extraction. Each stage runs on GPU for maximum throughput.
The API layer accepts images (JPEG, PNG, TIFF), PDFs, and base64-encoded payloads. Output includes raw text, word-level bounding boxes with confidence scores, detected tables as structured data, and reading order based on layout analysis. Pair with a language model for intelligent document understanding — extracting structured data from the OCR output using open-source LLMs.
GPU Requirements
| Workload | Recommended GPU | VRAM | Throughput |
|---|---|---|---|
| Standard OCR | RTX 5090 | 24 GB | ~120 pages/sec |
| OCR + layout + tables | RTX 5090 | 24 GB | ~60 pages/sec |
| OCR + LLM extraction | RTX 6000 Pro | 40 GB | ~30 pages/sec |
PaddleOCR models are lightweight — the full pipeline uses under 2GB VRAM, leaving room to run alongside an LLM for document understanding. For maximum throughput, batch multiple pages through the detection and recognition stages together. See our self-hosted model guide for OCR-plus-LLM deployment patterns.
Step-by-Step Build
Deploy PaddleOCR on your GPU server and build the API with document parsing, table extraction, and batch processing endpoints.
from fastapi import FastAPI, UploadFile
from paddleocr import PaddleOCR
from PIL import Image
import io, numpy as np
app = FastAPI()
ocr = PaddleOCR(use_angle_cls=True, lang="en", use_gpu=True,
enable_mkldnn=False, det_db_score_mode="slow")
@app.post("/v1/ocr")
async def extract_text(file: UploadFile, language: str = "en",
detect_tables: bool = False):
image = Image.open(io.BytesIO(await file.read()))
img_array = np.array(image)
result = ocr.ocr(img_array, cls=True)
words = []
full_text = []
for line in result[0]:
bbox, (text, confidence) = line
words.append({
"text": text,
"confidence": float(confidence),
"bbox": {"points": bbox}
})
full_text.append(text)
response = {
"text": "\n".join(full_text),
"words": words,
"page_count": 1
}
if detect_tables:
response["tables"] = extract_tables(result)
return response
@app.post("/v1/ocr/pdf")
async def extract_pdf(file: UploadFile, language: str = "en"):
# Convert PDF pages to images and OCR each
pages = pdf_to_images(await file.read())
results = [ocr.ocr(np.array(page), cls=True) for page in pages]
return {"pages": [format_page(r) for r in results],
"page_count": len(pages)}
Add table detection that identifies row and column structures and returns tables as arrays. For structured document extraction, chain OCR output into an LLM prompt that converts raw text into structured JSON. The OpenAI-compatible format works well for the LLM post-processing stage. See production setup for scaling concurrent OCR requests.
Accuracy and Language Support
PaddleOCR supports 80+ languages out of the box with specialised models for Chinese, Japanese, Korean, Arabic, and Devanagari scripts. For domain-specific documents with unusual fonts or layouts, fine-tune the recognition model on your document samples. Typical accuracy exceeds 95% for printed text and 85% for handwritten content on clean scans.
Pre-processing improves accuracy on poor-quality scans: auto-rotation corrects skewed documents, contrast enhancement recovers faded text, and noise reduction cleans photographed documents. Build these as optional pipeline stages triggered by quality scoring on the input image.
Deploy Your OCR API
A self-hosted PaddleOCR API delivers high-speed document digitisation with complete control over your data pipeline. Power invoice processing, document management, compliance workflows, or archival digitisation without per-page fees. Launch on GigaGPU dedicated GPU hosting and start extracting text at scale. Browse more API use cases and tutorials in our library.