GPU Selection for PaddleOCR
PaddleOCR is PaddlePaddle’s open-source OCR toolkit supporting text detection, recognition, and layout analysis in 80+ languages. It is remarkably lightweight, making even budget GPUs viable for high-throughput PaddleOCR hosting on a dedicated GPU server:
| Pipeline | VRAM Usage | Recommended GPU | Pages per Minute |
|---|---|---|---|
| PP-OCRv4 (detect + recognise) | ~0.8 GB | RTX 3050 | ~120 |
| PP-OCRv4 + layout analysis | ~1.2 GB | RTX 4060 | ~90 |
| PP-OCRv4 + table recognition | ~1.8 GB | RTX 4060 | ~60 |
| PP-Structure (full pipeline) | ~2.5 GB | RTX 4060 | ~40 |
PaddleOCR uses under 1 GB for standard text recognition, meaning you can easily co-host it alongside an LLM like Phi-3 or Qwen 2.5 on the same GPU for document understanding pipelines.
Install PaddleOCR
```bash
# Install PaddlePaddle GPU and PaddleOCR
pip install paddlepaddle-gpu paddleocr
```

```python
# Basic OCR usage
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en", use_gpu=True)
result = ocr.ocr("document.png", cls=True)
for line in result[0]:
    coords, (text, confidence) = line
    print(f"[{confidence:.2f}] {text}")
```
PaddleOCR auto-downloads model weights on first run. It supports English, Chinese, Japanese, Korean, and 80+ other languages out of the box.
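Each entry in `result` corresponds to one page, and each line within a page is a `[bounding_box, (text, confidence)]` pair. A minimal sketch of flattening that structure, using fabricated sample data purely for illustration:

```python
# Shape of PaddleOCR's result: one list per page, where each line is
# [bounding_box, (text, confidence)]. The values below are made up.
sample = [[
    [[[10, 10], [200, 10], [200, 40], [10, 40]], ("Invoice #42", 0.98)],
    [[[10, 60], [120, 60], [120, 90], [10, 90]], ("Total: $7", 0.95)],
]]

def flatten(result):
    # Collect the recognised text strings across all pages.
    return [text for page in result for _box, (text, _conf) in page]

print(flatten(sample))
```

The bounding box is four `[x, y]` corner points in pixel coordinates, which you can keep if you need positional information for downstream layout logic.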
Building an OCR API
```bash
# Install FastAPI
pip install fastapi uvicorn python-multipart
```

```python
# api.py
from fastapi import FastAPI, UploadFile
from paddleocr import PaddleOCR
import tempfile, os

app = FastAPI()
ocr = PaddleOCR(use_angle_cls=True, lang="en", use_gpu=True)

@app.post("/ocr")
async def extract_text(file: UploadFile):
    with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name
    result = ocr.ocr(tmp_path, cls=True)
    os.unlink(tmp_path)
    lines = []
    # result[0] can be None when no text is detected on the page
    if result and result[0]:
        for line in result[0]:
            coords, (text, confidence) = line
            lines.append({"text": text, "confidence": float(confidence)})
    return {"lines": lines}
```

```bash
# Run the API
uvicorn api:app --host 0.0.0.0 --port 8000
```
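To call the endpoint from another process, POST the image as multipart form data under the field name `file` (it must match the `UploadFile` parameter name). A stdlib-only client sketch, assuming the server above is reachable on localhost:8000; `ocr_file` and `plain_text` are illustrative names, not part of any library:

```python
import json
import uuid
from urllib import request

def ocr_file(path, url="http://localhost:8000/ocr"):
    # Build a multipart/form-data body by hand (no third-party deps);
    # the field name "file" must match the FastAPI parameter.
    boundary = uuid.uuid4().hex
    with open(path, "rb") as f:
        payload = f.read()
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    req = request.Request(url, data=body, headers={
        "Content-Type": f"multipart/form-data; boundary={boundary}"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["lines"]

def plain_text(lines, min_confidence=0.5):
    # Drop low-confidence lines and join the remaining text.
    return "\n".join(l["text"] for l in lines if l["confidence"] >= min_confidence)
```

A confidence cut-off like the one in `plain_text` is a common post-processing step: recognition scores below roughly 0.5 usually indicate noise or partial characters.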
Read the self-host guide for server setup fundamentals. Check the OCR speed benchmarks for cross-GPU performance data.
Performance Benchmarks
Tested with A4 scanned documents at 300 DPI (2480×3508 pixels).
| GPU | Pipeline | Time per Page | Pages per Minute | VRAM |
|---|---|---|---|---|
| RTX 3050 | PP-OCRv4 | 0.5s | ~120 | 0.8 GB |
| RTX 4060 | PP-OCRv4 | 0.3s | ~200 | 0.8 GB |
| RTX 4060 | PP-Structure | 1.5s | ~40 | 2.5 GB |
| RTX 3090 | PP-OCRv4 | 0.2s | ~300 | 0.8 GB |
| RTX 3090 | PP-Structure | 0.9s | ~67 | 2.5 GB |
The RTX 4060 processes 200 pages per minute with the standard pipeline, making it ideal for high-volume document digitisation. The RTX 3090 pushes this to 300 pages per minute for enterprise-scale workloads.
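The pages-per-minute column follows directly from per-page latency (60 divided by seconds per page), which makes it easy to sanity-check the table or project throughput for your own measured timings:

```python
def pages_per_minute(seconds_per_page: float) -> int:
    # Throughput is simply 60s divided by the per-page latency.
    return round(60 / seconds_per_page)

print(pages_per_minute(0.3))  # RTX 4060, PP-OCRv4
print(pages_per_minute(0.9))  # RTX 3090, PP-Structure
```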
Optimisation Tips
- Batch multiple pages to keep the GPU fully utilised during sequential document processing.
- Use TensorRT acceleration for a 2-3x throughput improvement over the default PaddlePaddle backend.
- Pre-process images to consistent DPI and orientation before OCR to improve accuracy and speed.
- Use PP-OCRv4 for speed and PP-Structure only when you need table extraction or layout analysis.
- Co-host with an LLM to build intelligent document processing pipelines that extract and summarise text in a single pass.
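The batching tip above can be as simple as feeding the OCR loop through a fixed-size chunker, so the GPU never idles between documents; `batches` is an illustrative helper, not part of the PaddleOCR API:

```python
def batches(items, size=8):
    # Yield fixed-size chunks of the input list; the last chunk
    # may be smaller. Feed each chunk to the OCR loop in turn.
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Sketch of the processing loop (page_paths is your own list of files):
# for chunk in batches(page_paths, 16):
#     results = [ocr.ocr(path, cls=True) for path in chunk]
```

Tune the chunk size to your VRAM headroom: with PP-OCRv4 using under 1 GB, even an 8 GB card leaves plenty of room for larger batches.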
Compare GPU options with the GPU comparisons tool. For cost planning, use the cheapest GPU for AI inference guide.
Next Steps
PaddleOCR is one of the most efficient AI workloads to self-host. Pair it with Whisper for multi-modal document and audio processing. For text analysis after extraction, see our LLaMA hosting options. Browse all deployment guides in the model guides section.
Deploy PaddleOCR Now
Run high-speed OCR on a dedicated GPU server. Process hundreds of pages per minute with full root access and no API limits.
Browse GPU Servers