
Build OCR API with PaddleOCR on GPU

Build a production OCR API with PaddleOCR on a dedicated GPU server. Extract text from documents, receipts, IDs, and handwritten notes with layout-aware recognition and table detection — no per-page fees or sensitive documents leaving your infrastructure.

What You’ll Build

In 30 minutes, you will have a production OCR API that accepts images, scanned documents, and PDFs, returning extracted text with bounding boxes, confidence scores, layout structure, and table data. Running PaddleOCR on a dedicated GPU server, your API processes 100+ pages per second with support for 80+ languages, handwriting recognition, and structured table extraction — all at zero per-page cost.

Cloud OCR services charge $1.50-$10 per 1,000 pages depending on features. For document-heavy workflows processing invoices, contracts, medical records, or identity documents, costs add up quickly while sensitive data routes through third-party servers. Self-hosted PaddleOCR on GPU delivers faster throughput with complete data sovereignty.

Architecture Overview

PaddleOCR runs a three-stage pipeline: text detection (finding text regions), text recognition (reading characters), and layout analysis (understanding document structure). The API wraps this pipeline behind FastAPI with endpoints for single-page OCR, multi-page PDF processing, and table extraction. Each stage runs on GPU for maximum throughput.

The API layer accepts images (JPEG, PNG, TIFF), PDFs, and base64-encoded payloads. Output includes raw text, word-level bounding boxes with confidence scores, detected tables as structured data, and reading order based on layout analysis. Pair with a language model for intelligent document understanding — extracting structured data from the OCR output using open-source LLMs.
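A single-page response from this API might look like the following. This is an illustrative sketch with example values; the field names match the endpoint code in this guide, but the coordinates and confidence scores are made up for demonstration.

```python
# Illustrative response payload for a single-page OCR request.
# Field names match the /v1/ocr endpoint below; values are examples only.
sample_response = {
    "text": "INVOICE\nTotal: $1,250.00",
    "words": [
        {"text": "INVOICE", "confidence": 0.98,
         "bbox": {"points": [[12, 8], [110, 8], [110, 32], [12, 32]]}},
        {"text": "Total: $1,250.00", "confidence": 0.95,
         "bbox": {"points": [[12, 48], [180, 48], [180, 70], [12, 70]]}},
    ],
    "page_count": 1,
}
```

Each bounding box is a quadrilateral (four corner points), which is how PaddleOCR reports detected text regions, so rotated text is represented without loss.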

GPU Requirements

| Workload | Recommended GPU | VRAM | Throughput |
| --- | --- | --- | --- |
| Standard OCR | RTX 5090 | 32 GB | ~120 pages/sec |
| OCR + layout + tables | RTX 5090 | 32 GB | ~60 pages/sec |
| OCR + LLM extraction | RTX 6000 Pro | 40 GB | ~30 pages/sec |

PaddleOCR models are lightweight — the full pipeline uses under 2GB VRAM, leaving room to run alongside an LLM for document understanding. For maximum throughput, batch multiple pages through the detection and recognition stages together. See our self-hosted model guide for OCR-plus-LLM deployment patterns.
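The batching idea can be sketched as a small helper that groups pages into fixed-size chunks before handing them to the OCR pipeline. Here `run_ocr` is a placeholder for your actual PaddleOCR call; whether the library batches pages internally depends on the version you run, so treat this as a structural sketch rather than a drop-in implementation.

```python
from typing import Callable, Iterable, List

def batch_pages(pages: List, batch_size: int = 8) -> Iterable[List]:
    """Yield successive fixed-size chunks of pages."""
    for i in range(0, len(pages), batch_size):
        yield pages[i:i + batch_size]

def ocr_in_batches(pages: List, run_ocr: Callable[[List], List],
                   batch_size: int = 8) -> List:
    """Run OCR over pages chunk by chunk, preserving page order.

    run_ocr is a placeholder: any callable that takes a list of page
    images and returns one result per page.
    """
    results = []
    for chunk in batch_pages(pages, batch_size):
        results.extend(run_ocr(chunk))
    return results
```

Tuning `batch_size` against your GPU's VRAM headroom is the main lever: larger batches keep the detection and recognition stages saturated, while smaller batches reduce per-request latency.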

Step-by-Step Build

Deploy PaddleOCR on your GPU server and build the API with document parsing, table extraction, and batch processing endpoints.

from fastapi import FastAPI, UploadFile
from paddleocr import PaddleOCR
from PIL import Image
import io, numpy as np

app = FastAPI()
ocr = PaddleOCR(use_angle_cls=True, lang="en", use_gpu=True,
                enable_mkldnn=False, det_db_score_mode="slow")

@app.post("/v1/ocr")
async def extract_text(file: UploadFile, language: str = "en",
                       detect_tables: bool = False):
    # Note: PaddleOCR fixes the language at init time; to honour the
    # `language` parameter, keep one pipeline instance per language.
    # Convert to RGB so grayscale/RGBA uploads produce a valid array
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    img_array = np.array(image)

    result = ocr.ocr(img_array, cls=True)

    words = []
    full_text = []
    for line in result[0] or []:  # result[0] is None when no text is found
        bbox, (text, confidence) = line
        words.append({
            "text": text,
            "confidence": float(confidence),
            "bbox": {"points": bbox}
        })
        full_text.append(text)

    response = {
        "text": "\n".join(full_text),
        "words": words,
        "page_count": 1
    }

    if detect_tables:
        response["tables"] = extract_tables(result)

    return response

@app.post("/v1/ocr/pdf")
async def extract_pdf(file: UploadFile, language: str = "en"):
    # Convert PDF pages to images and OCR each
    pages = pdf_to_images(await file.read())
    results = [ocr.ocr(np.array(page), cls=True) for page in pages]
    return {"pages": [format_page(r) for r in results],
            "page_count": len(pages)}
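The PDF endpoint above relies on two helpers that could be implemented as follows. This is a hedged sketch: `pdf_to_images` uses the pdf2image library (which requires poppler installed on the host), and `format_page` simply mirrors the per-line loop from the single-page endpoint.

```python
def pdf_to_images(pdf_bytes: bytes, dpi: int = 200):
    """Render each PDF page to a PIL image.

    Higher DPI improves OCR accuracy on small fonts at the cost of
    memory and rendering time. Requires pdf2image + poppler.
    """
    from pdf2image import convert_from_bytes
    return convert_from_bytes(pdf_bytes, dpi=dpi)

def format_page(result):
    """Convert one page of raw PaddleOCR output into the API response shape."""
    words, full_text = [], []
    for line in result[0] or []:  # result[0] is None when no text is found
        bbox, (text, confidence) = line
        words.append({"text": text, "confidence": float(confidence),
                      "bbox": {"points": bbox}})
        full_text.append(text)
    return {"text": "\n".join(full_text), "words": words}
```

Rendering at 200 DPI is a reasonable default for printed documents; bump it to 300 DPI if small-font accuracy matters more than throughput.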

Add table detection that identifies row and column structures and returns tables as arrays. For structured document extraction, chain OCR output into an LLM prompt that converts raw text into structured JSON. The OpenAI-compatible format works well for the LLM post-processing stage. See production setup for scaling concurrent OCR requests.
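The LLM post-processing stage can be sketched as follows. The endpoint URL and model name are placeholders for your own deployment (e.g. a vLLM or llama.cpp server exposing the OpenAI-compatible chat completions route); the field names requested in the prompt are illustrative.

```python
import json

def build_extraction_request(ocr_text: str, model: str = "local-model") -> dict:
    """Build an OpenAI-compatible chat payload asking for structured JSON.

    The requested fields (vendor, date, total) are an invoice example;
    adapt the prompt to your document type.
    """
    prompt = ("Extract vendor, date, and total from this OCR text. "
              "Reply with JSON only.\n\n" + ocr_text)
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0}

def extract_fields(ocr_text: str,
                   base_url: str = "http://localhost:8000/v1") -> dict:
    """POST the OCR text to a local OpenAI-compatible LLM and parse the reply."""
    import requests
    resp = requests.post(f"{base_url}/chat/completions",
                         json=build_extraction_request(ocr_text), timeout=60)
    resp.raise_for_status()
    return json.loads(resp.json()["choices"][0]["message"]["content"])
```

Setting `temperature` to 0 keeps the extraction deterministic, which matters when the JSON output feeds downstream systems.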

Accuracy and Language Support

PaddleOCR supports 80+ languages out of the box with specialised models for Chinese, Japanese, Korean, Arabic, and Devanagari scripts. For domain-specific documents with unusual fonts or layouts, fine-tune the recognition model on your document samples. Typical accuracy exceeds 95% for printed text and 85% for handwritten content on clean scans.

Pre-processing improves accuracy on poor-quality scans: auto-rotation corrects skewed documents, contrast enhancement recovers faded text, and noise reduction cleans photographed documents. Build these as optional pipeline stages triggered by quality scoring on the input image.
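A minimal version of quality-gated pre-processing might look like this, using Pillow only. The contrast threshold of 0.2 is an assumption to tune against your own scans, and deskew is deliberately left out (it is better handled by a dedicated step, e.g. with OpenCV).

```python
import numpy as np
from PIL import Image, ImageFilter, ImageOps

def quality_score(image: Image.Image) -> float:
    """Crude quality proxy: normalised std-dev of pixel intensity.

    Low values suggest faded or low-contrast scans that benefit from
    enhancement; high values indicate a clean, high-contrast scan.
    """
    gray = np.asarray(image.convert("L"), dtype=np.float32) / 255.0
    return float(gray.std())

def preprocess(image: Image.Image,
               contrast_threshold: float = 0.2) -> Image.Image:
    """Apply contrast enhancement and noise reduction only when the
    quality score falls below the threshold."""
    if quality_score(image) < contrast_threshold:
        image = ImageOps.autocontrast(image.convert("L"))
        image = image.filter(ImageFilter.MedianFilter(size=3))
    return image
```

Gating on a cheap quality score means clean scans skip the extra passes entirely, so the pre-processing stage adds latency only where it actually helps.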

Deploy Your OCR API

A self-hosted PaddleOCR API delivers high-speed document digitisation with complete control over your data pipeline. Power invoice processing, document management, compliance workflows, or archival digitisation without per-page fees. Launch on GigaGPU dedicated GPU hosting and start extracting text at scale. Browse more API use cases and tutorials in our library.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers


We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
