The Challenge: Revenue Leakage Through Coding Backlogs
A private hospital group operating five sites across South East England generates roughly 6,000 discharge summaries every month. Each summary must be translated into ICD-10 diagnostic codes and OPCS-4 procedure codes before the group can invoice commissioners or private insurers. The coding team of eight clinical coders faces a persistent three-week backlog, and delayed coding means delayed revenue recognition. Worse, manual coding accuracy hovers around 82%, and each incorrectly coded episode triggers audit queries, resubmission cycles, and occasionally tariff penalties. The finance director estimates the combined cost of delays and errors at £400,000 per year.
The group trialled a cloud-based coding platform but abandoned the pilot when its data protection officer (DPO) raised concerns about discharge summaries, which contain full patient histories, operative notes, and consultant correspondence, being processed on servers outside UK jurisdiction.
AI Solution: LLM-Powered Clinical Code Assignment
An open-source LLM fine-tuned on clinical coding guidelines and historical coded episodes can read a discharge summary and assign ICD-10 and OPCS-4 codes with accuracy that can exceed that of trained human coders. The model processes the narrative text (diagnoses, comorbidities, complications, procedures performed) and maps each clinical concept to its corresponding code using a retrieval-augmented approach that references the full ICD-10 code set.
The pipeline works in two stages. First, document AI extracts text from scanned or PDF discharge summaries (many NHS trusts still produce paper-based summaries that are subsequently scanned). Second, the LLM reads the extracted text and generates a structured coding output including primary diagnosis, secondary diagnoses, procedure codes, and HRG assignment. Human coders review the AI output rather than coding from scratch — a validation workflow that is dramatically faster than manual coding.
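The structured coding output described above could be parsed and sanity-checked before it reaches a human reviewer. The following is a minimal sketch, not the article's implementation: the JSON field names, the `CodingOutput` class, and the regular expressions are all assumptions made for illustration, and the patterns only check the general shape of a code, not whether it exists in the ICD-10 or OPCS-4 tables.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Simplified shape checks (assumptions): a production system should look each
# code up in the licensed ICD-10 and OPCS-4 tables instead of using patterns.
ICD10_PATTERN = re.compile(r"^[A-Z]\d{2}(\.\d{1,2})?$")
OPCS4_PATTERN = re.compile(r"^[A-Z]\d{2}(\.\d)?$")


@dataclass
class CodingOutput:
    """Hypothetical container for one coded discharge summary."""
    primary_diagnosis: str
    secondary_diagnoses: List[str] = field(default_factory=list)
    procedures: List[str] = field(default_factory=list)
    hrg: Optional[str] = None


def parse_coding_output(raw_json: str) -> Tuple[CodingOutput, List[str]]:
    """Parse the model's JSON and return it with a list of issues to review."""
    data = json.loads(raw_json)
    output = CodingOutput(
        primary_diagnosis=data["primary_diagnosis"],
        secondary_diagnoses=list(data.get("secondary_diagnoses", [])),
        procedures=list(data.get("procedures", [])),
        hrg=data.get("hrg"),
    )
    issues = []
    for code in [output.primary_diagnosis, *output.secondary_diagnoses]:
        if not ICD10_PATTERN.match(code):
            issues.append(f"malformed ICD-10 code: {code}")
    for code in output.procedures:
        if not OPCS4_PATTERN.match(code):
            issues.append(f"malformed OPCS-4 code: {code}")
    return output, issues
```

Anything that lands in `issues` can be routed straight to a coder, so that malformed output is never returned to the PAS unreviewed.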
GPU Requirements: Throughput for Monthly Volumes
Medical coding inference is text-heavy: discharge summaries range from 500 to 3,000 words, and the model must generate code assignments with explanatory rationale. A typical coding pass for one summary involves 2,000-4,000 input tokens and 300-500 output tokens. The hospital group needs to process their monthly backlog efficiently while also handling daily incoming summaries.
| GPU Model | VRAM | Summaries per Hour (Mistral 7B) | Time for 6,000 Summaries |
|---|---|---|---|
| NVIDIA RTX 5090 | 32 GB | ~120 | ~50 hours |
| NVIDIA RTX 6000 Pro | 48 GB | ~160 | ~38 hours |
| NVIDIA RTX 6000 Pro | 48 GB | ~180 | ~34 hours |
| NVIDIA RTX 6000 Pro 96 GB | 96 GB | ~280 | ~22 hours |
An RTX 6000 Pro through GigaGPU clears the monthly volume in under two days of continuous processing. In practice, daily batches of 200-300 summaries complete in one to two hours, meaning coded outputs are available the same business day they arrive.
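The processing times above follow directly from dividing the monthly volume by the hourly throughput. As a quick check, the sketch below reproduces that arithmetic using the approximate throughput figures from the table; the function name is an illustrative choice, and the rates are the article's estimates rather than measured benchmarks.

```python
import math

MONTHLY_SUMMARIES = 6_000  # monthly volume quoted in the article


def hours_to_clear(summaries: int, summaries_per_hour: float) -> float:
    """Hours of continuous processing needed at a given throughput."""
    return summaries / summaries_per_hour


# Approximate throughput figures from the table above (summaries per hour).
for rate in (120, 160, 180, 280):
    hours = hours_to_clear(MONTHLY_SUMMARIES, rate)
    print(f"{rate}/hour: {hours:.1f} h, about {math.ceil(hours)} h rounded up")

# Daily batches of 200-300 summaries at ~160 summaries per hour.
for batch in (200, 300):
    print(f"{batch} summaries: {hours_to_clear(batch, 160):.2f} h")
```

At 160 summaries per hour, a 200-summary batch takes 1.25 hours and a 300-summary batch just under 1.9 hours, which is where the "one to two hours" figure comes from.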
Recommended Stack
- PaddleOCR or Tesseract for extracting text from scanned discharge summaries. PaddleOCR on GPU tends to handle the variable quality of scanned NHS documents more accurately than Tesseract.
- Mistral 7B-Instruct or LLaMA 3 8B fine-tuned on ICD-10 coding guidelines, NHS clinical coding standards, and 50,000+ previously coded episodes.
- vLLM for batch inference serving with optimised throughput on large document queues.
- FAISS vector index containing the complete ICD-10 5th Edition code set (approximately 16,000 codes) for retrieval-augmented code lookup.
- FastAPI with integration to the hospital PAS (Patient Administration System) for automated discharge summary ingestion and coded output return.
An AI chatbot interface allows clinical coders to query the system interactively — asking why a particular code was assigned or requesting alternative code suggestions when the automated output needs refinement.
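The retrieval step in the stack above narrows the full code set down to a handful of candidates before the LLM makes its choice. The sketch below illustrates that idea only. It deliberately swaps in a standard-library bag-of-words cosine similarity for the FAISS index and dense embeddings named in the stack, and it uses a four-entry sample of code descriptions rather than the full code set. The names `CODE_DESCRIPTIONS` and `retrieve_candidates` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative sample; a real index would hold the complete code set.
CODE_DESCRIPTIONS = {
    "J18.9": "Pneumonia, unspecified",
    "I21.9": "Acute myocardial infarction, unspecified",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "N39.0": "Urinary tract infection, site not specified",
}


def tokenize(text: str) -> Counter:
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(concept: str, k: int = 3) -> list:
    """Return up to k codes whose descriptions best match the concept."""
    query = tokenize(concept)
    scored = [
        (cosine(query, tokenize(description)), code)
        for code, description in CODE_DESCRIPTIONS.items()
    ]
    scored.sort(reverse=True)
    return [code for score, code in scored[:k] if score > 0]


print(retrieve_candidates("acute myocardial infarction"))
```

In a production pipeline the candidates and their descriptions would be placed in the LLM prompt, so the model selects from real codes rather than generating one from memory.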
Cost vs. Alternatives
Outsourcing clinical coding to a third-party bureau costs £8-£15 per episode. At 6,000 summaries monthly, that totals £48,000-£90,000 per month (£576,000-£1,080,000 per year), and it still does not solve the turnaround problem, as bureau coders typically deliver results within 5-10 working days. An AI-first approach on dedicated GPU hardware reduces the cost per coded episode to pennies while delivering same-day turnaround.
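The bureau cost is a straightforward multiplication of episode volume by the per-episode fee. The short sketch below works it through on both a monthly and an annual basis, using only the volume and fee range quoted above; the function name is illustrative.

```python
EPISODES_PER_MONTH = 6_000
BUREAU_FEE_RANGE_GBP = (8, 15)  # per-episode fee range quoted above


def bureau_cost(episodes_per_month: int, fee_per_episode: int) -> tuple:
    """Return (monthly, annual) outsourcing cost in pounds."""
    monthly = episodes_per_month * fee_per_episode
    return monthly, monthly * 12


for fee in BUREAU_FEE_RANGE_GBP:
    monthly, annual = bureau_cost(EPISODES_PER_MONTH, fee)
    print(f"£{fee}/episode: £{monthly:,} per month, £{annual:,} per year")
```

At £8 per episode this gives £48,000 per month and £576,000 per year; at £15 it gives £90,000 per month and £1,080,000 per year.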
The accuracy argument is equally compelling. Fine-tuned LLMs are reported to achieve 88-93% primary code accuracy in benchmarks against expert coders, compared with the group's manual baseline of around 82%. They also never have off days, never rush before a bank holiday weekend, and maintain consistent quality at 11 PM on a Friday.
Getting Started
For an initial pilot, gather 5,000 historically coded discharge summaries with verified ICD-10 assignments. Use 4,000 for fine-tuning and hold out 1,000 for validation. Deploy the fine-tuned model on GigaGPU infrastructure and run parallel coding alongside the human team for one month, measuring concordance rates and identifying systematic disagreements that may indicate either AI errors or historical human coding inconsistencies.
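The split and the parallel-run measurement described above can be sketched in a few lines. This is an illustrative outline under simple assumptions: each episode is compared on its primary code only, and the function names (`split_dataset`, `concordance_rate`, `systematic_disagreements`) are hypothetical rather than part of any particular toolkit.

```python
import random
from collections import Counter


def split_dataset(episodes, n_train=4000, seed=0):
    """Shuffle reproducibly, then split into fine-tuning and validation sets."""
    shuffled = list(episodes)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def concordance_rate(ai_codes, human_codes):
    """Share of episodes where the AI and human primary codes agree."""
    if len(ai_codes) != len(human_codes):
        raise ValueError("code lists must be the same length")
    if not ai_codes:
        return 0.0
    matches = sum(a == h for a, h in zip(ai_codes, human_codes))
    return matches / len(ai_codes)


def systematic_disagreements(ai_codes, human_codes, top_n=5):
    """Most frequent (human code, AI code) pairs among the disagreements."""
    pairs = Counter((h, a) for a, h in zip(ai_codes, human_codes) if a != h)
    return pairs.most_common(top_n)


ai = ["J18.9", "J18.9", "I21.9", "E11.9"]
human = ["J18.1", "J18.1", "I21.9", "E11.9"]
print(concordance_rate(ai, human))
print(systematic_disagreements(ai, human))
```

A pair that recurs many times, such as one pneumonia code repeatedly substituted for another, is a candidate for review in both directions: it may be a model error or a long-standing habit in the historical coding.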
GigaGPU provides private AI hosting with GDPR-compliant infrastructure purpose-built for healthcare workloads. Every discharge summary stays on UK soil throughout the coding pipeline.
GigaGPU’s UK-based servers process discharge summaries in minutes, not weeks. Full data sovereignty, no per-episode fees.