Twelve Hundred Contracts in a Virtual Data Room, Ten Days to Report
A corporate team was instructed on the buy-side of a mid-market acquisition of a UK facilities management company. The virtual data room contained 1,247 commercial contracts: service agreements, subcontractor arrangements, equipment leases, property licences, and insurance policies. The partner needed a due diligence report identifying change-of-control clauses, termination-for-convenience provisions, material adverse change triggers, assignment restrictions, and unusual liability caps across the entire contract portfolio. The timeline was ten working days. The four associates on the team estimated that working through the contracts manually would take at least 15 working days, excluding report writing.
AI-powered contract analytics can extract specified clause types, key dates, monetary thresholds, and obligation categories from thousands of contracts in hours rather than weeks. The challenge is that M&A due diligence involves the most commercially sensitive documents a target company possesses — revenue contracts, customer relationships, pricing structures. Uploading them to a cloud extraction service creates both confidentiality risk and potential breach of the data-room access undertakings. Private GPU hosting on a dedicated server within UK data centres keeps extraction entirely within the deal team’s control.
AI Architecture for Contract Data Extraction
The extraction pipeline processes contracts in three passes. First, document preparation: native Word and PDF files are parsed directly, while scanned contracts pass through PaddleOCR for text extraction (see OCR hosting guide and OCR GPU benchmarks). Second, clause extraction: a Llama 3 70B model processes each contract against a configurable extraction template specifying the clause types, terms, and provisions to identify. The model returns structured JSON with the extracted value, page reference, and confidence score for each target field.
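As a rough illustration of the second pass, the sketch below shows one way an extraction template and the model's JSON reply could be represented and validated. The template keys, the `ExtractedField` class, and the reply layout (`value`, `page`, `confidence` per clause type) are assumptions made for this example; the exact schema is not specified here, and the call to the model itself is omitted.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical template: clause type -> instruction passed to the model.
EXTRACTION_TEMPLATE = {
    "change_of_control": "Quote any provision triggered by a change of control.",
    "termination_for_convenience": "Quote any right to terminate without cause.",
    "assignment_restriction": "Quote any restriction on assignment or novation.",
    "liability_cap": "State the cap on liability, including currency.",
}


@dataclass
class ExtractedField:
    name: str
    value: Optional[str]   # None when the clause was not found
    page: Optional[int]    # page reference in the source document
    confidence: float      # model-reported score in [0, 1]


def parse_extraction(raw_json: str, template: dict[str, str]) -> list[ExtractedField]:
    """Validate a model reply against the template, one field per clause type."""
    reply = json.loads(raw_json)
    fields = []
    for name in template:
        item = reply.get(name) or {}
        confidence = float(item.get("confidence", 0.0))
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for {name}: {confidence}")
        fields.append(
            ExtractedField(name, item.get("value"), item.get("page"), confidence)
        )
    return fields
```

Clause types the model omits come back with `value=None` and a confidence of 0.0, so a missing clause stays visible in the output instead of silently disappearing.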
Third, risk flagging: a secondary LLM pass reviews extracted clauses against the due diligence checklist and flags contracts with unusual terms — liability caps below market norms, non-standard termination provisions, missing assignment rights, or change-of-control clauses that could block completion. The output feeds directly into the due diligence report template.
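The flagging step can be pictured as a threshold check over the scores produced by the second pass. The sketch below assumes the second pass returns a risk score between 0 and 1 per checklist item; the item names, the threshold values, and both function names are hypothetical placeholders, not figures from a real checklist.

```python
# Hypothetical per-item thresholds; a real checklist would set these per deal.
RISK_THRESHOLDS = {
    "change_of_control": 0.5,
    "liability_cap": 0.7,
    "termination": 0.7,
    "assignment": 0.6,
}
DEFAULT_THRESHOLD = 0.8  # applied to items without an explicit threshold


def flag_contract(scores: dict[str, float]) -> list[str]:
    """Return checklist items whose score meets or exceeds its threshold."""
    flagged = []
    for item, score in scores.items():
        threshold = RISK_THRESHOLDS.get(item, DEFAULT_THRESHOLD)
        if score >= threshold:
            flagged.append(item)
    return sorted(flagged)


def contracts_for_review(portfolio: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """Map contract id -> flagged items, keeping only contracts with flags."""
    result = {}
    for contract_id, scores in portfolio.items():
        flags = flag_contract(scores)
        if flags:
            result[contract_id] = flags
    return result
```

Keeping the thresholds in one table makes it straightforward to tighten a single item, for example change of control on a deal where completion risk is the main concern, without touching the rest of the pipeline.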
GPU Requirements for Contract Extraction at Scale
Due diligence extraction is a batch workload with a hard deadline. Processing 1,247 contracts with a 70B model at approximately 4–8 minutes per contract requires roughly 83–166 hours of sustained GPU utilisation.
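The GPU-hour estimate is simple arithmetic, and a small helper makes it easy to rerun for a different data room. The function name below is illustrative; the inputs are the contract count and per-contract times quoted above.

```python
def gpu_hours(contracts: int, minutes_per_contract: float) -> float:
    """Total sustained GPU time in hours for a batch of contracts."""
    return contracts * minutes_per_contract / 60


low = gpu_hours(1247, 4)    # fastest case: 4 minutes per contract
high = gpu_hours(1247, 8)   # slowest case: 8 minutes per contract
print(f"{low:.1f} to {high:.1f} GPU-hours")  # prints "83.1 to 166.3 GPU-hours"
```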
| GPU Model | VRAM | Contracts/Hour (70B 4-bit) | 1,247 Contracts Timeline |
|---|---|---|---|
| RTX 5090 | 24 GB | ~6 (8B model: ~25) | 50 hours (8B) / not feasible at 70B |
| RTX 6000 Pro | 48 GB | ~12 | ~104 hours (5 days at 20h/day) |
| RTX 6000 Pro 96 GB | 80 GB | ~22 | ~57 hours (3 days at 20h/day) |
| RTX 6000 Pro | 80 GB | ~38 | ~33 hours (2 days at 16h/day) |
For the ten-day deadline described above, the 48 GB RTX 6000 Pro configuration completes extraction within the first week (about 104 hours, or roughly five days at 20 hours per day), leaving five days for associate review and report writing. Deals with larger data rooms (5,000+ contracts) should use the higher-throughput 80 GB configurations in the table. Consult the LLM inference benchmarks for throughput detail.
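The sizing advice can be sanity-checked against the throughput column of the table. The helper below is a minimal sketch: the function names are illustrative, and the five-day extraction budget is an assumption taken from the split between extraction and review described above.

```python
def extraction_days(contracts: int, contracts_per_hour: float, hours_per_day: float) -> float:
    """Calendar days of extraction at a given throughput and daily run time."""
    return contracts / contracts_per_hour / hours_per_day


def fits_budget(contracts: int, contracts_per_hour: float,
                hours_per_day: float, budget_days: float) -> bool:
    """True when extraction finishes within the allotted number of days."""
    return extraction_days(contracts, contracts_per_hour, hours_per_day) <= budget_days


# 1,247 contracts at ~12 contracts/hour, 20 hours/day: about 5.2 days.
print(round(extraction_days(1247, 12, 20), 1))

# 5,000 contracts at the same rate: about 20.8 days, well past a ten-day deadline.
print(round(extraction_days(5000, 12, 20), 1))

# 5,000 contracts at ~38 contracts/hour, 16 hours/day: about 8.2 days.
print(round(extraction_days(5000, 38, 16), 1))
```

On these figures a 5,000-contract data room leaves little or no time for review unless throughput is at the top of the table, which is why larger deals call for the faster hardware.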
Recommended Software Stack
- OCR: PaddleOCR (PP-OCRv4 models) for scanned contracts and image-based PDFs
- Extraction LLM: Llama 3 70B (AWQ 4-bit) with configurable extraction templates per clause type
- Risk Analysis: Second-pass LLM scoring with threshold-based risk flags
- Output: Structured Excel/CSV for data-room indexing, narrative summaries for due diligence report sections
- Review Interface: Web dashboard showing extracted terms alongside source PDF with highlighted passages
- Data Room Integration: Intralinks, Datasite, or Ansarada API for direct file retrieval
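For the structured output in the stack above, the standard library is enough to produce a CSV index. The sketch below assumes each extracted term arrives as a dictionary; the column names and the sample row are hypothetical and would follow whatever index layout the deal team uses.

```python
import csv
import io

# Hypothetical column layout for the data-room index.
COLUMNS = ["contract_id", "clause_type", "value", "page", "confidence"]


def to_csv(rows: list[dict]) -> str:
    """Serialise extracted terms to CSV text, ignoring any extra keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "contract_id": "C-001",
        "clause_type": "liability_cap",
        "value": "GBP 250,000 in aggregate",  # comma forces CSV quoting
        "page": 12,
        "confidence": 0.91,
    }
]
print(to_csv(rows))
```

The `csv` module quotes values that contain commas, which matters for clause text and monetary amounts copied verbatim from contracts.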
Confidentiality and Cost Analysis
Data-room access undertakings typically restrict how documents may be processed and by whom. Running extraction on a GDPR-compliant dedicated server operated by the instructed firm helps satisfy these undertakings, because no third-party AI provider gains access to the target’s contracts. Audit logs can then demonstrate that data was processed on specified infrastructure under the firm’s control.
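One possible shape for such an audit trail is an append-only file with one JSON record per processing step. The sketch below is an assumption about how this might be done, not a description of a specific product: the field names, the stage label, and the function names are hypothetical. Each record ties a document hash to the host that processed it.

```python
import hashlib
import json
import socket
from datetime import datetime, timezone


def audit_record(contract_id: str, content: bytes, stage: str) -> dict:
    """Describe one processing step: what was processed, where, and when."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "host": socket.gethostname(),   # the machine that did the processing
        "stage": stage,                 # e.g. "ocr", "extraction", "risk_flagging"
        "contract_id": contract_id,
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def append_audit_line(path: str, record: dict) -> None:
    """Append the record as a single JSON line to the audit file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

Hashing the file contents rather than logging the text keeps commercially sensitive material out of the log while still letting a reviewer confirm which version of a document was processed.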
| Approach | Cost (1,247 contracts) | Turnaround |
|---|---|---|
| Manual review (4 associates, 15 days) | £48,000–£72,000 | 15 working days |
| Commercial contract analytics SaaS | £8,000–£18,000 | 3–5 days |
| GigaGPU RTX 6000 Pro + associate review | £5,000–£10,000 | 5–7 days |
The self-hosted approach comes close to commercial-SaaS turnaround (5–7 days against 3–5) at lower cost while maintaining full data control. Healthcare data extraction teams follow similar patterns in their domain. Browse additional use cases for cross-industry extraction examples.
Getting Started
Take a recently completed due diligence exercise where you still have data-room access. Process 200 contracts through the extraction pipeline and compare AI-extracted terms against the associate’s manual extraction. Measure precision (percentage of AI extractions that are correct) and recall (percentage of manually identified terms the AI also found). Target 90%+ precision and 85%+ recall before deploying on a live matter. Fine-tune the extraction prompts based on error patterns — most firms reach production quality within two iterations. Teams that also handle large-scale document review and matter quality monitoring can share the same GPU infrastructure across all workloads.
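The comparison described above reduces to set arithmetic once each extracted term is given a comparable key. The sketch below assumes a term is identified by a (contract id, clause type) pair, which is a simplification; the sample data and function names are made up for illustration.

```python
def precision_recall(ai_terms: set, manual_terms: set) -> tuple[float, float]:
    """Precision: share of AI extractions that are correct.
    Recall: share of manually identified terms the AI also found."""
    true_positives = len(ai_terms & manual_terms)
    precision = true_positives / len(ai_terms) if ai_terms else 0.0
    recall = true_positives / len(manual_terms) if manual_terms else 0.0
    return precision, recall


def meets_targets(precision: float, recall: float,
                  min_precision: float = 0.90, min_recall: float = 0.85) -> bool:
    """Check the pilot against the 90% precision and 85% recall targets."""
    return precision >= min_precision and recall >= min_recall


ai = {
    ("C-001", "change_of_control"),
    ("C-001", "liability_cap"),      # not in the manual set: a false positive
    ("C-002", "assignment"),
    ("C-003", "termination"),
}
manual = {
    ("C-001", "change_of_control"),
    ("C-002", "assignment"),
    ("C-003", "termination"),
    ("C-004", "liability_cap"),      # missed by the AI
    ("C-005", "assignment"),         # missed by the AI
}

p, r = precision_recall(ai, manual)
print(p, r, meets_targets(p, r))     # prints "0.75 0.6 False"
```

Listing the false positives (`ai - manual`) and the misses (`manual - ai`) gives the error patterns to work from when adjusting the extraction prompts.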