What You’ll Build
In about 90 minutes, you will have a resume screening pipeline that ingests PDF and DOCX applications, extracts structured candidate profiles, scores them against job requirements, and ranks the top matches. Processing 500 resumes takes about four minutes on a single GPU. The entire system runs on your own dedicated GPU server, so sensitive candidate data never leaves your infrastructure.
Manual resume screening is one of the most time-consuming bottlenecks in recruitment. HR teams spend an average of seven seconds per resume in initial passes, leading to missed talent and inconsistent evaluation. A GPU-powered screening system applies the same criteria to every application, scales to any volume, and integrates with your existing ATS through a simple REST API. Built on open-source LLM hosting, the system costs a fraction of commercial screening platforms.
Architecture Overview
The pipeline has four stages: document parsing, information extraction, semantic matching, and scoring. Incoming resumes flow through a document processor that extracts native text where available and falls back to OCR for scanned PDFs; a structured extraction model then pulls out skills, experience, education, and contact details. A RAG-powered matching engine compares extracted profiles against job descriptions using embedding similarity combined with LLM-based reasoning.
The scoring module uses a fine-tuned LLM served through vLLM to evaluate each candidate on configurable criteria. Output includes a numerical score, a short justification paragraph, and flagged strengths or gaps. LangChain orchestrates the multi-step pipeline, handling retries and structured output parsing. Results feed into a review dashboard or push directly to your ATS via webhooks.
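The retry-and-parse loop the orchestrator performs can be sketched in plain Python (LangChain provides its own helpers for this; here `llm_call` is a stand-in for whatever model client you use):

```python
import json
import time

def call_with_retries(llm_call, prompt, retries=3, backoff=1.0):
    """Call the model and parse its JSON output, retrying on failure."""
    for attempt in range(retries):
        try:
            raw = llm_call(prompt)
            return json.loads(raw)  # structured output parsing
        except (json.JSONDecodeError, RuntimeError):
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))  # exponential backoff

# Usage with a stub model that fails once, then returns valid JSON:
calls = {"n": 0}
def flaky_llm(prompt):
    calls["n"] += 1
    return "not json" if calls["n"] == 1 else '{"score": 87}'

result = call_with_retries(flaky_llm, "score this resume", backoff=0.01)
```

Malformed JSON is the most common failure mode in practice, so the retry wraps both the model call and the parse.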
GPU Requirements
| Volume | Recommended GPU | VRAM | Processing Speed |
|---|---|---|---|
| Up to 200 resumes/day | RTX 5090 | 32 GB | ~2 resumes/sec |
| 200 – 2,000 resumes/day | RTX 6000 Pro | 40 GB | ~5 resumes/sec |
| 2,000+ resumes/day | RTX 6000 Pro 96 GB | 96 GB | ~8 resumes/sec |
The extraction and scoring steps both require GPU inference, so the model stays loaded in VRAM throughout processing. An 8B-parameter instruction-tuned model handles extraction accurately, while a 70B model significantly improves nuanced scoring for senior roles. See our self-hosted LLM guide for model sizing recommendations.
Step-by-Step Build
Provision your GPU server and install the document processing stack. Use PyMuPDF for native PDFs and PaddleOCR for scanned documents. Deploy vLLM with your chosen model and configure the extraction prompt template.
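The native-versus-OCR routing can be sketched as follows. The `MIN_CHARS` threshold is an assumption, and the extractor functions are injected stand-ins: in the real pipeline they would wrap PyMuPDF's text extraction and PaddleOCR respectively.

```python
MIN_CHARS = 200  # below this, assume the PDF is a scan (threshold is a guess)

def parse_resume(path, extract_native, extract_ocr, min_chars=MIN_CHARS):
    """Try native text extraction first; fall back to OCR when the PDF
    yields too little text, which usually means it is a scanned image."""
    text = extract_native(path)
    if len(text.strip()) < min_chars:
        text = extract_ocr(path)  # scanned document: run OCR instead
    return text

# Stubs standing in for PyMuPDF / PaddleOCR:
native = lambda p: "" if p.endswith("scan.pdf") else "John Doe\nPython, SQL" * 20
ocr = lambda p: "OCR-extracted text"
```

Injecting the extractors keeps the router unit-testable without a GPU or the OCR stack installed.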
```python
# Extraction prompt. Literal braces are doubled so str.format only
# substitutes the named {resume_text} field.
EXTRACT_PROMPT = """Extract structured data from this resume text.
Return JSON with fields: name, email, phone, skills (array),
experience (array of {{company, role, duration, description}}),
education (array of {{institution, degree, year}}).
Resume text:
{resume_text}"""

# Scoring prompt
SCORE_PROMPT = """Score this candidate for the following role.
Job requirements: {job_description}
Candidate profile: {extracted_profile}
Return JSON: {{score: 1-100, justification: string,
strengths: array, gaps: array}}"""
```
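With the templates defined, extraction is a single completion call. Here is a sketch against vLLM's OpenAI-compatible server (the model name and port are assumptions; the template below is abridged, and its literal braces are doubled so `str.format` leaves them intact):

```python
import json
import urllib.request

EXTRACT_PROMPT = (
    "Extract structured data from this resume text.\n"
    "Return JSON with fields: name, email, skills (array),\n"
    "experience (array of {{company, role}}).\n"
    "Resume text:\n{resume_text}"
)

def extract_profile(resume_text, base_url="http://localhost:8000"):
    """Send the filled template to vLLM's /v1/completions endpoint
    and parse the model's JSON answer."""
    payload = {
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model
        "prompt": EXTRACT_PROMPT.format(resume_text=resume_text),
        "max_tokens": 512,
        "temperature": 0.0,  # deterministic output for extraction
    }
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["text"])
```

Temperature 0 keeps extraction repeatable; the scoring call can use the same shape with `SCORE_PROMPT`.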
Wire the pipeline together with a task queue. Each resume enters the queue, gets parsed, extracted, scored, and written to your database. The dashboard reads from the database and presents ranked candidates. Follow our vLLM production setup guide for optimal inference configuration.
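The queue wiring above can be sketched with the standard library; the stage functions are injected so the loop stays independent of any particular parser, model client, or database:

```python
import queue

def run_pipeline(paths, parse, extract, score, store):
    """Drain a work queue: each resume is parsed, extracted, scored,
    and handed to `store` for writing to the database."""
    q = queue.Queue()
    for p in paths:
        q.put(p)
    results = []
    while not q.empty():
        path = q.get()
        profile = extract(parse(path))
        results.append(store({"path": path, **score(profile)}))
        q.task_done()
    return results

# Usage with stub stages:
parse = lambda p: f"text:{p}"
extract = lambda t: {"skills": ["python"]}
score = lambda prof: {"score": 80, "justification": "stub"}
store = lambda row: row
out = run_pipeline(["a.pdf", "b.pdf"], parse, extract, score, store)
```

In production you would swap `queue.Queue` for a persistent broker (Celery, RQ, or similar) so in-flight resumes survive restarts.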
Performance and Accuracy
On an RTX 6000 Pro 96 GB running Llama 3 8B, the full pipeline processes a single resume in approximately 1.2 seconds including parsing, extraction, and scoring. Processing 1,000 resumes sequentially completes in around 20 minutes; batched inference pushes throughput well beyond that. Extraction accuracy for structured fields exceeds 94% on standard resume formats. Scoring correlation with human recruiter rankings reaches 0.82 after prompt tuning with a small evaluation set.
The system handles edge cases like multi-page resumes, creative formatting, and international CVs by normalising extracted text before LLM processing. For high-volume recruiting seasons, the pipeline scales horizontally by adding GPU nodes behind a load balancer with AI hosting infrastructure.
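The normalisation step mentioned above can be sketched as follows; the exact rules are an assumption and should be adjusted for your document mix:

```python
import re
import unicodedata

def normalise(text):
    """Clean extracted resume text before it reaches the LLM."""
    # NFKC folds ligatures and full-width characters to plain ASCII forms
    text = unicodedata.normalize("NFKC", text)
    # Drop control/format characters left behind by PDF extraction,
    # keeping newlines and tabs as legitimate whitespace
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse runs of horizontal whitespace so multi-column layouts
    # read as linear text
    return re.sub(r"[ \t]+", " ", text).strip()
```

Full-width characters and zero-width spaces are common in international CVs and in text recovered from creative layouts, and both confuse downstream extraction if left in place.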
Cost and Privacy Advantages
Commercial resume screening platforms charge $0.50-2.00 per resume. At 5,000 applications per month, that adds up to $2,500-10,000. A dedicated GPU server processes unlimited resumes for a fixed monthly cost while keeping all candidate PII on your own infrastructure, satisfying GDPR and internal compliance requirements. Deploy your screening system on GigaGPU dedicated GPU hosting and start processing applications today. Explore additional use case guides for more AI build patterns.