Rule-based lead scoring (points for job title, company size, UTM source) misses the texture buried in form free-text fields and CRM activity notes. An LLM-based scorer on the RTX 5060 Ti 16GB at our UK dedicated GPU hosting reads the full context of each lead, returns structured JSON and processes thousands of leads per hour, all without shipping customer data to a third-party API. Blackwell 4608 CUDA, 16 GB GDDR7 and native FP8 give 122 t/s on Mistral 7B FP8 and ~720 t/s aggregate.
Contents
Inputs
- Form submission fields (name, email, phone, free-text message)
- Firmographic enrichment (industry, employee count, revenue, HQ country)
- Persona signal (job title, seniority)
- Behavioural signal (pages visited, content downloaded, email opens)
- Source and campaign attribution
- Sales rep notes from prior touches
Prompt and structured output
system = """You are a lead qualification analyst. Output JSON only."""
user = f"""Score this lead 1-10 against our ICP.
ICP: B2B SaaS, 50-500 employees, UK or EU headquartered,
decision-maker or influencer, budget 5k+.
Lead: {json.dumps(lead)}
Return JSON with schema:
{{"score": int, "tier": "A"|"B"|"C"|"D",
"icp_match": float,
"intent": "high"|"medium"|"low",
"reasoning": "one sentence",
"next_action": "book_call"|"nurture"|"disqualify"}}"""
Use vLLM’s guided_json parameter with the schema above to guarantee parseable output. Mistral 7B FP8 is fast and accurate; Qwen 2.5 14B AWQ is noticeably better on subtle B2B signal at roughly half the throughput (70 t/s vs 122 t/s).
Throughput
| Model | Per-lead time | Leads/hour (concurrent batch) |
|---|---|---|
| Phi-3 mini FP8 | 0.4 s | ~8,000 |
| Mistral 7B FP8 | 1.5 s | ~2,500-3,000 |
| Llama 3.1 8B FP8 | 1.8 s | ~2,000-2,500 |
| Qwen 2.5 14B AWQ | 2.8 s | ~1,200 |
For most B2B pipelines with 500-5,000 monthly leads, even Qwen 14B finishes the monthly batch in minutes. For high-volume consumer funnels with 100k+ daily leads, Phi-3 mini is the workhorse.
CRM integration
- HubSpot: webhook on contact create/update, write score to a custom property and trigger workflow on tier change
- Salesforce: Platform Events + Apex trigger, or scheduled batch Apex for bulk rescore
- Pipedrive: webhook on deal/person create, write to custom field
- Close, Copper, Zoho: REST API polling + webhooks
- Batch rescore: nightly cron that pulls the last 90 days of contacts, pipes through vLLM, writes scores back
Quality and calibration
Validate scores against closed-won data quarterly; expect 10-15 percent lift over rule-based scoring on most B2B funnels. Keep the final disposition human-owned: the model proposes a tier and a next action, the SDR confirms. Log prompt, response and human override so you can fine-tune a small specialist model later if volumes justify it.
Private LLM lead scoring
Structured JSON output on Blackwell 16GB. UK dedicated hosting.
Order the RTX 5060 Ti 16GBSee also: classification, internal tooling, customer support, SaaS RAG, FP8 Llama deployment.