Table of Contents
For RAG production, explainability matters: users need to verify AI claims; auditors need to trace outputs to sources. The right pattern is forcing the LLM to cite specific retrieved chunks for each claim. Combined with structured outputs and verification, this gives auditable AI.
Pattern: assign IDs to retrieved chunks; require LLM to cite chunk IDs per claim in structured-output JSON. Verify post-hoc that citations point to chunks that actually support the claims (LLM-as-judge or rule-based). 90-95% citation accuracy achievable; the 5-10% slips are typically detectable. Critical for regulated / customer-facing RAG.
Why citations
- User trust: verifiable claims increase confidence
- Audit trail: regulators / auditors can trace outputs to sources
- Hallucination detection: claims without citations are likely hallucination
- Correction loop: when citation doesn't support claim, you have signal for prompt or model improvement
- Liability: in regulated domains, sourcing matters legally
Patterns
Standard citation pattern:
chunks_with_ids = [
{"id": "doc-42-chunk-3", "text": "..."},
{"id": "doc-17-chunk-1", "text": "..."},
]
prompt = f"""Answer the user's question using only the provided sources.
For each factual claim, cite the source ID.
Output JSON: {{"answer": "...", "claims": [{{"claim": "...", "citation": "..."}}, ...]}}
Sources:
{format_chunks(chunks_with_ids)}
Question: {user_query}"""
Use vLLM's guided JSON output with a Pydantic schema enforcing claim+citation structure.
Verification
Post-hoc citation verification:
- For each claim+citation pair, verify citation ID is in retrieved set
- For each claim+citation pair, verify cited chunk actually supports the claim (LLM-as-judge)
- Track citation accuracy over time as a quality metric
- Hallucinated citations (claim without valid source) are red flags
Verdict
For RAG in regulated / customer-facing / high-stakes domains, output citations are essential. The pattern is straightforward; verification adds discipline. ~90-95% citation accuracy is achievable; the gap is the boundary worth investing in.
Bottom line
Force citations; verify; track accuracy. See RAG metrics.