RAFT (Retrieval-Augmented Fine-Tuning) is a fine-tuning pattern introduced in 2024 that trains an LLM for the realistic RAG setting: retrieved passages mix relevant and irrelevant content. The model learns to cite the relevant passages and ignore the distractors. Net: better RAG quality with no extra retrieval-time work.
Train data: per-question, retrieve N passages (mix of relevant + distractors); answer cites the relevant ones. Fine-tune on this. Result: model better at distinguishing relevant retrieved content from noise. ~+5-15% RAG quality lift on domain tasks. Pair with domain-fine-tuned embeddings for compounding wins.
How RAFT works
Standard RAG: retrieve top-K, stuff them into the prompt, let the LLM answer. Problem: when the retrieved passages include irrelevant content, vanilla LLMs sometimes latch onto the wrong passage and hallucinate.
RAFT training: deliberately mix relevant + distractor passages in training prompts. Teach the model to:
- Cite specific passages that support each claim
- Ignore distractors that don't support the answer
- Refuse to answer when no passage genuinely supports the answer
The trained model handles the realistic noisy-retrieval setting better than vanilla.
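The example shape above can be sketched as a small builder function. This is a minimal illustration, not the paper's exact prompt template: the passage-ID scheme, instruction wording, and prompt/completion field names are assumptions.

```python
import random

def make_raft_example(question, relevant, distractors, answer_text, seed=0):
    """Build one RAFT training example: shuffle relevant + distractor
    passages under neutral IDs, then pair the prompt with a target
    answer that cites only the IDs that are truly relevant."""
    rng = random.Random(seed)
    pool = [(p, True) for p in relevant] + [(p, False) for p in distractors]
    rng.shuffle(pool)  # the model must not infer relevance from position
    lines, cited = [], []
    for i, (text, is_rel) in enumerate(pool, 1):
        pid = f"P{i}"
        lines.append(f"[{pid}] {text}")
        if is_rel:
            cited.append(pid)
    prompt = (
        "Answer using only passages that support the answer; cite their IDs. "
        "If no passage supports an answer, say so.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    # Target: the answer plus explicit citations of the relevant IDs only
    completion = f"{answer_text} [{', '.join(cited)}]"
    return {"prompt": prompt, "completion": completion}
```

The shuffle matters: if relevant passages always appeared first, the model would learn the position instead of learning to judge relevance.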
Training data
Per training example:
- Question
- ~5 retrieved passages (1-2 truly relevant, 3-4 distractors)
- Answer with explicit citations to relevant passages
Generate examples either from an existing RAG pipeline with hand-curated relevance labels, or by using an LLM-as-judge to label retrievals at training-set creation time.
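Distractor selection can be as simple as sampling from the corpus while excluding the known-relevant passages. A minimal sketch, assuming passages are indexed by position; the function name and signature are illustrative:

```python
import random

def sample_distractors(corpus, relevant_ids, k=3, seed=0):
    """Pick k distractor passages uniformly at random, excluding the
    passages labeled relevant for this question. Hard negatives
    (retrieved-but-irrelevant hits from your real pipeline) usually
    make stronger distractors than uniform random ones."""
    rng = random.Random(seed)
    candidates = [doc for i, doc in enumerate(corpus) if i not in relevant_ids]
    return rng.sample(candidates, k)
```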
Recipe
Standard SFTTrainer + PEFT QLoRA, training data shaped as RAFT examples. ~5K-20K examples; 3-6 hours on a 4090. The recipe is identical to other instruction fine-tunes; the data shape is what makes it RAFT.
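A rough QLoRA setup with TRL's SFTTrainer, as a config fragment: the base model name, hyperparameters, and `raft_dataset` variable are placeholders, and exact argument names vary across trl versions.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit quantized base model + LoRA adapters = QLoRA.
# Values here are illustrative starting points, not tuned settings.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
peft_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
    train_dataset=raft_dataset,  # RAFT-shaped prompt/completion records
    args=SFTConfig(
        output_dir="raft-qlora",
        num_train_epochs=2,
        model_init_kwargs={"quantization_config": bnb},
    ),
    peft_config=peft_cfg,
)
trainer.train()
```

Nothing in this config is RAFT-specific; as the text says, the data shape is the whole trick.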
Verdict
RAFT is a strong fine-tuning pattern for production RAG, particularly for domains where retrieval imperfectly returns mixed relevant/irrelevant passages. Pair with domain-fine-tuned embeddings + reranker for compounding RAG quality wins. Cost: a few hours of GPU time + dataset curation; quality lift is real and durable.
Bottom line
RAFT for noisy-retrieval domains. See embedding fine-tuning.