
Retrieval-Augmented Fine-Tuning (RAFT)

RAFT teaches an LLM to ignore irrelevant retrieved passages and ground answers in relevant ones. It is a fine-tuning pattern that improves RAG quality.

RAFT (Retrieval-Augmented Fine-Tuning) is a fine-tuning pattern introduced in 2024 that teaches an LLM to handle the realistic RAG setting, where retrieved passages include both relevant and irrelevant content. The model learns to cite relevant passages and ignore distractors. Net: better RAG quality without extra retrieval-time work.

TL;DR

Training data: for each question, retrieve N passages (a mix of relevant passages and distractors); the answer cites the relevant ones. Fine-tune on this. Result: a model that is better at distinguishing relevant retrieved content from noise, with roughly a 5-15% RAG quality lift on domain tasks. Pair with domain-fine-tuned embeddings for compounding wins.

How RAFT works

Standard RAG: retrieve the top-K passages, stuff them into the prompt, and the LLM answers. Problem: when retrieved passages include irrelevant content, vanilla LLMs sometimes use the wrong passage and hallucinate.

RAFT training: deliberately mix relevant + distractor passages in training prompts. Teach the model:

  • Cite specific passages that support each claim
  • Ignore distractors that don't support the answer
  • Refuse to answer when no passage genuinely supports one

The trained model handles the realistic noisy-retrieval setting better than vanilla.
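To make the data shape concrete, here is a minimal sketch of a RAFT-style training prompt. The passage-ID format, instruction wording, and function name are illustrative assumptions, not taken verbatim from the RAFT paper:

```python
# Illustrative RAFT-style training prompt. The [P#] citation format and
# instruction text are assumptions, not the paper's exact template.

def format_raft_prompt(question, passages):
    """Interleave relevant and distractor passages; the model must cite IDs."""
    blocks = [f"[P{i + 1}] {text}" for i, text in enumerate(passages)]
    context = "\n\n".join(blocks)
    return (
        "Answer using ONLY passages that support the answer. "
        "Cite them as [P#]. If no passage supports an answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# The paired training target cites the relevant passage explicitly,
# e.g. "The coolant limit is 90C [P2]." Distractor-only prompts pair
# with a refusal target.
```

The refusal targets matter: without them, the model learns to always answer, which defeats the anti-hallucination goal.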

Training data

Per training example:

  • Question
  • ~5 retrieved passages (1-2 truly relevant, 3-4 distractors)
  • Answer with explicit citations to relevant passages

Generate via an existing RAG pipeline with hand-curated relevance labels, or use an LLM-as-judge to label retrievals at training-set creation time.
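The assembly step can be sketched as below. This is a simplified illustration (function and field names are assumptions): it samples distractors uniformly from a corpus, whereas in practice you would take your real retriever's near-misses as distractors, since those are the hard negatives the model will face at inference time:

```python
import random

def build_raft_example(question, answer, relevant, corpus,
                       n_distractors=3, seed=0):
    """Build one RAFT training example: mix the labeled relevant passages
    with sampled distractors and shuffle their order.

    Simplified sketch: distractors are sampled uniformly here; production
    pipelines should use the retriever's own near-miss passages instead.
    """
    rng = random.Random(seed)
    pool = [p for p in corpus if p not in relevant]
    distractors = rng.sample(pool, n_distractors)
    passages = relevant + distractors
    rng.shuffle(passages)  # relevant passages must not sit at a fixed position
    return {"question": question, "passages": passages, "answer": answer}
```

Shuffling is not cosmetic: if relevant passages always appear first, the model learns the position rather than the relevance signal.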

Recipe

Use the standard SFTTrainer + PEFT QLoRA recipe, with training data shaped as RAFT examples. Expect ~5K-20K examples and 3-6 hours on a 4090. The recipe is identical to other instruction fine-tunes; the data shape is what makes it RAFT.
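As a starting point, hyperparameters like the following are typical for a QLoRA instruction fine-tune of this size. The values are assumptions (common defaults, not settings benchmarked for RAFT); they map onto `peft`'s `LoraConfig` and `trl`'s `SFTTrainer` arguments:

```python
# Illustrative QLoRA hyperparameters for a RAFT fine-tune.
# All values are typical starting points, not tuned recommendations.
raft_qlora_config = {
    "load_in_4bit": True,          # QLoRA: 4-bit quantized base weights
    "lora_r": 16,                  # LoRA rank
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "learning_rate": 2e-4,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "max_seq_length": 2048,        # must fit the question plus ~5 passages
}
```

The one RAFT-specific knob is `max_seq_length`: with ~5 passages per prompt, sequences are much longer than plain Q&A pairs, so check your passage lengths before committing to a context budget.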

Verdict

RAFT is a strong fine-tuning pattern for production RAG, particularly for domains where retrieval imperfectly returns mixed relevant/irrelevant passages. Pair with domain-fine-tuned embeddings + reranker for compounding RAG quality wins. Cost: a few hours of GPU time + dataset curation; quality lift is real and durable.

Bottom line

Use RAFT for noisy-retrieval domains. See embedding fine-tuning for a complementary technique.
