RAFT (Retrieval-Augmented Fine-Tuning) is a fine-tuning pattern introduced in 2024 that trains an LLM for the realistic RAG setting: retrieved passages mix relevant and irrelevant content. The model learns to cite the relevant passages and ignore the distractors. Net: better RAG quality with no extra retrieval-time work.
Train data: per-question, retrieve N passages (mix of relevant + distractors); answer cites the relevant ones. Fine-tune on this. Result: model better at distinguishing relevant retrieved content from noise. ~+5-15% RAG quality lift on domain tasks. Pair with domain-fine-tuned embeddings for compounding wins.
How RAFT works
Standard RAG: retrieve top-K, stuff them into the prompt, let the LLM answer. Problem: when the retrieved passages include irrelevant content, vanilla LLMs sometimes latch onto the wrong passage and hallucinate.
RAFT training: deliberately mix relevant + distractor passages in training prompts. Teach the model to:
- Cite specific passages that support each claim
- Ignore distractors that don't support the answer
- Refuse to answer when no passage genuinely supports the answer
The trained model handles the realistic noisy-retrieval setting better than vanilla.
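The example shape above can be sketched as a small builder function. This is a minimal illustration, not the paper's exact prompt template: the passage-ID scheme, instruction wording, and prompt/completion field names are assumptions.

```python
import random

def make_raft_example(question, relevant, distractors, answer_text, seed=0):
    """Build one RAFT training example: shuffle relevant + distractor
    passages under neutral IDs, then pair the prompt with a target
    answer that cites only the IDs that are truly relevant."""
    rng = random.Random(seed)
    pool = [(p, True) for p in relevant] + [(p, False) for p in distractors]
    rng.shuffle(pool)  # the model must not infer relevance from position
    lines, cited = [], []
    for i, (text, is_rel) in enumerate(pool, 1):
        pid = f"P{i}"
        lines.append(f"[{pid}] {text}")
        if is_rel:
            cited.append(pid)
    prompt = (
        "Answer using only passages that support the answer; cite their IDs. "
        "If no passage supports an answer, say so.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    # Target: the answer plus explicit citations of the relevant IDs only
    completion = f"{answer_text} [{', '.join(cited)}]"
    return {"prompt": prompt, "completion": completion}
```

The shuffle matters: if relevant passages always appeared first, the model would learn the position instead of learning to judge relevance.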
Training data
Per training example:
- Question
- ~5 retrieved passages (1-2 truly relevant, 3-4 distractors)
- Answer with explicit citations to relevant passages
Generate examples either from an existing RAG pipeline with hand-curated relevance labels, or by using an LLM-as-judge to label retrievals at training-set creation time.
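Distractor selection can be as simple as sampling from the corpus while excluding the known-relevant passages. A minimal sketch, assuming passages are indexed by position; the function name and signature are illustrative:

```python
import random

def sample_distractors(corpus, relevant_ids, k=3, seed=0):
    """Pick k distractor passages uniformly at random, excluding the
    passages labeled relevant for this question. Hard negatives
    (retrieved-but-irrelevant hits from your real pipeline) usually
    make stronger distractors than uniform random ones."""
    rng = random.Random(seed)
    candidates = [doc for i, doc in enumerate(corpus) if i not in relevant_ids]
    return rng.sample(candidates, k)
```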
Recipe
Standard SFTTrainer + PEFT QLoRA, training data shaped as RAFT examples. ~5K-20K examples; 3-6 hours on a 4090. The recipe is identical to other instruction fine-tunes; the data shape is what makes it RAFT.
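A rough QLoRA setup with TRL's SFTTrainer, as a config fragment: the base model name, hyperparameters, and `raft_dataset` variable are placeholders, and exact argument names vary across trl versions.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit quantized base model + LoRA adapters = QLoRA.
# Values here are illustrative starting points, not tuned settings.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
peft_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
    train_dataset=raft_dataset,  # RAFT-shaped prompt/completion records
    args=SFTConfig(
        output_dir="raft-qlora",
        num_train_epochs=2,
        model_init_kwargs={"quantization_config": bnb},
    ),
    peft_config=peft_cfg,
)
trainer.train()
```

Nothing in this config is RAFT-specific; as the text says, the data shape is the whole trick.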
Verdict
RAFT is a strong fine-tuning pattern for production RAG, particularly for domains where retrieval imperfectly returns mixed relevant/irrelevant passages. Pair with domain-fine-tuned embeddings + reranker for compounding RAG quality wins. Cost: a few hours of GPU time + dataset curation; quality lift is real and durable.
Bottom line
RAFT for noisy-retrieval domains. See embedding fine-tuning.