DPO (Direct Preference Optimisation) is the de facto alignment step after SFT. ORPO (Odds Ratio Preference Optimisation) combines SFT and preference optimisation into a single training run. On our dedicated GPU hosting both are viable; the right choice depends on your workflow.
DPO
Two-stage. First SFT on target-domain data, then DPO on preference pairs. Clean separation, well understood, strong published results. The downside: two training runs mean two sets of hyperparameters to tune and two opportunities to waste GPU time.
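The DPO objective on a single preference pair can be sketched in plain Python. The log-probability values below are illustrative, not from a real model; in practice they come from the policy and a frozen reference copy of the SFT checkpoint.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    # How strongly each model prefers the chosen response over the rejected one.
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    margin = beta * (policy_logratio - ref_logratio)
    # -log sigmoid(margin): small when the policy beats the reference's preference.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does -> small loss.
low = dpo_loss(-10.0, -14.0, -12.0, -12.0)
# Policy prefers the rejected response -> larger loss.
high = dpo_loss(-14.0, -10.0, -12.0, -12.0)
```

When the policy and reference agree exactly, the margin is zero and the loss sits at log 2; training pushes the margin positive.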
ORPO
Single-stage. Train on preference pairs directly from a base (non-instruction-tuned) model. The ORPO loss combines a supervised negative-log-likelihood term on the chosen response with an odds-ratio term that penalises the rejected response. One run, one set of hyperparameters.
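A minimal sketch of that loss, assuming per-token log-probabilities for each response are already available (the numbers fed in are made up; TRL exposes the weighting `lam` below as `beta` in its config):

```python
import math

def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """Sketch of the ORPO objective on one preference pair.

    Each argument is a list of per-token log-probabilities of a response
    under the model being trained (no reference model is needed).
    """
    # Supervised term: mean negative log-likelihood of the chosen response.
    nll = -sum(chosen_token_logps) / len(chosen_token_logps)

    # Length-normalised sequence probability, then log-odds log(p / (1 - p)).
    def log_odds(token_logps):
        avg_logp = sum(token_logps) / len(token_logps)
        p = math.exp(avg_logp)
        return math.log(p / (1.0 - p))

    # Odds-ratio term: -log sigmoid of the log-odds gap; pushes the
    # chosen response's odds above the rejected response's odds.
    gap = log_odds(chosen_token_logps) - log_odds(rejected_token_logps)
    odds_term = -math.log(1.0 / (1.0 + math.exp(-gap)))

    return nll + lam * odds_term
```

Because the supervised term and the preference term share one forward pass over the same model, there is no separate SFT stage and no frozen reference model to hold in memory.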
When Each Wins
DPO wins when:
- You already have a strong SFT checkpoint
- Your preference data is limited (DPO with LoRA on 5k pairs works)
- You want to iterate on alignment without re-doing SFT
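A sketch of that low-data path with TRL and PEFT. Argument names vary across TRL versions, and `model`, `ds`, and `tok` are assumed to exist, as in the ORPO example in the Config section; treat this as a starting point, not a tuned recipe.

```python
from peft import LoraConfig
from trl import DPOTrainer, DPOConfig

peft_config = LoraConfig(
    r=16,                      # adapter rank; 8-32 is a common range for ~5k pairs
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=model,               # your existing SFT checkpoint
    ref_model=None,            # with a PEFT adapter, TRL can use the frozen base as reference
    args=DPOConfig(
        output_dir="./dpo-lora-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
        beta=0.1,
        num_train_epochs=2,
        bf16=True,
    ),
    train_dataset=ds,          # prompt / chosen / rejected pairs
    tokenizer=tok,
    peft_config=peft_config,
)
trainer.train()
```

Training only the adapter keeps memory low and makes it cheap to re-run alignment with new preference data against the same SFT checkpoint.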
ORPO wins when:
- You are starting from a base model and want one training run
- GPU budget is tight and doing SFT+DPO twice is expensive
- You have a larger preference dataset (30k+ pairs)
Config
TRL supports both. ORPO via ORPOTrainer:
from trl import ORPOTrainer, ORPOConfig

trainer = ORPOTrainer(
    model=model,                        # a base (non-instruction-tuned) model
    args=ORPOConfig(
        output_dir="./orpo-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
        beta=0.1,                       # weight of the odds-ratio term
        num_train_epochs=2,
        bf16=True,
    ),
    train_dataset=ds,
    tokenizer=tok,
)
trainer.train()
Dataset format is the same as DPO: prompt, chosen, rejected.
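One row of that format might look like the following (the strings are illustrative, not from a real dataset):

```python
# One preference pair in the column format both DPOTrainer and ORPOTrainer expect.
example = {
    "prompt": "Summarise the main trade-off between DPO and ORPO.",
    "chosen": "DPO needs an SFT checkpoint first; ORPO trains in one stage.",
    "rejected": "They are identical in every respect.",
}

assert set(example) == {"prompt", "chosen", "rejected"}
```

Because the columns are identical, you can collect preference pairs once and try either trainer on the same dataset.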
Single-Stage or Two-Stage Alignment
ORPO or DPO on UK dedicated GPU hosting, with sample datasets preloaded.
Browse GPU Servers. See DPO training for the two-stage path.