Tutorials

QLoRA Fine-Tune on RTX 5060 Ti 16GB – Complete Guide

QLoRA with bitsandbytes NF4 lets you fine-tune models of up to 14B parameters on a 16 GB card - code, config and timings included.

QLoRA quantises the frozen base model to 4-bit NF4 during training while keeping the trainable LoRA adapters in BF16. That buys roughly 4x the memory headroom over plain LoRA and makes 14B-class models trainable on a 16 GB card. On the RTX 5060 Ti 16GB via our dedicated GPU hosting, QLoRA on Qwen 2.5 14B comfortably fits and finishes overnight.
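In Hugging Face terms, that setup corresponds to a bitsandbytes quantisation config along these lines (a sketch, assuming recent transformers and bitsandbytes releases; older versions may spell the flags differently):

```python
import torch
from transformers import BitsAndBytesConfig

# Frozen base weights in 4-bit NF4; LoRA adapters and compute stay in BF16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",         # NormalFloat4 format
    bnb_4bit_use_double_quant=True,    # quantise the quantisation constants too
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```

Pass this as quantization_config when loading the base model through plain transformers; Unsloth (used below) sets the equivalent options for you via load_in_4bit=True.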

What QLoRA Changes

QLoRA adds three pieces on top of plain LoRA:

  • 4-bit NF4 base weights via bitsandbytes – a lossy but surprisingly effective quantisation format
  • Double quantisation – quantisation constants themselves quantised, shaving another ~0.4 GB on a 14B
  • Paged optimiser – AdamW state swaps to host RAM on OOM pressure, rare on a 16 GB card with LoRA-only optimiser state
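To make the first bullet concrete, here is a toy blockwise 4-bit quantiser in plain Python. It is illustrative only: it uses a uniform 16-level grid with per-block absmax scaling, whereas real NF4 places its 16 levels at the quantiles of a normal distribution:

```python
def quantize_block(block, levels=15):
    """Map a block of floats to 4-bit codes (0..15) plus one absmax scale."""
    absmax = max(abs(x) for x in block) or 1.0
    codes = [round((x / absmax + 1) / 2 * levels) for x in block]
    return codes, absmax

def dequantize_block(codes, absmax, levels=15):
    """Recover approximate floats from the codes and the stored scale."""
    return [(c / levels * 2 - 1) * absmax for c in codes]

weights = [0.31, -0.12, 0.07, -0.44]
codes, scale = quantize_block(weights)
recovered = dequantize_block(codes, scale)
# Each value is recovered to within half a grid step (absmax / 15)
```

Double quantisation then applies the same trick to the absmax constants themselves, which is where the extra few hundred MB of savings on a 14B model comes from.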

VRAM Math – Qwen 2.5 14B

Component                              Plain LoRA             QLoRA
Base weights                           ~28 GB (BF16)          ~7.5 GB (4-bit NF4)
LoRA adapter                           ~40 MB                 ~40 MB
Optimiser state                        ~160 MB                ~160 MB
Gradients                              ~80 MB                 ~80 MB
Activations (bs 1, 4k, checkpointed)   ~3 GB                  ~3 GB
Buffer and kernels                     ~1 GB                  ~1 GB
Peak VRAM                              ~32.3 GB (won't fit)   ~11.8 GB (fits 16 GB)
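The headline numbers can be reproduced with back-of-the-envelope arithmetic (a sketch; real peak usage also depends on kernel workspace, fragmentation and sequence length, and 4.3 bits/weight is an assumed NF4 cost including quantisation constants):

```python
def peak_vram_gb(params_b, bits_per_weight,
                 adapter_mb=40, optim_mb=160, grad_mb=80,
                 act_gb=3.0, buffer_gb=1.0):
    """Rough peak VRAM in GB when fine-tuning a params_b-billion model."""
    base_gb = params_b * bits_per_weight / 8          # frozen base weights
    small_gb = (adapter_mb + optim_mb + grad_mb) / 1024  # adapter, optimiser, grads
    return base_gb + small_gb + act_gb + buffer_gb

plain_lora = peak_vram_gb(14, 16)    # BF16 base            -> ~32.3 GB
qlora = peak_vram_gb(14, 4.3)        # NF4 base + constants -> ~11.8 GB
```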

Config

Parameter                Recommended
Base model               Qwen 2.5 14B Instruct, Llama 3.1 8B, Mistral Nemo 12B
Quantisation             4-bit NF4, double-quant enabled
Compute dtype            bfloat16
LoRA r / alpha           16 / 32
Target modules           q, k, v, o, gate, up, down
Max seq length           4096 (2048 for 14B if tight)
Batch size               1-2, grad accum 8-16 (effective 16)
Learning rate            1e-4 for 14B, 2e-4 for 7B
Gradient checkpointing   Unsloth variant
Optimiser                paged_adamw_8bit
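With the recommended settings, the effective batch size and optimiser steps per epoch work out as follows (shown for the 2,000-example dataset used in the timing section):

```python
micro_batch = 1          # per_device_train_batch_size
grad_accum = 16          # gradient_accumulation_steps
effective_batch = micro_batch * grad_accum     # 16 sequences per optimiser step

examples = 2000
steps_per_epoch = examples // effective_batch  # 125 optimiser steps per epoch
```

Halving grad accum to 8 doubles the optimiser steps per epoch at the same token throughput; the learning rates above assume the effective batch of 16.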

Training Code

import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# Load the frozen base in 4-bit NF4; compute runs in BF16
model, tok = FastLanguageModel.from_pretrained(
    "unsloth/Qwen2.5-14B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,       # NF4 via bitsandbytes
    dtype=torch.bfloat16,
)

# Attach BF16 LoRA adapters to the attention and MLP projections
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=42,
)

SFTTrainer(
    model=model, tokenizer=tok,
    train_dataset=train_ds, eval_dataset=eval_ds,  # your prepared datasets
    args=SFTConfig(
        output_dir="./qwen14b-qlora",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,   # effective batch 16
        learning_rate=1e-4,
        bf16=True,
        optim="paged_adamw_8bit",         # paged optimiser state
        logging_steps=10, eval_strategy="epoch",
    ),
).train()

Expected Time

Model              Dataset                   Tokens/sec   Time per epoch
Llama 3.1 8B       2,000 ex (2 M tokens)     ~4,100       ~8 min
Mistral Nemo 12B   2,000 ex (2 M tokens)     ~3,100       ~11 min
Qwen 2.5 14B       2,000 ex (2 M tokens)     ~2,600       ~13 min
Qwen 2.5 14B       20,000 ex (20 M tokens)   ~2,600       ~2 h per epoch, ~6 h for 3 epochs
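The epoch times follow directly from throughput (time ≈ tokens / tokens-per-second); a small helper to sanity-check the table:

```python
def epoch_minutes(tokens, tokens_per_sec):
    """Wall-clock minutes to push `tokens` through training at a given rate."""
    return tokens / tokens_per_sec / 60

llama_8b_min = epoch_minutes(2_000_000, 4_100)          # ~8 min per epoch
qwen_14b_hours = epoch_minutes(20_000_000, 2_600) / 60  # ~2.1 h per epoch
```

These estimates ignore evaluation passes and checkpoint writes, so budget some margin on top.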

QLoRA vs LoRA – When to Use Which

  • QLoRA – when you need to fine-tune a larger base than VRAM allows, or when you want headroom for longer sequences / bigger batches
  • LoRA – when the base fits in BF16 and you want maximum training quality and speed (NF4 adds slight noise)
  • Quality delta – typically <1% eval-loss difference; QLoRA is production-acceptable for almost all tasks
  • Merge caveat – QLoRA adapters merge back into a BF16 base; the NF4 quantisation is training-time only

QLoRA on Blackwell 16 GB

Fine-tune up to Qwen 14B overnight on one card. UK dedicated hosting.

Order the RTX 5060 Ti 16GB

See also: QLoRA training speed, LoRA guide, Unsloth speed, fine-tune throughput, vLLM setup.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
