
Fine-Tuning VRAM Calculator: How Much Do You Need?

A practical VRAM calculator for LLM fine-tuning covering LoRA, QLoRA, and full fine-tuning across model sizes from 7B to 70B, with GPU recommendations.

Why VRAM Calculation Matters

Running out of VRAM mid-training wastes hours and money. Before spinning up a dedicated GPU server for fine-tuning, you need to know exactly how much memory your training run will require. The VRAM footprint depends on the model size, fine-tuning method, sequence length, batch size, and precision — and the interactions between these variables are not always intuitive.

This guide provides formulas and reference tables so you can calculate your VRAM needs before committing to hardware. For method-specific details, see our LoRA vs QLoRA vs full fine-tuning comparison.

The VRAM Formula

Total training VRAM has four components:

Total VRAM = Model Weights + Optimiser States + Gradients + Activations

| Component | Full FT (FP16) | LoRA (FP16) | QLoRA (INT4) |
|---|---|---|---|
| Model weights | 2 bytes × params | 2 bytes × params | 0.5 bytes × params |
| Optimiser states | 8 bytes × params | 8 bytes × trainable | 8 bytes × trainable |
| Gradients | 2 bytes × params | 2 bytes × trainable | 2 bytes × trainable |
| Activations | ~2-4 GB (varies) | ~1-2 GB | ~1-2 GB |

For full fine-tuning, the optimiser (Adam) stores first and second moment estimates (8 bytes per parameter), making it the dominant cost. LoRA and QLoRA only compute optimiser states for the trainable adapter parameters (typically 0.1-1% of total), which is why they use dramatically less VRAM.

Example calculation for LLaMA 3 8B full fine-tuning:

  • Model weights: 8B × 2 bytes = 16 GB
  • Optimiser states: 8B × 8 bytes = 64 GB
  • Gradients: 8B × 2 bytes = 16 GB
  • Activations: ~4 GB (assuming gradient checkpointing)
  • Total: ~100 GB
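The four-component formula can be sketched as a small Python function. The adapter fraction and activation figure are illustrative assumptions (they vary with rank, sequence length, and framework overhead), so treat the output as a rough estimate, not a measurement:

```python
def estimate_vram_gb(params_b, method="full", trainable_frac=0.0025,
                     activations_gb=4.0):
    """Rough training-VRAM estimate in GB using the four-component formula.

    params_b: model size in billions of parameters.
    trainable_frac: adapter share of parameters for LoRA/QLoRA (illustrative).
    activations_gb: activation memory estimate (illustrative; pass ~1-2 for
    LoRA/QLoRA, ~4 for full FT with gradient checkpointing).
    Omits CUDA context and framework overhead.
    """
    params = params_b * 1e9
    if method == "full":
        weight_bytes, trainable = 2, params            # FP16 weights, all trainable
    elif method == "lora":
        weight_bytes, trainable = 2, params * trainable_frac   # FP16 base frozen
    elif method == "qlora":
        weight_bytes, trainable = 0.5, params * trainable_frac  # INT4 base frozen
    else:
        raise ValueError(f"unknown method: {method}")
    weights = params * weight_bytes
    optimiser = trainable * 8    # Adam: two FP32 moment estimates
    gradients = trainable * 2    # FP16 gradients for trainable params only
    return (weights + optimiser + gradients) / 1e9 + activations_gb

# LLaMA 3 8B full fine-tuning: 16 + 64 + 16 + 4 = 100 GB
print(round(estimate_vram_gb(8.0, "full")))  # 100
```

Running the same function with `method="qlora"` shows why the adapter methods are so much cheaper: only the tiny trainable fraction pays the optimiser and gradient cost.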

Quick Reference Tables

Pre-calculated VRAM for common model sizes, assuming batch size 4 (via gradient accumulation) and LoRA rank 16; sequence length varies by column. Measured on GigaGPU servers with PyTorch.

QLoRA VRAM (most common method)

| Model | Params | Seq 512 | Seq 1024 | Seq 2048 |
|---|---|---|---|---|
| Mistral 7B | 7.2B | ~13 GB | ~15 GB | ~19 GB |
| LLaMA 3 8B | 8.0B | ~14 GB | ~16 GB | ~21 GB |
| Qwen 2.5 14B | 14.2B | ~22 GB | ~26 GB | ~33 GB |
| LLaMA 3 70B | 70.6B | ~52 GB | ~60 GB | ~78 GB |
| Qwen 2.5 72B | 72.7B | ~54 GB | ~62 GB | ~80 GB |

LoRA (FP16) VRAM

| Model | Params | Seq 512 | Seq 1024 | Seq 2048 |
|---|---|---|---|---|
| Mistral 7B | 7.2B | ~26 GB | ~30 GB | ~38 GB |
| LLaMA 3 8B | 8.0B | ~28 GB | ~33 GB | ~42 GB |
| Qwen 2.5 14B | 14.2B | ~45 GB | ~52 GB | ~66 GB |
| LLaMA 3 70B | 70.6B | ~180 GB | ~200 GB | ~240 GB |

Full Fine-Tuning VRAM

| Model | Params | Seq 512 (with grad ckpt) | Seq 512 (without grad ckpt) |
|---|---|---|---|
| Mistral 7B | 7.2B | ~72 GB | ~90 GB |
| LLaMA 3 8B | 8.0B | ~80 GB | ~100 GB |
| LLaMA 3 70B | 70.6B | ~700 GB | ~900 GB |

Variables That Change Your VRAM

  • Sequence length: longer sequences increase activation memory. Doubling from 512 to 1024 adds roughly 15-25% to total VRAM. Use gradient checkpointing to reduce this at the cost of ~20% slower training.
  • Batch size: each additional sample in the batch stores its own activations. Use gradient accumulation (small per-GPU batch, many accumulation steps) to simulate large batches without proportionally increasing VRAM.
  • LoRA rank: higher rank means more trainable parameters. Rank 16 uses ~0.25% of params; rank 64 uses ~1%. Doubling rank roughly doubles the adapter VRAM but not the base model VRAM.
  • Target modules: applying LoRA to MLP layers (gate, up, down projections) in addition to attention roughly doubles the trainable parameter count and VRAM for adapters.
  • Optimiser: Adam stores two FP32 moment estimates (8 bytes/param). Adafactor uses ~4 bytes/param, as does SGD with momentum. Switching from Adam to 8-bit Adam (bitsandbytes) cuts optimiser memory to roughly a quarter (~2 bytes/param).
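The LoRA rank and target-module effects above are easy to quantify: each targeted linear layer of shape (d_out, d_in) adds rank × (d_in + d_out) trainable parameters. A minimal sketch, using illustrative dimensions for an 8B-class model (hidden size 4096, grouped-query k/v projections of 1024, 32 blocks — assumptions, not any specific model's published config):

```python
def lora_trainable_params(layer_shapes, rank, num_layers):
    """Trainable parameters added by LoRA adapters.

    For each targeted linear layer of shape (d_out, d_in), LoRA adds two
    low-rank factors A (rank x d_in) and B (d_out x rank), i.e.
    rank * (d_in + d_out) parameters per layer.
    """
    per_block = sum(rank * (d_in + d_out) for d_out, d_in in layer_shapes)
    return per_block * num_layers

# Hypothetical attention-only targets: q and o at 4096x4096, k and v at
# 1024x4096 (grouped-query attention), repeated over 32 transformer blocks.
attn = [(4096, 4096), (1024, 4096), (1024, 4096), (4096, 4096)]
r16 = lora_trainable_params(attn, rank=16, num_layers=32)
r64 = lora_trainable_params(attn, rank=64, num_layers=32)
print(f"rank 16: {r16 / 1e6:.1f}M params, rank 64: {r64 / 1e6:.1f}M params")
```

Because trainable parameters scale linearly with rank, going from rank 16 to rank 64 multiplies the adapter (and its optimiser and gradient memory) by exactly four, while the frozen base model's footprint is unchanged.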

GPU Mapping Guide

| VRAM Budget | GPU Options | What You Can Fine-Tune |
|---|---|---|
| 8-10 GB | RTX 4060 | 7-8B QLoRA (r=8, seq 512) |
| 16 GB | RTX 4060 Ti, RTX 5080 | 7-8B QLoRA (r=64, seq 1024) |
| 24 GB | RTX 3090 | 7-8B LoRA (FP16), 14B QLoRA |
| 32 GB | RTX 5090 | 14B LoRA (FP16), 70B QLoRA (tight) |
| 48 GB | 2x RTX 3090 | 14B LoRA (FP16, tight), 70B QLoRA |
| 64 GB | 2x RTX 5090 | 70B QLoRA comfortably |
| 80 GB | RTX 6000 Pro | 7-8B full FT, 70B QLoRA with headroom |
| 160+ GB | Multi-GPU cluster | 70B LoRA (FP16), 70B full FT |

For specific model guides, see LLaMA 3 8B fine-tuning, Mistral 7B fine-tuning, and DeepSeek fine-tuning. Browse all tutorials in the Tutorials category.

Conclusion

VRAM planning prevents wasted time and money. QLoRA is the most accessible method, putting 7B model fine-tuning within reach of 16 GB GPUs and 70B models within 64 GB. LoRA at FP16 requires roughly double the VRAM but produces marginally higher-quality adapters. Full fine-tuning demands about 12 bytes per parameter (roughly six times the FP16 weight size) and is practical only on RTX 6000 Pro-class hardware or multi-GPU clusters. Calculate before you deploy, and you will pick the right GPU server the first time.

Get the Right GPU for Fine-Tuning

Dedicated servers from 8 GB to multi-GPU clusters. Pre-configured with PyTorch, CUDA, and PEFT libraries.

Browse GPU Servers

