YOLOv8 VRAM Requirements Overview
Ultralytics YOLOv8 is the go-to model for real-time object detection, segmentation, classification, and pose estimation. The good news: YOLOv8 is extremely VRAM-efficient. Even the largest XLarge variant uses under 1 GB for model weights at FP16; the real VRAM consumption comes from batch processing and input resolution. This guide covers every YOLOv8 variant so you can choose the right dedicated GPU server for vision model hosting.
YOLOv8 comes in five sizes (Nano, Small, Medium, Large, XLarge) and four task variants (Detection, Segmentation, Pose, Classification). All are designed for real-time inference, making them far less GPU-intensive than large language models or image generators.
Complete VRAM Table (All Models)
YOLOv8 Detection Models (Single Image, 640×640)
| Model | Parameters | FP32 VRAM | FP16 VRAM | INT8 VRAM |
|---|---|---|---|---|
| YOLOv8n (Nano) | 3.2M | ~0.3 GB | ~0.15 GB | ~0.1 GB |
| YOLOv8s (Small) | 11.2M | ~0.5 GB | ~0.25 GB | ~0.15 GB |
| YOLOv8m (Medium) | 25.9M | ~0.9 GB | ~0.5 GB | ~0.3 GB |
| YOLOv8l (Large) | 43.7M | ~1.4 GB | ~0.7 GB | ~0.4 GB |
| YOLOv8x (XLarge) | 68.2M | ~2 GB | ~1 GB | ~0.6 GB |
YOLOv8 Segmentation Models (Single Image, 640×640)
| Model | Parameters | FP32 VRAM | FP16 VRAM | INT8 VRAM |
|---|---|---|---|---|
| YOLOv8n-seg | 3.4M | ~0.4 GB | ~0.2 GB | ~0.12 GB |
| YOLOv8s-seg | 11.8M | ~0.6 GB | ~0.3 GB | ~0.18 GB |
| YOLOv8m-seg | 27.3M | ~1.1 GB | ~0.6 GB | ~0.35 GB |
| YOLOv8l-seg | 46.0M | ~1.6 GB | ~0.8 GB | ~0.5 GB |
| YOLOv8x-seg | 71.8M | ~2.3 GB | ~1.2 GB | ~0.7 GB |
YOLOv8 Pose Estimation Models (Single Image, 640×640)
| Model | Parameters | FP32 VRAM | FP16 VRAM | INT8 VRAM |
|---|---|---|---|---|
| YOLOv8n-pose | 3.3M | ~0.35 GB | ~0.18 GB | ~0.11 GB |
| YOLOv8s-pose | 11.6M | ~0.55 GB | ~0.28 GB | ~0.17 GB |
| YOLOv8m-pose | 26.4M | ~1 GB | ~0.55 GB | ~0.32 GB |
| YOLOv8l-pose | 44.4M | ~1.5 GB | ~0.75 GB | ~0.45 GB |
| YOLOv8x-pose | 69.4M | ~2.1 GB | ~1.05 GB | ~0.65 GB |
YOLOv8 models are tiny compared to LLMs. Even the XLarge variant at FP32 uses just 2 GB. VRAM bottlenecks come from batch processing and high-resolution input, not model weights. For VRAM-hungry workloads like LLMs, see our LLaMA 3 VRAM requirements or Stable Diffusion VRAM requirements pages.
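The weight footprint in the tables above is simple arithmetic: parameter count times bytes per parameter. A minimal sketch (the function name is our own, not an Ultralytics API) shows why even YOLOv8x weights stay small, and why the remaining VRAM in the tables comes from activations and the CUDA context rather than the model itself:

```python
# Rough estimate of model-weight VRAM from parameter count.
# Weights alone understate total usage: activations, the CUDA
# context, and framework buffers add several hundred MB on top.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_vram_gb(params_millions: float, precision: str = "fp16") -> float:
    """Return the approximate weight footprint in GB."""
    return params_millions * 1e6 * BYTES_PER_PARAM[precision] / 1024**3

# YOLOv8x: 68.2M parameters
print(round(weight_vram_gb(68.2, "fp32"), 2))  # ≈ 0.25 GB
print(round(weight_vram_gb(68.2, "fp16"), 2))  # ≈ 0.13 GB
```

So of the ~2 GB that YOLOv8x uses at FP32, only about a quarter is weights; the rest is runtime overhead that scales with resolution and batch size.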
Which GPU Do You Need?
| GPU | VRAM | Best YOLOv8 Config | FPS (640×640) | Use Case |
|---|---|---|---|---|
| RTX 3050 | 8 GB | Any size, batch 16-32 | 200-400+ (n-m) | Real-time + batch |
| RTX 4060 | 8 GB | Any size, batch 32-64 | 300-500+ (n-m) | Multi-stream |
| RTX 4060 Ti | 16 GB | XLarge, batch 64+ | 400-600+ (n-m) | High throughput |
| RTX 3090 | 24 GB | XLarge, batch 128+ | 500-800+ (n-m) | Max throughput |
Every GPU on this list can run any YOLOv8 model comfortably. The choice depends on throughput needs (batch size and FPS requirements), not whether the model fits.
Resolution Impact on VRAM
Input resolution has a major impact on VRAM during inference:
| Resolution | YOLOv8n (FP16) | YOLOv8m (FP16) | YOLOv8x (FP16) |
|---|---|---|---|
| 320×320 | ~0.1 GB | ~0.3 GB | ~0.5 GB |
| 640×640 | ~0.15 GB | ~0.5 GB | ~1 GB |
| 1280×1280 | ~0.5 GB | ~1.5 GB | ~3.5 GB |
| 1920×1920 | ~1 GB | ~3 GB | ~7 GB |
| 3840×3840 | ~3.5 GB | ~10 GB | ~24 GB |
At 4K input, YOLOv8x approaches 24 GB of VRAM on its own. For high-resolution sources, use tiled inference or stay at 640×640 (the training resolution) for the best accuracy per unit of compute.
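Tiled inference means splitting a high-resolution frame into overlapping 640×640 crops and running them as a batch. A minimal sketch of the window computation (the function name and 20% overlap are illustrative choices, not part of the Ultralytics API):

```python
# Sketch: compute overlapping tile windows so a high-resolution frame
# can be processed as a batch of 640x640 crops. Overlap keeps objects
# near tile borders detectable in at least one tile.

def tile_coords(width, height, tile=640, overlap=0.2):
    """Return (x0, y0, x1, y1) windows covering the image with overlap."""
    stride = int(tile * (1 - overlap))
    xs = list(range(0, max(width - tile, 0) + 1, stride)) or [0]
    ys = list(range(0, max(height - tile, 0) + 1, stride)) or [0]
    # Ensure the right/bottom edges are always covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

tiles = tile_coords(1920, 1080)
print(len(tiles))  # 8 tiles cover a 1080p frame
```

Detections from overlapping tiles then need to be merged (e.g. with NMS across tile boundaries); libraries such as SAHI implement this end to end.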
Batch Size Impact on VRAM
Batching is how you maximize GPU utilization with YOLO. Here is the approximate VRAM usage at different batch sizes:
| Model (FP16, 640×640) | Batch 1 | Batch 8 | Batch 32 | Batch 64 | Batch 128 |
|---|---|---|---|---|---|
| YOLOv8n | ~0.15 GB | ~0.4 GB | ~1.2 GB | ~2.2 GB | ~4.2 GB |
| YOLOv8m | ~0.5 GB | ~1 GB | ~2.5 GB | ~4.5 GB | ~8.5 GB |
| YOLOv8x | ~1 GB | ~1.8 GB | ~4 GB | ~7 GB | ~13 GB |
On an RTX 3090 (24 GB), you can batch 128 images with YOLOv8x or 300+ with YOLOv8n. This makes batch processing of video frames or image datasets extremely fast.
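The table follows a roughly linear pattern: a fixed single-image footprint plus a per-image activation cost for each additional image. The per-image figures below are our own rough fit to the table, not measured values; treat the result as a planning estimate, not a guarantee:

```python
# Rough linear model for batched-inference VRAM, fitted by eye to the
# table above: total ≈ single-image footprint + per-image activation
# cost for each extra image. The per-image figures are assumptions.

PER_IMAGE_GB = {"yolov8n": 0.032, "yolov8m": 0.063, "yolov8x": 0.095}
BASE_GB = {"yolov8n": 0.15, "yolov8m": 0.5, "yolov8x": 1.0}

def batch_vram_gb(model, batch):
    """Estimated VRAM in GB for FP16 inference at 640x640."""
    return BASE_GB[model] + (batch - 1) * PER_IMAGE_GB[model]

def max_batch(model, gpu_gb, headroom=0.9):
    """Largest batch that fits in gpu_gb with a safety margin."""
    budget = gpu_gb * headroom - BASE_GB[model]
    return 1 + int(budget / PER_IMAGE_GB[model])

print(round(batch_vram_gb("yolov8x", 128), 1))  # ≈ 13.1 GB, matching the table
print(max_batch("yolov8x", 24))                 # comfortably above 128 on 24 GB
```

Always validate the estimate with a short warm-up run: actual usage varies with CUDA version, TensorRT workspace size, and post-processing.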
Practical Deployment Recommendations
- Real-time single camera: Any GPU works. Even an RTX 3050 running YOLOv8n reaches 200-400 FPS at 640×640.
- Multi-camera (4-8 streams): RTX 4060 with YOLOv8s. Process all streams with room to spare.
- High-accuracy detection: RTX 4060 Ti with YOLOv8x. XLarge model for best mAP at real-time speeds.
- Video processing pipeline: RTX 3090 with large batch sizes. Process video at 10-50x real-time depending on model size.
- Training/fine-tuning: RTX 3090 minimum. Training requires significantly more VRAM than inference (3-5x). Use the XLarge model only with 24+ GB for training.
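The multi-camera sizing above comes down to a throughput budget: streams × camera frame rate must fit inside the model's measured FPS, with headroom. A minimal sketch (the 300 FPS figure for YOLOv8s on an RTX 4060 is an illustrative assumption; benchmark your own hardware):

```python
# Capacity-planning sketch: how many camera streams fit on one GPU.
# model_fps is an assumed benchmark figure (e.g. ~300 FPS for YOLOv8s
# on an RTX 4060); measure on your own hardware before committing.

def max_streams(model_fps: float, camera_fps: float = 30,
                utilization: float = 0.8) -> int:
    """Streams supported at full frame rate, keeping 20% headroom."""
    return int(model_fps * utilization / camera_fps)

print(max_streams(300))      # 8 streams at 30 FPS each
print(max_streams(300, 15))  # 16 streams at 15 FPS each
```

This matches the 4-8 stream recommendation for an RTX 4060; dropping per-camera frame rate or skipping frames roughly doubles capacity.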
YOLOv8 is one of the most GPU-efficient AI workloads. For cost analysis, see our cheapest GPU for AI inference guide.
Quick Setup Commands
Ultralytics CLI
```shell
# Install Ultralytics
pip install ultralytics

# Detection on an image
yolo detect predict model=yolov8m.pt source=image.jpg device=0

# Segmentation on a video
yolo segment predict model=yolov8m-seg.pt source=video.mp4 device=0

# Export to TensorRT for maximum speed
yolo export model=yolov8m.pt format=engine device=0 half=True
```
Python API
```python
from ultralytics import YOLO

# Load model
model = YOLO('yolov8m.pt')

# Run inference on an image
results = model('image.jpg', device=0)

# Stream a video, batching frames to raise GPU utilization
results = model('video.mp4', device=0, stream=True, batch=16)

# Export to TensorRT FP16
model.export(format='engine', device=0, half=True)
```
TensorRT for Production
```shell
# Export to TensorRT for a 2-3x speedup
yolo export model=yolov8m.pt format=engine device=0 half=True

# Run with the TensorRT engine
yolo detect predict model=yolov8m.engine source=video.mp4 device=0
```
TensorRT export typically provides 2-3x faster inference than PyTorch, so use it for production deployments. For GPU comparisons, see our GPU comparison tool and best GPU for AI inference guide. Also check our RTX 3090 vs 5090 comparison for vision workloads.
Deploy This Model Now
Dedicated GPU servers with the VRAM you need. UK datacenter, full root access.
Browse GPU Servers