Connect MinIO to GPU for Model Storage

Use MinIO as the model storage backend for your GPU AI infrastructure. This guide covers deploying MinIO alongside your inference server, configuring model downloads from S3-compatible storage, and automating model versioning on your dedicated GPU hardware.

Tutorials · April 16, 2026 · 4 min read · admin

What You’ll Connect

After this guide, your GPU inference server will pull AI models from a MinIO object storage instance with automated versioning, lifecycle policies, and S3-compatible access. MinIO runs alongside your vLLM or Ollama endpoint on dedicated GPU hardware, serving as a private model registry that training pipelines push to and inference servers pull from — no cloud storage dependencies required.

The integration organises models in MinIO buckets by name, version, and stage (staging vs production). Your training pipeline uploads new model checkpoints to MinIO, a promotion script moves validated models to the production path, and the inference server syncs the latest production model at startup or on demand.

Prerequisites

A GigaGPU server with 200GB+ storage for model files
Docker installed for running MinIO
A running inference endpoint (vLLM production guide)
Python 3.10+ with the boto3 package (the code below uses boto3 only; the minio SDK is an optional alternative)

Integration Steps

Deploy MinIO as a Docker container on your GPU server or a nearby storage host. Expose the S3 API on port 9000 and the MinIO Console on port 9001 for web-based management. Configure dedicated storage volumes with fast I/O — model downloads benefit from high sequential read speeds.

Create a bucket structure that supports versioning and promotion. Use a naming convention like ai-models/model-name/staging/v{n}/ and ai-models/model-name/production/. Training pipelines push to staging, and a promotion script copies validated models to the production path. Your inference server always reads from the production path.
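The naming convention above can be captured in a few small helpers so that every script builds keys the same way. This is a minimal sketch: the function names are hypothetical, keys are relative to the ai-models bucket, and versions are assumed to be plain integers.

```python
from typing import Optional


def staging_prefix(model_name: str, version: int) -> str:
    """Key prefix for one staged version, relative to the ai-models bucket."""
    return f"{model_name}/staging/v{version}/"


def production_prefix(model_name: str) -> str:
    """Key prefix that the inference server always reads from."""
    return f"{model_name}/production/"


def parse_staged_version(key: str, model_name: str) -> Optional[int]:
    """Return the version number encoded in a staging key, or None."""
    head = f"{model_name}/staging/v"
    if not key.startswith(head):
        return None
    # Take the text between "v" and the next "/" and accept digits only
    digits = key[len(head):].split("/", 1)[0]
    return int(digits) if digits.isdecimal() else None
```

Keeping the layout in one place means the upload, promotion, and sync steps cannot drift apart on path details.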

Configure your inference server to sync from MinIO at startup. A pre-start script checks the production path for the latest model version, downloads any new or updated files to local cache, and starts vLLM with the cached model. Subsequent restarts skip downloading unchanged files.
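One possible shape for the pre-start script is sketched below. It assumes vLLM is launched with the `vllm serve` command on port 8000 and that the sync step is passed in as a function returning the local model directory; `build_vllm_command` and `prestart` are hypothetical names, not part of vLLM.

```python
import os
from typing import Callable, List


def build_vllm_command(model_dir: str, port: int = 8000) -> List[str]:
    """Build the argv for serving a locally cached model with vLLM."""
    return ["vllm", "serve", model_dir, "--port", str(port)]


def prestart(sync_fn: Callable[[str], str], model_name: str, port: int = 8000) -> List[str]:
    """Sync the production model, then return the command to launch."""
    model_dir = sync_fn(model_name)  # expected to fetch only new or changed files
    # Refuse to start if the sync produced no files at all
    if not os.path.isdir(model_dir) or not os.listdir(model_dir):
        raise RuntimeError(f"No model files found in {model_dir}")
    return build_vllm_command(model_dir, port)
```

In practice you would pass the sync_production_model function from the Code Example section as `sync_fn` and hand the returned list to `os.execvp(cmd[0], cmd)` or your process supervisor.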

Code Example

Model management pipeline connecting training, MinIO, and your GPU inference server:

import os

import boto3

MINIO_URL = "http://minio:9000"
MINIO_KEY = os.environ["MINIO_ACCESS_KEY"]
MINIO_SECRET = os.environ["MINIO_SECRET_KEY"]
BUCKET = "ai-models"

s3 = boto3.client("s3", endpoint_url=MINIO_URL,
    aws_access_key_id=MINIO_KEY, aws_secret_access_key=MINIO_SECRET)

def upload_model(model_dir, model_name, version):
    """Upload trained model to MinIO staging."""
    prefix = f"{model_name}/staging/v{version}/"
    for root, _, files in os.walk(model_dir):
        for fname in files:
            local_path = os.path.join(root, fname)
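            # Object keys always use forward slashes, whatever the local OS
            rel_path = os.path.relpath(local_path, model_dir).replace(os.sep, "/")
            key = prefix + rel_path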
            s3.upload_file(local_path, BUCKET, key)
    print(f"Uploaded {model_name} v{version} to staging")

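def promote_model(model_name, version):
    """Copy model from staging to production."""
    src_prefix = f"{model_name}/staging/v{version}/"
    dst_prefix = f"{model_name}/production/"
    paginator = s3.get_paginator("list_objects_v2")
    # Copy staging to production first, so production is never left empty
    promoted = set()
    for page in paginator.paginate(Bucket=BUCKET, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            # Slice off the prefix; str.replace would also rewrite later matches
            new_key = dst_prefix + obj["Key"][len(src_prefix):]
            # Managed copy uses multipart for large files; a single
            # copy_object request is capped at 5 GB in the S3 API
            s3.copy({"Bucket": BUCKET, "Key": obj["Key"]}, BUCKET, new_key)
            promoted.add(new_key)
    if not promoted:
        raise ValueError(f"No staged files found under {src_prefix}")
    # Remove production files that are not part of the promoted version
    for page in paginator.paginate(Bucket=BUCKET, Prefix=dst_prefix):
        for obj in page.get("Contents", []):
            if obj["Key"] not in promoted:
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
    print(f"Promoted {model_name} v{version} to production")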

def sync_production_model(model_name, local_cache="/data/models"):
    """Sync production model to local cache for inference."""
    prefix = f"{model_name}/production/"
    local_dir = os.path.join(local_cache, model_name)
    os.makedirs(local_dir, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
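            fname = obj["Key"][len(prefix):]
            if not fname or fname.endswith("/"):
                continue  # skip directory placeholder keys
            local_path = os.path.join(local_dir, fname)
            # Skip only if the size matches AND the local copy is not older than
            # the object; new weights for the same architecture often have
            # identical file sizes, so size alone would miss an update
            if (os.path.exists(local_path)
                    and os.path.getsize(local_path) == obj["Size"]
                    and os.path.getmtime(local_path) >= obj["LastModified"].timestamp()):
                continue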
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(BUCKET, obj["Key"], local_path)
    return local_dir

Testing Your Integration

Upload a small test model to MinIO staging and run the promotion script. Verify all files appear in the production path via the MinIO Console. Run the sync script and confirm files download to local cache. Run the sync again to verify incremental behaviour — already-downloaded files should be skipped.

Test the full workflow: upload a new model version, promote it, sync to the inference server, and restart vLLM. Verify the new model loads correctly by sending a test inference request through the OpenAI-compatible API.
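The test inference request can be scripted with the standard library alone. The sketch below assumes the server exposes the OpenAI-compatible `/v1/completions` route at a base URL such as http://localhost:8000, and that the model is addressed by the name or path it was served under; both values are assumptions to adapt, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str,
                             prompt: str = "Hello") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": 8}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str, model: str) -> str:
    """Send one short completion request and return the generated text."""
    req = build_completion_request(base_url, model)
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Calling `smoke_test` right after a restart gives a quick pass or fail signal: an HTTP error or a missing `choices` field raises an exception instead of returning text.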

Production Tips

Enable MinIO bucket versioning so you can roll back to any previous model version if a new one underperforms. Set lifecycle policies to automatically delete staging models older than 30 days while keeping production models indefinitely. For multi-server deployments, a central MinIO instance serves as the model registry for all GPU servers in your fleet.
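Versioning and the 30-day staging cleanup can both be set through the S3 API. The sketch below takes a boto3 S3 client as `s3` and uses its `put_bucket_versioning` and `put_bucket_lifecycle_configuration` calls; it assumes your MinIO release accepts these lifecycle fields. Because lifecycle filters match key prefixes rather than wildcards, the layout in this guide needs one rule per model. The function names are hypothetical.

```python
def staging_expiry_rule(model_name: str, days: int = 30) -> dict:
    """Lifecycle rule that expires one model's staging objects after `days`."""
    return {
        "ID": f"expire-staging-{model_name}",
        "Status": "Enabled",
        "Filter": {"Prefix": f"{model_name}/staging/"},
        "Expiration": {"Days": days},
        # With versioning on, also expire noncurrent versions so space is freed
        "NoncurrentVersionExpiration": {"NoncurrentDays": days},
    }


def apply_bucket_policies(s3, bucket: str, model_names: list, days: int = 30) -> None:
    """Enable versioning and install one staging expiry rule per model."""
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
    )
    # This call replaces the bucket's whole lifecycle configuration
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [staging_expiry_rule(name, days) for name in model_names]
        },
    )
```

No rule targets the production prefixes, so production objects are kept indefinitely, as described above.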

Monitor MinIO storage usage and set alerts before disk space runs out, because model files are large and accumulate quickly. Use MinIO’s erasure coding for data durability on multi-disk setups. For teams managing many models, build a model catalogue API that queries MinIO metadata.
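A simple usage report is one way to feed such alerts. The sketch below sums object sizes per model by listing the bucket through a boto3-style client passed in as `s3`; the helper names and the byte budget are hypothetical. It counts current object versions only, so with bucket versioning enabled the real disk usage will be higher.

```python
from collections import defaultdict


def usage_by_model(s3, bucket: str) -> dict:
    """Sum stored bytes per top-level model prefix in the bucket."""
    totals = defaultdict(int)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            # The first path segment is the model name in this guide's layout
            model_name = obj["Key"].split("/", 1)[0]
            totals[model_name] += obj["Size"]
    return dict(totals)


def over_budget(totals: dict, limit_bytes: int) -> bool:
    """True when the combined size of all models exceeds the limit."""
    return sum(totals.values()) > limit_bytes
```

Running this on a schedule and alerting when `over_budget` returns True is a lightweight complement to disk-level monitoring.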

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
