
Connect MinIO to GPU for Model Storage

Use MinIO as the model storage backend for your GPU AI infrastructure. This guide covers deploying MinIO alongside your inference server, configuring model downloads from S3-compatible storage, and automating model versioning on your dedicated GPU hardware.

What You’ll Connect

After this guide, your GPU inference server will pull AI models from a MinIO object storage instance with automated versioning, lifecycle policies, and S3-compatible access. MinIO runs alongside your vLLM or Ollama endpoint on dedicated GPU hardware, serving as a private model registry that training pipelines push to and inference servers pull from — no cloud storage dependencies required.

The integration organises models in MinIO buckets by name, version, and stage (staging vs production). Your training pipeline uploads new model checkpoints to MinIO, a promotion script moves validated models to the production path, and the inference server syncs the latest production model at startup or on demand.

Prerequisites

  • A GigaGPU server with 200GB+ storage for model files
  • Docker installed for running MinIO
  • A running inference endpoint (vLLM production guide)
  • Python 3.10+ with the boto3 package (the code in this guide uses boto3 only; the minio package is an optional alternative client)

Integration Steps

Deploy MinIO as a Docker container on your GPU server or a nearby storage host. Expose the S3 API on port 9000 and the MinIO Console on port 9001 for web-based management. Configure dedicated storage volumes with fast I/O — model downloads benefit from high sequential read speeds.
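
A minimal sketch of the deployment step, written in Python to match the rest of this guide. It only assembles the docker run arguments. The container name, the host data directory (/mnt/nvme/minio), and the placeholder credentials are assumptions to replace with your own values.

```python
import shlex


def minio_docker_command(data_dir="/mnt/nvme/minio",
                         root_user="CHANGE_ME",
                         root_password="CHANGE_ME_TOO"):
    """Build the argv for a MinIO container: S3 API on 9000, Console on 9001."""
    return [
        "docker", "run", "-d", "--name", "minio",
        "-p", "9000:9000",              # S3 API
        "-p", "9001:9001",              # MinIO Console
        "-v", f"{data_dir}:/data",      # host volume (assumed path) for model files
        "-e", f"MINIO_ROOT_USER={root_user}",
        "-e", f"MINIO_ROOT_PASSWORD={root_password}",
        "quay.io/minio/minio", "server", "/data",
        "--console-address", ":9001",
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(minio_docker_command()))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere. In practice you would probably supply the credentials from an environment file instead of the command line.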

Create a bucket structure that supports versioning and promotion. Use a naming convention like ai-models/model-name/staging/v{n}/ and ai-models/model-name/production/. Training pipelines push to staging, and a promotion script copies validated models to the production path. Your inference server always reads from the production path.
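
The naming convention above can be captured in a few helpers so that every script builds keys the same way. This is a sketch: the helper names and the example model name "llama-3-8b" are hypothetical, and the keys are relative to the ai-models bucket.

```python
import re

# Staging keys look like "<model>/staging/v<n>/<relative path>".
_STAGING_KEY = re.compile(r"^([^/]+)/staging/v(\d+)/(.+)$")


def staging_prefix(model_name, version):
    """Key prefix that a training pipeline uploads a checkpoint under."""
    return f"{model_name}/staging/v{int(version)}/"


def production_prefix(model_name):
    """Key prefix that the inference server always reads from."""
    return f"{model_name}/production/"


def parse_staging_key(key):
    """Split a staging key into (model_name, version, relative_path), or return None."""
    match = _STAGING_KEY.match(key)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    print(staging_prefix("llama-3-8b", 3))
    print(production_prefix("llama-3-8b"))
```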

Configure your inference server to sync from MinIO at startup. A pre-start script lists the production path, downloads any files that are missing from the local cache or whose size differs from the stored object, and then starts vLLM with the cached model. Subsequent restarts skip files that are already present with a matching size.
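
One possible shape for the pre-start script, as a sketch. It assumes a vLLM version that provides the vllm serve command, and it takes the sync step as a function argument (for example the sync_production_model function from this guide). The names prestart and build_vllm_command are hypothetical.

```python
import os


def build_vllm_command(model_dir, port=8000):
    """Command line that serves a locally cached model with vLLM."""
    return ["vllm", "serve", model_dir, "--port", str(port)]


def prestart(model_name, sync_fn, port=8000):
    """Sync the production model, then return the command that starts vLLM on it."""
    model_dir = sync_fn(model_name)
    # Refuse to start on a missing or empty cache directory.
    if not os.path.isdir(model_dir) or not os.listdir(model_dir):
        raise RuntimeError(f"No model files found in {model_dir}")
    return build_vllm_command(model_dir, port)
```

A wrapper script could then call cmd = prestart("my-model", sync_production_model) followed by os.execvp(cmd[0], cmd), so that vLLM replaces the pre-start process.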

Code Example

Model management pipeline connecting training, MinIO, and your GPU inference server:

import os

import boto3

MINIO_URL = "http://minio:9000"
MINIO_KEY = os.environ["MINIO_ACCESS_KEY"]
MINIO_SECRET = os.environ["MINIO_SECRET_KEY"]
BUCKET = "ai-models"

s3 = boto3.client("s3", endpoint_url=MINIO_URL,
    aws_access_key_id=MINIO_KEY, aws_secret_access_key=MINIO_SECRET)

def upload_model(model_dir, model_name, version):
    """Upload every file under model_dir to the staging prefix in MinIO."""
    prefix = f"{model_name}/staging/v{version}/"
    for root, _, files in os.walk(model_dir):
        for fname in files:
            local_path = os.path.join(root, fname)
            # S3 keys always use "/", whatever the local path separator is.
            rel_path = os.path.relpath(local_path, model_dir).replace(os.sep, "/")
            s3.upload_file(local_path, BUCKET, prefix + rel_path)
    print(f"Uploaded {model_name} v{version} to staging")

def promote_model(model_name, version):
    """Copy a model version from staging to production."""
    src_prefix = f"{model_name}/staging/v{version}/"
    dst_prefix = f"{model_name}/production/"
    paginator = s3.get_paginator("list_objects_v2")
    # List staging first so a wrong version number cannot wipe production.
    src_keys = [obj["Key"]
                for page in paginator.paginate(Bucket=BUCKET, Prefix=src_prefix)
                for obj in page.get("Contents", [])]
    if not src_keys:
        raise ValueError(f"No staged files found under {src_prefix}")
    # Clear current production
    for page in paginator.paginate(Bucket=BUCKET, Prefix=dst_prefix):
        for obj in page.get("Contents", []):
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
    # Copy staging to production. s3.copy() is boto3's managed transfer and
    # switches to multipart copy for large objects; the S3 API specifies a
    # 5 GB cap for a single CopyObject call.
    for key in src_keys:
        new_key = dst_prefix + key[len(src_prefix):]
        s3.copy({"Bucket": BUCKET, "Key": key}, BUCKET, new_key)
    print(f"Promoted {model_name} v{version} to production")

def sync_production_model(model_name, local_cache="/data/models"):
    """Sync production model to local cache for inference."""
    prefix = f"{model_name}/production/"
    local_dir = os.path.join(local_cache, model_name)
    os.makedirs(local_dir, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Strip only the leading prefix to get the path inside the model.
            fname = obj["Key"][len(prefix):]
            if not fname or fname.endswith("/"):
                continue  # skip folder placeholder keys
            local_path = os.path.join(local_dir, fname)
            # Size-only check: a changed file with an identical size is not re-downloaded.
            if os.path.exists(local_path) and os.path.getsize(local_path) == obj["Size"]:
                continue
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(BUCKET, obj["Key"], local_path)
    return local_dir
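
The sync above compares file sizes only. If you want a stronger check, one option is to store a SHA-256 digest as object metadata at upload time and compare it before skipping a download. The sketch below assumes a boto3 S3 client is passed in as s3. The metadata key "sha256" and the function names are hypothetical, not a MinIO convention.

```python
import hashlib
import os


def sha256_file(path, chunk_size=8 * 1024 * 1024):
    """Hash a file in chunks so large model weights never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_with_checksum(s3, local_path, bucket, key):
    """Upload a file and record its SHA-256 as user metadata on the object."""
    checksum = sha256_file(local_path)
    s3.upload_file(local_path, bucket, key,
                   ExtraArgs={"Metadata": {"sha256": checksum}})
    return checksum


def is_current(s3, local_path, bucket, key):
    """True if the local file matches the checksum stored on the object."""
    if not os.path.exists(local_path):
        return False
    remote = s3.head_object(Bucket=bucket, Key=key).get("Metadata", {}).get("sha256")
    return remote is not None and remote == sha256_file(local_path)
```

Hashing multi-gigabyte weights on every restart takes time, so this is a trade-off between startup speed and confidence in the cache.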

Testing Your Integration

Upload a small test model to MinIO staging and run the promotion script. Verify all files appear in the production path via the MinIO Console. Run the sync script and confirm files download to local cache. Run the sync again to verify incremental behaviour — already-downloaded files should be skipped.

Test the full workflow: upload a new model version, promote it, sync to the inference server, and restart vLLM. Verify the new model loads correctly by sending a test inference request through the OpenAI-compatible API.
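
The final check can be scripted with the standard library alone. This sketch assumes vLLM's OpenAI-compatible server is listening at a base URL such as http://localhost:8000 and that the model name you pass matches what the server reports, which is often the local model path unless you set a served name. The function names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt="Hello", max_tokens=8):
    """Build a POST request for the /v1/completions endpoint."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url, model, timeout=60):
    """Send one short completion and return the generated text."""
    request = build_completion_request(base_url, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["text"]
```

Calling smoke_test("http://localhost:8000", "/data/models/my-model") after a restart raises an exception if the server is unreachable or the response lacks a completion, which is enough for a basic deployment gate.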

Production Tips

Enable MinIO bucket versioning so you can roll back to any previous model version if a new one underperforms. Set lifecycle policies to automatically delete staging models older than 30 days while keeping production models indefinitely. For multi-server deployments, a central MinIO instance serves as the model registry for all GPU servers in your fleet.
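
Both settings can be applied through the same S3 API the rest of this guide uses. The sketch below assumes a boto3 S3 client is passed in as s3 and builds one expiration rule per model, because lifecycle prefix filters are literal prefixes and staging paths here start with the model name. The rule IDs and function names are hypothetical.

```python
def staging_lifecycle_rules(model_names, days=30):
    """Lifecycle configuration that expires staging objects after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"expire-staging-{name}",
                "Status": "Enabled",
                "Filter": {"Prefix": f"{name}/staging/"},
                "Expiration": {"Days": days},
            }
            for name in model_names
        ]
    }


def apply_retention(s3, bucket, model_names, days=30):
    """Enable bucket versioning, then install the staging expiration rules."""
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # This call replaces any lifecycle configuration already on the bucket.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=staging_lifecycle_rules(model_names, days),
    )
```

Production prefixes get no rule, so they are kept indefinitely. On a versioned bucket, expiration normally leaves noncurrent versions in place, so you may want an additional rule for those if staging storage has to be reclaimed.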

Monitor MinIO storage usage and set alerts before running out of disk space — model files are large and accumulate quickly. Use MinIO’s erasure coding for data durability on multi-disk setups. For teams managing many models, build a model catalogue API that queries MinIO metadata. Explore more tutorials or get started with GigaGPU to build your private model registry.
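
A simple starting point for both monitoring and a catalogue is to total object sizes per model from a bucket listing. This sketch assumes a boto3 S3 client is passed in as s3 and that the first path segment of each key is the model name, as in the layout used in this guide. The function names and the threshold are hypothetical.

```python
from collections import defaultdict


def usage_by_model(pages):
    """Sum object sizes per model from list_objects_v2 result pages."""
    totals = defaultdict(int)
    for page in pages:
        for obj in page.get("Contents", []):
            model_name = obj["Key"].split("/", 1)[0]
            totals[model_name] += obj["Size"]
    return dict(totals)


def bucket_usage(s3, bucket):
    """List the whole bucket and return bytes used per model."""
    paginator = s3.get_paginator("list_objects_v2")
    return usage_by_model(paginator.paginate(Bucket=bucket))


def over_threshold(totals, limit_bytes):
    """True when total usage exceeds the alert threshold."""
    return sum(totals.values()) > limit_bytes
```

Listing reports current objects only, so on a versioned bucket the real disk usage can be higher than these totals.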

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
