What You’ll Connect
After this guide, your GPU inference server will pull AI models from a MinIO object storage instance with automated versioning, lifecycle policies, and S3-compatible access. MinIO runs alongside your vLLM or Ollama endpoint on dedicated GPU hardware, serving as a private model registry that training pipelines push to and inference servers pull from — no cloud storage dependencies required.
The integration organises models in MinIO buckets by name, version, and stage (staging vs production). Your training pipeline uploads new model checkpoints to MinIO, a promotion script moves validated models to the production path, and the inference server syncs the latest production model at startup or on demand.
Prerequisites
- A GigaGPU server with 200GB+ storage for model files
- Docker installed for running MinIO
- A running inference endpoint (vLLM production guide)
- Python 3.10+ with the boto3 and minio packages
Integration Steps
Deploy MinIO as a Docker container on your GPU server or a nearby storage host. Expose the S3 API on port 9000 and the MinIO Console on port 9001 for web-based management. Configure dedicated storage volumes with fast I/O — model downloads benefit from high sequential read speeds.
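A minimal deployment might look like the following sketch; the data path and credentials are placeholders you should replace with your own:

```shell
# Sketch: MinIO with the S3 API on 9000 and the Console on 9001.
# /mnt/minio-data and both credentials are placeholder values.
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=admin \
  -e MINIO_ROOT_PASSWORD=change-me-now \
  -v /mnt/minio-data:/data \
  minio/minio server /data --console-address ":9001"
```

Point the volume at your fastest disks; MinIO serves model files at close to raw sequential read speed.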
Create a bucket structure that supports versioning and promotion. Use a naming convention like ai-models/model-name/staging/v{n}/ and ai-models/model-name/production/. Training pipelines push to staging, and a promotion script copies validated models to the production path. Your inference server always reads from the production path.
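One way to keep that convention consistent across training, promotion, and inference scripts is to centralise the prefix strings in small helpers. A sketch (the function names are illustrative, not part of any library):

```python
# Illustrative helpers for the bucket layout described above.
# Inside the ai-models bucket, keys live under:
#   {model}/staging/v{n}/...   and   {model}/production/...

def staging_prefix(model_name: str, version: int) -> str:
    """Key prefix for an uploaded staging checkpoint."""
    return f"{model_name}/staging/v{version}/"

def production_prefix(model_name: str) -> str:
    """Key prefix the inference server always reads from."""
    return f"{model_name}/production/"

def promote_key(key: str, model_name: str, version: int) -> str:
    """Map a staging object key to its production counterpart."""
    return key.replace(staging_prefix(model_name, version),
                       production_prefix(model_name), 1)
```

For example, promote_key("llama3/staging/v4/config.json", "llama3", 4) yields "llama3/production/config.json".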
Configure your inference server to sync from MinIO at startup. A pre-start script checks the production path for the latest model version, downloads any new or updated files to local cache, and starts vLLM with the cached model. Subsequent restarts skip downloading unchanged files.
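A pre-start hook can be as small as the sketch below, assuming the sync code from the code example is saved as model_sync.py and that "llama3" stands in for your model name:

```shell
#!/bin/sh
# Hypothetical pre-start hook: sync the production model, then launch vLLM.
set -e
MODEL_DIR=$(python3 -c "from model_sync import sync_production_model; print(sync_production_model('llama3'))")
exec vllm serve "$MODEL_DIR" --host 0.0.0.0 --port 8000
```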
Code Example
Model management pipeline connecting training, MinIO, and your GPU inference server:
import boto3
import os

MINIO_URL = "http://minio:9000"
MINIO_KEY = os.environ["MINIO_ACCESS_KEY"]
MINIO_SECRET = os.environ["MINIO_SECRET_KEY"]
BUCKET = "ai-models"

s3 = boto3.client(
    "s3",
    endpoint_url=MINIO_URL,
    aws_access_key_id=MINIO_KEY,
    aws_secret_access_key=MINIO_SECRET,
)

def upload_model(model_dir, model_name, version):
    """Upload a trained model to MinIO staging."""
    prefix = f"{model_name}/staging/v{version}/"
    for root, _, files in os.walk(model_dir):
        for fname in files:
            local_path = os.path.join(root, fname)
            key = prefix + os.path.relpath(local_path, model_dir)
            s3.upload_file(local_path, BUCKET, key)
    print(f"Uploaded {model_name} v{version} to staging")

def promote_model(model_name, version):
    """Copy a model from staging to production."""
    src_prefix = f"{model_name}/staging/v{version}/"
    dst_prefix = f"{model_name}/production/"
    paginator = s3.get_paginator("list_objects_v2")
    # Clear the current production model so stale files don't linger
    for page in paginator.paginate(Bucket=BUCKET, Prefix=dst_prefix):
        for obj in page.get("Contents", []):
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
    # Copy the staging files to the production path
    for page in paginator.paginate(Bucket=BUCKET, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            new_key = obj["Key"].replace(src_prefix, dst_prefix, 1)
            s3.copy_object(
                Bucket=BUCKET,
                Key=new_key,
                CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
            )
    print(f"Promoted {model_name} v{version} to production")

def sync_production_model(model_name, local_cache="/data/models"):
    """Sync the production model to the local cache for inference."""
    prefix = f"{model_name}/production/"
    local_dir = os.path.join(local_cache, model_name)
    os.makedirs(local_dir, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            fname = obj["Key"].replace(prefix, "", 1)
            local_path = os.path.join(local_dir, fname)
            # Skip files already cached with a matching size
            if os.path.exists(local_path) and os.path.getsize(local_path) == obj["Size"]:
                continue
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(BUCKET, obj["Key"], local_path)
    return local_dir
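The v{n} layout also makes it easy to decide which checkpoint to promote next. A sketch, assuming staging folders follow the naming convention exactly (the helper name is ours; the prefixes would come from a delimiter-based list_objects_v2 call):

```python
import re

def latest_version(prefixes):
    """Return the highest v{n} number among staging prefixes, or None.

    `prefixes` is a list of key prefixes such as "llama3/staging/v4/",
    e.g. the CommonPrefixes from list_objects_v2 with Delimiter="/".
    """
    versions = []
    for p in prefixes:
        m = re.fullmatch(r"v(\d+)", p.rstrip("/").rsplit("/", 1)[-1])
        if m:
            versions.append(int(m.group(1)))
    return max(versions, default=None)
```

For example, latest_version(["llama3/staging/v1/", "llama3/staging/v10/", "llama3/staging/v2/"]) returns 10, comparing numerically rather than lexically.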
Testing Your Integration
Upload a small test model to MinIO staging and run the promotion script. Verify all files appear in the production path via the MinIO Console. Run the sync script and confirm files download to local cache. Run the sync again to verify incremental behaviour — already-downloaded files should be skipped.
Test the full workflow: upload a new model version, promote it, sync to the inference server, and restart vLLM. Verify the new model loads correctly by sending a test inference request through the OpenAI-compatible API.
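The smoke test can be a few lines of standard-library Python against the OpenAI-compatible endpoint; the URL, model name, and prompt below are placeholders for your own values:

```python
import json
import urllib.request

def build_chat_payload(model, prompt, max_tokens=32):
    """Minimal chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_test_request(base_url, model, prompt):
    """POST one request to the OpenAI-compatible API and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A call such as send_test_request("http://localhost:8000", "llama3", "Say hello") confirms the newly synced model is the one actually serving traffic.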
Production Tips
Enable MinIO bucket versioning so you can roll back to any previous model version if a new one underperforms. Set lifecycle policies to automatically delete staging models older than 30 days while keeping production models indefinitely. For multi-server deployments, a central MinIO instance serves as the model registry for all GPU servers in your fleet.
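The 30-day staging expiry can be expressed as a standard S3 lifecycle rule and applied with boto3's put_bucket_lifecycle_configuration. A sketch, assuming one rule per model so the prefix filter matches only that model's staging path ("llama3" is a placeholder):

```python
def staging_expiry_rule(model_name, days=30):
    """S3 lifecycle rule expiring one model's staging objects after `days`."""
    return {
        "ID": f"expire-{model_name}-staging",
        "Status": "Enabled",
        "Filter": {"Prefix": f"{model_name}/staging/"},
        "Expiration": {"Days": days},
    }

# Applying it requires a live MinIO endpoint, e.g.:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="ai-models",
#     LifecycleConfiguration={"Rules": [staging_expiry_rule("llama3")]},
# )
```

Production paths simply get no rule, so validated models are kept indefinitely.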
Monitor MinIO storage usage and set alerts before running out of disk space — model files are large and accumulate quickly. Use MinIO’s erasure coding for data durability on multi-disk setups. For teams managing many models, build a model catalogue API that queries MinIO metadata. Explore more tutorials or get started with GigaGPU to build your private model registry.