
Docker Security for AI Workloads

Harden Docker containers running AI inference with non-root users, read-only filesystems, GPU device isolation, image scanning, and runtime security for GPU servers.

Your team deploys AI models in Docker containers for reproducibility and isolation. But containers run as root by default, many deployment guides mount the Docker socket for convenience, and --gpus all hands a single container every GPU on the host. An exploit that escapes the inference process, say a prompt injection chained with a code execution flaw, then has root privileges inside the container, access to every GPU (including those serving other models), and potentially a path to the host system via the mounted socket. Container security for AI workloads requires deliberate hardening. This guide covers Docker security for inference on dedicated GPU servers.

Non-Root Container Execution

Never run inference containers as root. Create a dedicated user in your Dockerfile:

FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04

# Create non-root user for inference
RUN groupadd -r inference && useradd -r -g inference -d /app -s /sbin/nologin inference

# Install dependencies as root
RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY --chown=inference:inference . .

# Switch to non-root user
USER inference
CMD ["python3", "serve.py"]

If the inference process is compromised, the attacker operates as an unprivileged user. They cannot install packages, modify system files, or access other containers’ data.
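
To confirm the hardening holds at runtime, check the effective user inside the running container. The flags and names below are illustrative and meant to be folded into your full docker run command; --security-opt no-new-privileges is a standard Docker option that stops setuid binaries from escalating back to root.

# Optional extra hardening at run time: block privilege escalation
docker run -d --name llm-inference --security-opt no-new-privileges my-inference-image:latest

# Verify the serving process runs as the unprivileged user, not uid 0
docker exec llm-inference id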

GPU Device Isolation

The NVIDIA Container Toolkit exposes GPUs to containers. By default, all GPUs are visible. Restrict each container to only the GPUs it needs:

Flag                            Effect               Use case
--gpus '"device=0"'             Single GPU access    One model per GPU
--gpus '"device=0,1"'           Specific GPU pair    Tensor-parallel model
--gpus all                      All GPUs visible     Avoid (no isolation)
NVIDIA_VISIBLE_DEVICES=none     No GPU access        CPU-only preprocessing

For multi-tenant vLLM deployments, assign each model’s container a specific GPU. A compromised container cannot access another model’s GPU memory or weights.
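
As a sketch of that layout, the commands below pin two hypothetical model containers to separate GPUs (image and container names are placeholders):

# Model A sees only GPU 0
docker run -d --name vllm-model-a --gpus '"device=0"' my-vllm-image:latest

# Model B sees only GPU 1; neither container can enumerate the other's GPU
docker run -d --name vllm-model-b --gpus '"device=1"' my-vllm-image:latest

# Confirm visibility from inside a container
docker exec vllm-model-a nvidia-smi --query-gpu=index,name --format=csv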

Read-Only Filesystem and Volumes

Run containers with a read-only root filesystem. Mount writable volumes only where needed:

docker run -d \
  --name llm-inference \
  --gpus '"device=0"' \
  --read-only \
  --tmpfs /tmp:size=1G \
  -v /data/models:/models:ro \
  -v /data/logs:/app/logs:rw \
  --memory=32g \
  --memory-swap=32g \
  --pids-limit=256 \
  my-inference-image:latest

Model weights mount as read-only (:ro). Only the logs directory is writable. The --tmpfs provides scratch space that disappears when the container stops. Memory limits prevent a runaway process from consuming all host RAM, and --pids-limit blocks fork bombs. This applies equally to Ollama and other model serving containers.
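
A quick way to confirm the read-only root filesystem is enforced is to attempt a write from inside the running container; only the mounted log volume and the tmpfs should accept writes.

# Expected to fail with a "Read-only file system" error
docker exec llm-inference touch /etc/test-write

# Writes to the mounted log volume still succeed
docker exec llm-inference touch /app/logs/test-write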

Image Scanning and Supply Chain

AI Docker images pull from multiple sources: NVIDIA base images, PyPI packages, Hugging Face model weights. Each is an attack vector. Scan images before deployment with Trivy or Grype. Pin base image digests — not just tags — so rebuilds produce identical images. Verify model weight checksums after download. For private deployments, run a local container registry so production images never pull from public sources at runtime.
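
As a sketch of that workflow (assuming Trivy is installed; image, digest, and checksum file names are placeholders):

# Scan the built image for known vulnerabilities before pushing
trivy image --severity HIGH,CRITICAL my-inference-image:latest

# Look up the digest of the base image, then pin it in the Dockerfile
docker buildx imagetools inspect nvidia/cuda:12.4.1-runtime-ubuntu22.04
# FROM nvidia/cuda@sha256:<digest>

# Verify downloaded model weights against a published checksum
sha256sum -c model.safetensors.sha256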

Avoid installing unnecessary packages. Every additional package increases the attack surface. A minimal inference container needs Python, the serving framework, and model dependencies — not build tools, editors, or debugging utilities.
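
A multi-stage build is one way to keep compilers and other build tools out of the runtime image; this sketch assumes the same requirements.txt and serve.py as the earlier Dockerfile, and the non-root user setup from that Dockerfile still applies.

# Build stage: compilers and headers live only here
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y python3 python3-venv build-essential && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN python3 -m venv /opt/venv && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Runtime stage: only the CUDA runtime, Python, and the installed packages
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
USER 1000
CMD ["python3", "serve.py"]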

Container Network Security

Isolate inference containers on dedicated Docker networks. Do not use the default bridge network. Create purpose-specific networks:

# Create isolated network for inference
docker network create --driver bridge --internal inference-net

# Inference container: internal only, no internet access
docker run -d --network inference-net --name llm my-inference-image

# API gateway: connected to both the public and inference networks
# (attach the second network after creation; older Docker releases reject multiple --network flags)
docker run -d --network public-net --name gateway my-gateway-image
docker network connect inference-net gateway

The --internal flag prevents containers on that network from reaching the internet. The inference container communicates only with the API gateway. This pattern protects models serving chatbots, document processing, and vision workloads.
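
To check the isolation behaves as intended, try an outbound request from the inference container; this assumes python3 is available inside the image (curl usually is not).

# Outbound internet access from the internal network should fail
docker exec llm python3 -c "import urllib.request; urllib.request.urlopen('https://example.com', timeout=5)"
# Expect a DNS, connection, or timeout error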

Runtime Security Monitoring

Deploy runtime security monitoring that detects anomalous container behaviour: unexpected process execution (shells spawning inside inference containers), file writes to read-only paths (attempted exploits), network connections to unexpected destinations, and GPU utilisation patterns inconsistent with inference (cryptocurrency mining). Tools like Falco provide syscall-level runtime threat detection and accept custom rules tuned to inference containers; GPU utilisation itself is best tracked through nvidia-smi or DCGM metrics. Integrate alerts with your incident response plan. Review infrastructure security practices and GDPR compliance requirements for comprehensive container hardening.
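
A minimal way to trial this on a single Docker host is to run Falco itself as a privileged container; the exact mounts and driver options vary between Falco releases, so treat this as a starting point based on the Falco container docs rather than a drop-in command.

# Run Falco as a container watching host syscalls and container metadata
docker run -d --name falco --privileged \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /proc:/host/proc:ro \
  -v /etc:/host/etc:ro \
  falcosecurity/falco:latest

# Follow alerts, e.g. a shell spawned inside llm-inference
docker logs -f falco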

Secure GPU Container Hosting

Dedicated GPU servers with NVIDIA Container Toolkit, full root access for Docker hardening, and network isolation. UK data centres.

Browse GPU Servers
