Your Inference API Is Open to the Entire Internet
You deployed vLLM on port 8000, opened the port, and walked away. Anyone who discovers the endpoint can drain your GPU compute for free, flood the server with requests, or probe the model for exploitable behavior. A GPU server running AI inference needs firewall rules that lock down the API while still permitting legitimate client traffic, monitoring, and internal multi-GPU communication.
UFW: Simple Firewall for AI Servers
UFW provides readable firewall management on Ubuntu:
# Install and enable UFW
sudo apt install -y ufw
# Default policy: deny incoming, allow outgoing
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH (always do this BEFORE enabling UFW)
sudo ufw allow 22/tcp comment 'SSH'
# Allow inference API from specific IPs only
sudo ufw allow from 10.0.0.0/8 to any port 8000 proto tcp \
comment 'vLLM API - internal'
sudo ufw allow from 203.0.113.50 to any port 8000 proto tcp \
comment 'vLLM API - app server'
# Allow Ollama API (internal only)
sudo ufw allow from 10.0.0.0/8 to any port 11434 proto tcp \
comment 'Ollama API - internal'
# Allow HTTPS reverse proxy
sudo ufw allow 443/tcp comment 'HTTPS'
# Allow monitoring (Prometheus, Grafana)
sudo ufw allow from 10.0.0.0/8 to any port 9090 proto tcp \
comment 'Prometheus'
# Enable firewall
sudo ufw enable
sudo ufw status verbose
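On a long ruleset it is easy to miss a rule that exposes an internal service. A quick sanity check is to parse the `ufw status` output and flag any sensitive port allowed from `Anywhere`. A minimal sketch, assuming the standard `ufw status` table layout (the sample output and port set below are illustrative):

```python
SENSITIVE_PORTS = {"8000", "11434", "9090"}  # internal-only services in this setup

def exposed_rules(status_text: str) -> list[str]:
    """Return 'ufw status' rows that allow a sensitive port from Anywhere."""
    flagged = []
    for line in status_text.splitlines():
        parts = line.split()
        # Typical row: "8000/tcp   ALLOW   10.0.0.0/8"
        if len(parts) >= 3 and "ALLOW" in parts and "Anywhere" in parts:
            port = parts[0].split("/")[0]
            if port in SENSITIVE_PORTS:
                flagged.append(line.strip())
    return flagged

# In practice, feed it the output of: sudo ufw status
SAMPLE = """To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
8000/tcp                   ALLOW       Anywhere
11434/tcp                  ALLOW       10.0.0.0/8"""
for row in exposed_rules(SAMPLE):
    print("EXPOSED:", row)
```

In the sample, only the `8000/tcp` row is flagged: SSH is expected to be public, and the Ollama rule is already scoped to the internal range.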
Multi-GPU and NCCL Traffic Rules
Distributed training and tensor parallelism require NCCL communication between GPUs:
# NCCL bootstraps over the launcher's rendezvous connection, then opens
# additional data connections on ephemeral ports, so multi-node setups
# typically trust the whole GPU subnet rather than pinning every port
# Launcher/rendezvous ports (torchrun defaults: 29400 rendezvous, 29500 master)
sudo ufw allow from 10.0.1.0/24 to any port 29400:29500 proto tcp \
comment 'torchrun rendezvous / distributed master'
# For InfiniBand/RoCE (RDMA)
sudo ufw allow from 10.0.1.0/24 to any port 4791 proto udp \
comment 'RoCE v2'
# If using NVLink within a single server, no firewall rules needed
# NVLink bypasses the network stack entirely
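Every subnet-scoped rule above reduces to one question: is the peer inside the trusted range? When an application needs to mirror that policy (for example, to reject non-cluster peers before doing any work), Python's standard `ipaddress` module expresses the same check. A sketch using the 10.0.1.0/24 range from the rules above:

```python
from ipaddress import ip_address, ip_network

TRUSTED_GPU_SUBNET = ip_network("10.0.1.0/24")  # same range as the UFW rules

def is_trusted_peer(addr: str) -> bool:
    """True if addr falls inside the trusted GPU subnet."""
    return ip_address(addr) in TRUSTED_GPU_SUBNET

print(is_trusted_peer("10.0.1.17"))    # True: another GPU node
print(is_trusted_peer("203.0.113.9"))  # False: outside the cluster
```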
Rate Limiting with iptables
Prevent API abuse without a separate rate limiter. One caveat: UFW's chains run first, and traffic UFW has already accepted never reaches rules appended to INPUT, so on a UFW-managed host place equivalent rules in /etc/ufw/before.rules instead.
# Rate limit inference API: max 30 new connections per minute per IP
# xt_recent only remembers 20 packets per source by default; raise the
# limit before using --hitcount 30 (takes effect when the module loads)
echo 'options xt_recent ip_pkt_list_tot=30' | \
sudo tee /etc/modprobe.d/xt_recent.conf
sudo iptables -A INPUT -p tcp --dport 8000 -m state --state NEW \
-m recent --set --name INFERENCE
sudo iptables -A INPUT -p tcp --dport 8000 -m state --state NEW \
-m recent --update --seconds 60 --hitcount 30 --name INFERENCE \
-j DROP
# Connection limit: max 50 simultaneous connections per IP
sudo iptables -A INPUT -p tcp --dport 8000 \
-m connlimit --connlimit-above 50 --connlimit-mask 32 \
-j REJECT --reject-with tcp-reset
# Log rate-limited clients; this rule must run BEFORE the DROP rule,
# so insert it at the top of the chain (--rcheck tests without updating)
sudo iptables -I INPUT -p tcp --dport 8000 -m state --state NEW \
-m recent --rcheck --seconds 60 --hitcount 30 --name INFERENCE \
-j LOG --log-prefix "INFERENCE-DROPPED: " --log-level 4
# Save iptables rules to persist across reboots
sudo apt install -y iptables-persistent
sudo netfilter-persistent save
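The `recent` match can be hard to reason about. The sliding window it implements is roughly: record every new connection's timestamp, then drop once 30 or more fall within the last 60 seconds. A small Python model of that behavior (an illustration, not the kernel code):

```python
from collections import defaultdict, deque

WINDOW, LIMIT = 60.0, 30  # mirrors --seconds 60 --hitcount 30

class RecentMatch:
    """Rough model of the xt_recent sliding-window match."""
    def __init__(self):
        self.seen = defaultdict(deque)  # source IP -> connection timestamps

    def allow(self, ip: str, now: float) -> bool:
        hits = self.seen[ip]
        hits.append(now)                      # the --set rule records this packet
        while hits and now - hits[0] > WINDOW:
            hits.popleft()                    # entries age out of the window
        return len(hits) < LIMIT              # --update ... -j DROP at the limit

fw = RecentMatch()
results = [fw.allow("198.51.100.7", float(t)) for t in range(30)]
print(results.count(True), "allowed,", results.count(False), "dropped")  # 29 allowed, 1 dropped
```

The 30th connection inside the window is dropped; once 60 seconds pass, the counter effectively resets because old timestamps slide out of the window.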
Nginx as a Security Layer
Place Nginx in front of the inference API for authentication and TLS:
# /etc/nginx/sites-available/inference
upstream vllm_backend {
server 127.0.0.1:8000;
keepalive 32;
}
server {
listen 443 ssl;
server_name inference.example.com;
ssl_certificate /etc/letsencrypt/live/inference.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/inference.example.com/privkey.pem;
# API key authentication
location /v1/ {
if ($http_authorization = "") { return 401; }
if ($http_authorization != "Bearer YOUR_API_KEY") { return 403; }
proxy_pass http://vllm_backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_read_timeout 120s;
}
# Block direct model access
location / { return 404; }
}
# With Nginx in front, bind vLLM to localhost only:
# vllm serve ... --host 127.0.0.1
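The two `if` checks in the config form a simple decision chain: no credentials, wrong credentials, or proxy the request. Modelled in Python for clarity (a sketch of the config's logic, keeping the same `YOUR_API_KEY` placeholder):

```python
VALID_KEY = "Bearer YOUR_API_KEY"  # placeholder, as in the nginx config

def auth_status(authorization_header: str) -> int:
    """Mirror the nginx if-chain: 401 missing, 403 wrong, 200 proxied."""
    if authorization_header == "":
        return 401  # no credentials supplied
    if authorization_header != VALID_KEY:
        return 403  # credentials supplied but wrong
    return 200      # request is proxied to vllm_backend

print(auth_status(""))                     # 401
print(auth_status("Bearer wrong-key"))     # 403
print(auth_status("Bearer YOUR_API_KEY"))  # 200
```

Distinguishing 401 from 403 matters operationally: a spike in 401s means unauthenticated scanning, while 403s suggest someone is guessing keys.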
Audit and Verify Firewall Rules
# List all active rules with numbers
sudo ufw status numbered
# Scan from outside to verify closed ports
# (run from a different machine)
nmap -p 1-65535 your-gpu-server-ip
# Check for listening ports that should not be exposed
sudo ss -tlnp | grep -v '127.0.0.1'
# Monitor connection attempts
sudo tail -f /var/log/ufw.log
# Test that API works from allowed IPs
curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer YOUR_API_KEY" \
https://inference.example.com/v1/models
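nmap needs a second machine; for a quick self-test of individual ports, a plain TCP connect is enough. A minimal sketch using Python's standard library (replace the host and ports with your own):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# Run on the server itself: with vLLM bound to 127.0.0.1 behind Nginx,
# 8000 should answer on loopback but not on the public interface.
print("loopback 8000:", port_open("127.0.0.1", 8000))
```

Running the same check against the server's public IP from an outside machine confirms the firewall is actually dropping what you think it is.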
Proper firewall configuration protects your GPU server without blocking legitimate AI traffic. For vLLM API setup, see the production guide; secure Ollama endpoints the same way, and watch access patterns with the monitoring guide.