Home / Blog / Tutorials / vLLM Behind nginx With Auth

Tutorials

vLLM Behind nginx With Auth

A complete auth-protected vLLM setup: TLS, API keys, per-key rate limits. Production-grade access control without writing app code.

Tutorials April 23, 2026 2 min read admin

Exposing vLLM directly means anyone with the URL can hit it. On dedicated GPU hosting nginx sits in front, enforces authentication, applies per-key rate limits, and logs access. Here is the complete config.

Structure
API key table
nginx config
Key rotation

Structure

Clients send Authorization: Bearer sk-.... nginx uses map directives to look up valid keys and their rate limit tier. Valid keys proxy to vLLM; invalid get 401.

Keys

Store keys in a plain file that nginx reads at reload:

# /etc/nginx/api_keys.conf
map $http_authorization $api_key_valid {
    default 0;
    "Bearer sk-customer-a-xyz" 1;
    "Bearer sk-customer-b-abc" 1;
    "Bearer sk-internal-ops"   1;
}

map $http_authorization $rate_limit_key {
    default "";
    "Bearer sk-customer-a-xyz" "a";
    "Bearer sk-customer-b-abc" "b";
    "Bearer sk-internal-ops"   "internal";
}

Config

limit_req_zone $rate_limit_key zone=keyed:10m rate=60r/m;

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    include /etc/nginx/api_keys.conf;

    location /v1/ {
        if ($api_key_valid = 0) { return 401; }
        limit_req zone=keyed burst=20 nodelay;

        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_read_timeout 3600s;
    }
}

Rotation

Generate keys with openssl rand -hex 32. Prefix with sk- or your brand prefix so they are recognisable.

To rotate: add new key to the map file, reload nginx, inform the customer, wait for them to switch, remove old key, reload again. Takes ~5 minutes of real work.

For larger scale move to a proper secrets store with ngx_http_lua_module that reads keys from Redis or an API. For most small-to-medium deployments, the static map file is fine.

Authenticated vLLM Hosting

UK dedicated hosting with nginx auth, TLS, and rate limiting preconfigured.

Browse GPU Servers

See nginx OpenAI-compatible API and load balancer in front of vLLM.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

Tutorials

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.

Ready to deploy your AI workload?

Dedicated GPU servers from our UK datacenter. NVMe storage, 1Gbps networking, full root access.

Browse GPU Servers Contact Sales

vLLM Behind nginx With Auth

Contents

Structure

Keys

Config

Rotation

Authenticated vLLM Hosting

Need a Dedicated GPU Server?

admin

GPU Hosting

Blog Categories

AI Model Hosting

Benchmarks & Tools

Deploy a GPU Server

Ready to deploy your AI workload?

Have a question? Need help?

vLLM Behind nginx With Auth

Contents

Structure

Keys

Config

Rotation

Authenticated vLLM Hosting

Need a Dedicated GPU Server?

admin

Related Articles

ORPO vs DPO – Single-Stage vs Two-Stage Alignment

Health Check Endpoints for an LLM API

GGUF Hosting on RTX 5060 Ti 16GB

Connect AWS S3 to GPU for Models

GPU Hosting

Blog Categories

AI Model Hosting

Benchmarks & Tools

Deploy a GPU Server

Ready to deploy your AI workload?

Have a question? Need help? Contact us

Have a question? Need help?