RTX 3050 - Order Now
Home / Blog / Tutorials / vLLM Behind nginx With Auth
Tutorials

vLLM Behind nginx With Auth

A complete auth-protected vLLM setup: TLS, API keys, per-key rate limits. Production-grade access control without writing app code.

Exposing vLLM directly means anyone with the URL can hit it. On dedicated GPU hosting nginx sits in front, enforces authentication, applies per-key rate limits, and logs access. Here is the complete config.

Contents

Structure

Clients send Authorization: Bearer sk-.... nginx uses map directives to look up valid keys and their rate limit tier. Valid keys proxy to vLLM; invalid get 401.

Keys

Store keys in a plain file that nginx reads at reload:

# /etc/nginx/api_keys.conf
map $http_authorization $api_key_valid {
    default 0;
    "Bearer sk-customer-a-xyz" 1;
    "Bearer sk-customer-b-abc" 1;
    "Bearer sk-internal-ops"   1;
}

map $http_authorization $rate_limit_key {
    default "";
    "Bearer sk-customer-a-xyz" "a";
    "Bearer sk-customer-b-abc" "b";
    "Bearer sk-internal-ops"   "internal";
}

Config

limit_req_zone $rate_limit_key zone=keyed:10m rate=60r/m;

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    include /etc/nginx/api_keys.conf;

    location /v1/ {
        if ($api_key_valid = 0) { return 401; }
        limit_req zone=keyed burst=20 nodelay;

        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_read_timeout 3600s;
    }
}

Rotation

Generate keys with openssl rand -hex 32. Prefix with sk- or your brand prefix so they are recognisable.

To rotate: add new key to the map file, reload nginx, inform the customer, wait for them to switch, remove old key, reload again. Takes ~5 minutes of real work.

For larger scale move to a proper secrets store with ngx_http_lua_module that reads keys from Redis or an API. For most small-to-medium deployments, the static map file is fine.

Authenticated vLLM Hosting

UK dedicated hosting with nginx auth, TLS, and rate limiting preconfigured.

Browse GPU Servers

See nginx OpenAI-compatible API and load balancer in front of vLLM.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.

Ready to deploy your AI workload?

Dedicated GPU servers from our UK datacenter. NVMe storage, 1Gbps networking, full root access.

Browse GPU Servers Contact Sales

Have a question? Need help?