Exposing vLLM directly means anyone with the URL can use it. On dedicated GPU hosting, nginx sits in front of vLLM, enforces authentication, applies per-key rate limits, and logs access. Here is the complete config.
Structure
Clients send Authorization: Bearer sk-.... nginx uses two map directives to check whether the key is valid and to pick the rate-limit bucket it is counted against. Requests with a valid key are proxied to vLLM; everything else gets a 401.
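From the client side, this amounts to one extra header. A minimal Python sketch, assuming the api.yourdomain.com host used in this article's config and vLLM's OpenAI-compatible /v1/models endpoint; `build_models_request` is a hypothetical helper name and the key is a placeholder:

```python
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET /v1/models request carrying the bearer key nginx checks."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/models",
        # nginx sees this header as $http_authorization.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Placeholder host and key; substitute real values before sending.
req = build_models_request("https://api.yourdomain.com", "sk-customer-a-xyz")
print(req.full_url, req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` sends it; a key that is not in the map file should come back as an HTTP 401 error.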
Keys
Store keys in a plain file that nginx reads at reload:
# /etc/nginx/api_keys.conf
map $http_authorization $api_key_valid {
    default 0;
    "Bearer sk-customer-a-xyz" 1;
    "Bearer sk-customer-b-abc" 1;
    "Bearer sk-internal-ops" 1;
}

map $http_authorization $rate_limit_key {
    default "";
    "Bearer sk-customer-a-xyz" "a";
    "Bearer sk-customer-b-abc" "b";
    "Bearer sk-internal-ops" "internal";
}
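Because every key appears in both maps, the two can drift apart when edited by hand. One option is to generate the file from a single key-to-bucket table. A minimal sketch, assuming a hypothetical `render_key_maps` helper; the nginx variable names match the maps above:

```python
def render_key_maps(keys: dict[str, str]) -> str:
    """Render both nginx map blocks from one {api_key: bucket_name} table."""
    valid = ["map $http_authorization $api_key_valid {", "    default 0;"]
    bucket = ["map $http_authorization $rate_limit_key {", '    default "";']
    for key, name in keys.items():
        valid.append(f'    "Bearer {key}" 1;')
        bucket.append(f'    "Bearer {key}" "{name}";')
    valid.append("}")
    bucket.append("}")
    return "\n".join(valid + [""] + bucket) + "\n"


# Example table mirroring the keys shown above.
conf = render_key_maps({
    "sk-customer-a-xyz": "a",
    "sk-customer-b-abc": "b",
    "sk-internal-ops": "internal",
})
print(conf)
```

The rendered text can be written to /etc/nginx/api_keys.conf before reloading nginx.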
Config
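# These two lines belong in the http {} context: map and limit_req_zone
# are not allowed inside server {}.
include /etc/nginx/api_keys.conf;
limit_req_zone $rate_limit_key zone=keyed:10m rate=60r/m;

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;
    # ssl_certificate and ssl_certificate_key for this host go here.

    location /v1/ {
        # Reject requests whose Authorization header is not a known key.
        if ($api_key_valid = 0) { return 401; }
        limit_req zone=keyed burst=20 nodelay;

        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Stream tokens as they arrive and allow long generations.
        proxy_buffering off;
        proxy_read_timeout 3600s;
    }
}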
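The rate-limit numbers are worth spelling out: rate=60r/m is one request per second per key, and burst=20 with nodelay lets a key run up to 20 requests ahead of that rate without being delayed. Requests beyond that are rejected, with status 503 unless limit_req_status is set. The sketch below is a simplified Python model of that leaky-bucket accounting, not nginx's actual implementation; `simulate` is a hypothetical name.

```python
def simulate(arrival_times_s, rate_per_min=60, burst=20):
    """Return True/False per request: accepted or rejected for one key.

    Simplified model: `excess` counts how many requests the key is ahead
    of the steady rate; a request is rejected when it would exceed `burst`.
    """
    rate_per_s = rate_per_min / 60.0
    excess = 0.0
    last = None
    results = []
    for t in arrival_times_s:
        if last is None:
            # First request for this key starts with an empty bucket.
            last = t
            results.append(True)
            continue
        candidate = max(0.0, excess - rate_per_s * (t - last) + 1.0)
        if candidate > burst:
            results.append(False)  # rejected; bucket state is left unchanged
        else:
            excess, last = candidate, t
            results.append(True)
    return results


# 30 requests at the same instant: the first plus a burst of 20 pass.
print(sum(simulate([0.0] * 30)))  # 21

# One request per second stays within the steady rate indefinitely.
print(all(simulate([float(i) for i in range(100)])))  # True
```

Under this model a client that fires requests in short bursts is unaffected, while a sustained flood from one key is cut off after 21 requests and then admitted at one per second.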
Rotation
Generate keys with openssl rand -hex 32. Prefix with sk- or your brand prefix so they are recognisable.
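The same kind of key can be produced from Python's standard library. A minimal sketch; `generate_api_key` is a hypothetical helper, and `secrets.token_hex(32)` yields 64 hex characters, the same shape as the openssl command's output:

```python
import secrets


def generate_api_key(prefix: str = "sk-") -> str:
    """Return prefix + 64 hex characters (32 random bytes from the OS CSPRNG)."""
    return prefix + secrets.token_hex(32)


print(generate_api_key())
```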
To rotate a key: add the new key to the map file, reload nginx, inform the customer, wait for them to switch, remove the old key, and reload again. This takes about five minutes of hands-on work.
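The file edits in that procedure can be scripted. The sketch below assumes the map file layout shown earlier (two map blocks, each with a default line); `add_key`, `remove_key`, and `reload_nginx` are hypothetical helpers. Giving the new key the same bucket name as the old one keeps the customer on a single rate limit while both keys are live.

```python
import subprocess


def add_key(conf_text: str, key: str, bucket: str) -> str:
    """Insert the key into both map blocks, right after each default line."""
    out, current = [], None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("map "):
            current = stripped.split()[2]  # e.g. "$api_key_valid"
        out.append(line)
        if stripped.startswith("default"):
            indent = line[: len(line) - len(line.lstrip())]
            if current == "$api_key_valid":
                out.append(f'{indent}"Bearer {key}" 1;')
            elif current == "$rate_limit_key":
                out.append(f'{indent}"Bearer {key}" "{bucket}";')
    return "\n".join(out) + "\n"


def remove_key(conf_text: str, key: str) -> str:
    """Drop every map entry for the given key."""
    needle = f'"Bearer {key}"'
    kept = [l for l in conf_text.splitlines() if not l.strip().startswith(needle)]
    return "\n".join(kept) + "\n"


def reload_nginx(run=subprocess.run) -> None:
    """Test the config first; only reload if nginx -t passes."""
    check = run(["nginx", "-t"], capture_output=True, text=True)
    if check.returncode != 0:
        raise RuntimeError(f"nginx -t failed, not reloading:\n{check.stderr}")
    run(["nginx", "-s", "reload"], check=True)
```

A rotation is then `add_key`, write the file, `reload_nginx()`, wait for the customer to switch, `remove_key`, write the file, and `reload_nginx()` again.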
At larger scale, move the keys out of the static file: ngx_http_lua_module can look them up in Redis or through an API at request time. For most small-to-medium deployments, the static map file is fine.
See also: nginx OpenAI-compatible API and load balancer in front of vLLM.