
nginx Config for Self-Hosted OpenAI-Compatible API

The full nginx configuration for a production self-hosted OpenAI-compatible API - TLS, auth, streaming, timeouts, rate limits.

Exposing vLLM or Ollama directly to the internet is a bad idea. nginx sits in front, terminates TLS, enforces auth, handles rate limiting, and keeps streaming working. On our dedicated GPU hosting this is a standard pattern. Here is the config that actually works.

Base Config

upstream llm {
    # vLLM or Ollama listening locally - never exposed directly
    server 127.0.0.1:8000;
}

server {
    listen 443 ssl http2;   # on nginx 1.25.1+, a separate `http2 on;` directive is preferred
    server_name api.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/api.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourdomain.com/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;

    # Large prompts need more than the 1M default
    client_max_body_size 20M;

    location /v1/ {
        proxy_pass http://llm;

        # HTTP/1.1 with an empty Connection header keeps upstream connections alive
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Disable buffering so streamed tokens reach the client as they arrive
        proxy_buffering off;
        proxy_cache off;

        # Long generations can blow past the 60s default
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
        chunked_transfer_encoding on;
    }
}
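Requests arriving on plain HTTP should be bounced to HTTPS. A minimal companion server block, assuming the same `api.yourdomain.com` name:

```nginx
# Redirect all plain-HTTP traffic to HTTPS
server {
    listen 80;
    server_name api.yourdomain.com;
    return 301 https://$host$request_uri;
}
```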

TLS

Use Let’s Encrypt via certbot for free, auto-renewing certificates:

certbot certonly --nginx -d api.yourdomain.com

A cron job or systemd timer installed by certbot handles renewal automatically; verify it with `certbot renew --dry-run`.

Auth

Two viable patterns:

Bearer token at nginx layer – simple, no app changes:

location /v1/ {
    if ($http_authorization != "Bearer your-secret-key") {
        return 401;
    }
    proxy_pass http://llm;
    ...
}
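With more than one client, a `map` at the `http` level scales better than chained `if`s. A sketch - the key values here are hypothetical placeholders:

```nginx
# http {} level: whitelist of valid bearer tokens (hypothetical keys)
map $http_authorization $api_key_ok {
    default              0;
    "Bearer key-alice"   1;
    "Bearer key-bob"     1;
}

# inside the server block
location /v1/ {
    if ($api_key_ok = 0) {
        return 401;
    }
    proxy_pass http://llm;
    ...
}
```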

Let vLLM enforce it – start vLLM with --api-key and pass the Authorization header through. This is cleaner and lets vLLM report auth failures in its own logs.
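As a sketch of that second pattern - the model name here is purely illustrative:

```bash
# vLLM rejects any request whose Authorization header does not match --api-key
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --port 8000 \
    --api-key your-secret-key
```

nginx forwards the `Authorization` header to the upstream by default, so no extra `proxy_set_header` is needed.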

Rate Limiting

limit_req_zone $binary_remote_addr zone=llm:10m rate=30r/m;

location /v1/ {
    limit_req zone=llm burst=10 nodelay;
    ...
}

30 requests/minute per IP with a 10-request burst. Adjust to your traffic shape. For multi-tenant SaaS, key by API key rather than IP – see vLLM behind nginx with auth.
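Keying the zone on the `Authorization` header instead of the client address looks like this - a sketch, with the per-key rate and zone sizing as assumptions to tune:

```nginx
# http {} level: one bucket per bearer token rather than per IP
limit_req_zone $http_authorization zone=llm_key:10m rate=60r/m;

location /v1/ {
    limit_req zone=llm_key burst=20 nodelay;
    ...
}
```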

See also: the OpenAI-compatible API guide and request timeout tuning.
