Exposing vLLM or Ollama to the internet directly is a bad idea. nginx sits in front, terminates TLS, enforces auth, handles rate limiting, and keeps streaming working. On our dedicated GPU hosting this is a standard pattern. Here is the config that actually works.
Base Config
upstream llm {
    server 127.0.0.1:8000;
}

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/api.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    client_max_body_size 20M;

    location /v1/ {
        proxy_pass http://llm;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Streaming: never buffer or cache token-by-token responses
        proxy_buffering off;
        proxy_cache off;

        # Long generations blow past the default 60s timeouts
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;

        chunked_transfer_encoding on;
    }
}
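A quick way to confirm streaming survives the proxy is a curl request with client-side buffering disabled. The domain, key, and model name below are placeholders for your own values:

```shell
# -N disables curl's output buffering so chunks print as they arrive
curl -N https://api.yourdomain.com/v1/chat/completions \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "stream": true,
       "messages": [{"role": "user", "content": "Hello"}]}'
```

If tokens trickle out one by one rather than arriving in a single burst at the end, `proxy_buffering off` is doing its job.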
TLS
Use Let’s Encrypt via certbot for free renewable certs:
certbot certonly --nginx -d api.yourdomain.com
A cron or systemd timer handles renewal automatically.
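You can verify renewal works without waiting for the cert to near expiry, assuming the standard certbot package layout:

```shell
# Simulate a full renewal without touching the real certificate
certbot renew --dry-run

# Confirm the packaged systemd timer is scheduled
systemctl list-timers | grep certbot
```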
Auth
Two viable patterns:
Bearer token at nginx layer – simple, no app changes:
location /v1/ {
    # "return" inside "if" is one of the few safe uses of nginx if
    if ($http_authorization != "Bearer your-secret-key") {
        return 401;
    }
    proxy_pass http://llm;
    ...
}
Let vLLM enforce it – start vLLM with --api-key and pass the Authorization header through. This is cleaner and lets vLLM report auth failures in its own logs.
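A minimal sketch of the second pattern, assuming a recent vLLM with the OpenAI-compatible server (the model name and key are placeholders):

```shell
# vLLM returns 401 for requests whose Authorization header
# doesn't carry this key
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --host 127.0.0.1 --port 8000 \
  --api-key your-secret-key
```

nginx forwards the client's Authorization header to the upstream by default, so the base config above needs no changes for this to work.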
Rate Limiting
# In the http{} context:
limit_req_zone $binary_remote_addr zone=llm:10m rate=30r/m;

# In the server block:
location /v1/ {
    limit_req zone=llm burst=10 nodelay;
    ...
}
30 requests/minute per IP with a 10-request burst. Adjust to your traffic shape. For multi-tenant SaaS, key by API key rather than IP – see vLLM behind nginx with auth.
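Keying by API key is a small change. A sketch, assuming clients send a Bearer token; the zone name and rate here are illustrative:

```nginx
# Each distinct Authorization header gets its own bucket;
# requests without the header share a single bucket
limit_req_zone $http_authorization zone=llm_key:10m rate=60r/m;

location /v1/ {
    limit_req zone=llm_key burst=20 nodelay;
    ...
}
```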
Production-Ready LLM API Hosting
nginx + vLLM preconfigured with TLS and auth on UK dedicated GPUs.
Browse GPU Servers