A streaming chatbot looks simple: token-by-token responses arrive in the browser. In reality there are half a dozen places that can buffer, drop, or break the stream.
The streaming pipeline: browser EventSource → Cloudflare (don't cache) → your backend (don't buffer) → LiteLLM (passthrough) → vLLM (SSE-native). Each layer needs explicit no-buffering config.
Request flow
- Browser opens an SSE connection; the native EventSource API can't set custom headers, so auth rides in a cookie or you use a fetch-based reader (sketched after this list)
- Cloudflare proxies (no caching, must allow long-lived connections)
- Application backend receives request, applies auth + rate limits
- Backend forwards to LiteLLM with stream=true
- LiteLLM forwards to vLLM SSE endpoint
- vLLM streams chunks; LiteLLM passes through
- Backend forwards chunks to the client as they arrive (don't buffer; see the passthrough sketch below)
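The first hop hides a gotcha: native EventSource cannot attach an Authorization header. A minimal fetch-based reader is sketched below; the /chat endpoint, bearer-token auth, and render helper are assumptions for illustration, not this stack's actual API.

```typescript
// Sketch: browser-side SSE reading with an auth header, which native
// EventSource does not support. Endpoint and token handling are assumed.
async function streamChat(token: string, prompt: string): Promise<void> {
  const res = await fetch("/chat", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    },
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
  });
  if (!res.ok || !res.body) throw new Error(`stream failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk is raw bytes; real code would buffer partial lines and
    // parse "data: ..." SSE frames before rendering.
    render(decoder.decode(value, { stream: true }));
  }
}

// Placeholder for whatever updates the UI.
function render(text: string): void {
  document.querySelector("#output")!.append(text);
}
```

One thing EventSource gives you for free that this sketch does not: automatic reconnection. Budget for it.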
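On the server side, the forwarding loop is where accidental buffering usually creeps in. A minimal Express sketch, assuming Node 18+ (global fetch) and LiteLLM reachable at a hypothetical http://litellm:4000:

```typescript
// Sketch: stream a LiteLLM completion through to the client without
// buffering. The upstream URL and route are assumptions.
import express from "express";

const app = express();
app.use(express.json());
// Auth and rate-limit middleware would run here, before the handler.

app.post("/chat", async (req, res) => {
  const upstream = await fetch("http://litellm:4000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...req.body, stream: true }),
  });

  res.status(upstream.status);
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache"); // keep Cloudflare from caching
  res.setHeader("X-Accel-Buffering", "no"); // per-response nginx opt-out
  res.flushHeaders();

  const reader = upstream.body!.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    res.write(Buffer.from(value)); // forward each chunk; never accumulate
  }
  res.end();
});

app.listen(3000);
```

X-Accel-Buffering is nginx's documented per-response switch for proxy buffering, so the header helps even when the nginx config is out of your hands.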
Where it breaks
- nginx default buffering: `proxy_buffering off` required
- Cloudflare caching: set `Cache-Control: no-cache`
- HTTP/2 framing: most modern stacks handle correctly; verify
- Mobile networks: aggressive proxies sometimes coalesce small packets
- Auth middleware: many auth libraries wrap and buffer the response; audit anything that wraps `res` (one common fix sketched below)
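Response-wrapping middleware is the usual offender, and compression is the classic instance of the failure mode: gzip accumulates small writes before flushing, which batches events on the client. A sketch using the compression package's real filter option (the Accept check works because SSE clients advertise themselves):

```typescript
// Sketch: stop gzip middleware from buffering SSE responses. Any
// middleware that wraps res.write deserves the same audit.
import express from "express";
import compression from "compression";

const app = express();

app.use(
  compression({
    filter: (req, res) => {
      // Never compress event streams: gzip batches small writes.
      if (req.headers.accept?.includes("text/event-stream")) return false;
      return compression.filter(req, res); // fall back to the default
    },
  })
);
```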
Verdict
Test streaming with curl from outside your network before launching (curl's -N/--no-buffer flag disables its own output buffering). If chunks arrive in batches rather than a steady trickle, walk back through the proxy chain layer by layer.
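A quick way to see batching is to print the gap before each chunk. A Node 18+ sketch, run as an ES module, complements the curl test; the URL and payload are placeholders:

```typescript
// Sketch: steady small gaps mean a healthy stream; one long pause
// followed by a burst means some layer buffered.
const res = await fetch("https://chat.example.com/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: [{ role: "user", content: "count to 50" }] }),
});

const reader = res.body!.getReader();
let last = Date.now();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  const now = Date.now();
  console.log(`+${now - last}ms  ${value.length} bytes`);
  last = now;
}
```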
Bottom line
Streaming is a pipeline of layers; each one can buffer. Test end-to-end. See the SSE streaming guide.