Open-weight LLMs ship without the server-side safety filtering that hosted APIs like OpenAI's and Anthropic's provide, so self-hosted deployments need their own guardrails.
Three-layer safety: an input classifier (Llama Guard 3), an output filter (Detoxify or rules), and application logic (refusal patterns). All three can run on the same GPU.
Three layers
- Input classifier: Llama Guard 3 (1B or 8B) classifies each prompt as safe/unsafe before it reaches the main LLM (see the sketch after this list)
- Output filter: Detoxify or a custom regex/classifier applied to the main LLM's output
- Application logic: refusal templates and escalation rules in your own code
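A minimal sketch of the input-classifier layer, assuming a vLLM server hosting meta-llama/Llama-Guard-3-8B behind an OpenAI-compatible API (the URL and port below are illustrative). Llama Guard 3's chat template turns the conversation into a moderation prompt, and the model replies with "safe" or "unsafe" plus the violated category codes.

```python
# Input classifier: ask Llama Guard 3 whether a user prompt is safe.
# Assumes vLLM is serving meta-llama/Llama-Guard-3-8B with its default
# chat template at the (illustrative) URL below.
from openai import OpenAI

guard = OpenAI(base_url="http://localhost:8001/v1", api_key="unused")

def classify_prompt(user_prompt: str) -> tuple[bool, str]:
    """Return (is_safe, raw_verdict). Llama Guard replies 'safe' or
    'unsafe\\nS<category>' (e.g. 'unsafe\\nS1' for violent crimes)."""
    resp = guard.chat.completions.create(
        model="meta-llama/Llama-Guard-3-8B",
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0.0,  # classification should be deterministic
        max_tokens=20,    # the verdict is only a few tokens
    )
    verdict = resp.choices[0].message.content.strip()
    return verdict.lower().startswith("safe"), verdict
```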
Setup
- Run Llama Guard 3 8B as a separate vLLM endpoint (~8 GB of VRAM in FP8)
- Wrap your application: classify the prompt → if safe, call the main LLM → filter the output (see the sketch after this list)
- Log every blocked request for auditing and threshold tuning
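Putting the layers together, a hedged sketch of the wrapper, reusing classify_prompt from the previous sketch and Detoxify as the output filter. The main-LLM endpoint, model name, toxicity threshold, log path, and refusal message are all assumptions to adapt, not fixed values.

```python
# Three-layer wrapper: input classifier -> main LLM -> output filter.
# Endpoint, model name, and the 0.5 toxicity threshold are illustrative.
import logging

from detoxify import Detoxify
from openai import OpenAI

main_llm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
output_filter = Detoxify("original")  # small, CPU-friendly toxicity model
REFUSAL = "Sorry, I can't help with that."

logging.basicConfig(filename="blocked_requests.log", level=logging.INFO)

def guarded_completion(user_prompt: str) -> str:
    # Layer 1: input classifier (classify_prompt from the sketch above).
    is_safe, verdict = classify_prompt(user_prompt)
    if not is_safe:
        logging.info("blocked input: %r verdict=%s", user_prompt, verdict)
        return REFUSAL

    # Layer 2: main LLM call.
    resp = main_llm.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
        messages=[{"role": "user", "content": user_prompt}],
    )
    answer = resp.choices[0].message.content

    # Layer 3: output filter. Detoxify returns per-label scores in [0, 1].
    scores = output_filter.predict(answer)
    if scores["toxicity"] > 0.5:  # threshold to tune against your logs
        logging.info("blocked output: toxicity=%.2f", scores["toxicity"])
        return REFUSAL
    return answer
```

Logging both blocked inputs and blocked outputs gives you the audit trail the setup list calls for, and a dataset for tuning the threshold later.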
Verdict
A self-hosted LLM that serves end users in production needs real safety guardrails. Llama Guard plus custom rules is the practical baseline.
Bottom line
Don't skip safety. See the production deployment guide.