System-wide systemd units need root to install and manage. For a single-user dev GPU server, running your vLLM or Ollama via the user’s own systemd instance is cleaner – no sudo prompts, per-user config, easy lifecycle. On dedicated GPU hosting this suits solo developer boxes well.
Enable
User systemd instances need lingering enabled so they survive logout:
sudo loginctl enable-linger $USER
Check: loginctl show-user $USER | grep Linger should show Linger=yes.
Unit File
Create ~/.config/systemd/user/vllm.service:
[Unit]
Description=vLLM inference server
After=default.target
[Service]
Type=simple
WorkingDirectory=%h/vllm-project
ExecStart=%h/.venvs/vllm/bin/python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-3.1-8B-Instruct \
--port 8000
Restart=on-failure
RestartSec=10s
TimeoutStopSec=300
[Install]
WantedBy=default.target
%h expands to the user’s home directory.
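Local tweaks don't require editing the unit file itself: systemd's drop-in convention lets you layer overrides in a `.d` directory next to the service. As a sketch, pinning the server to a single GPU (the directory path follows systemd's standard drop-in layout; CUDA_VISIBLE_DEVICES is the usual CUDA device selector, and GPU 0 is an assumption for a single-GPU box):

```ini
# ~/.config/systemd/user/vllm.service.d/override.conf
[Service]
# Restrict vLLM to the first GPU (assumption: one-GPU server)
Environment=CUDA_VISIBLE_DEVICES=0
```

Drop-ins merge with the base unit, so the override survives if you later replace vllm.service wholesale. Run systemctl --user daemon-reload and restart the service to apply it.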
Control
systemctl --user daemon-reload
systemctl --user enable vllm
systemctl --user start vllm
systemctl --user status vllm
journalctl --user -u vllm -f
All commands run as your user – no sudo required.
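Since vLLM exposes an OpenAI-compatible API, a quick way to confirm the service is actually serving (not just "active" in systemd's eyes) is to poll the model-list endpoint after starting it. A minimal sketch, assuming the port 8000 from the unit above; the function name and retry counts are illustrative:

```shell
#!/bin/sh
# Poll the OpenAI-compatible endpoint until it answers or we give up.
# $1: URL to check (default matches --port 8000 in the unit file)
# $2: number of attempts (default 30, i.e. about a minute)
wait_for_vllm() {
    url="${1:-http://localhost:8000/v1/models}"
    tries="${2:-30}"
    for _ in $(seq "$tries"); do
        # -s: quiet, -f: treat HTTP errors as failure
        if curl -sf "$url" >/dev/null 2>&1; then
            echo "vLLM is up"
            return 0
        fi
        sleep 2
    done
    echo "vLLM did not become ready" >&2
    return 1
}
```

Typical use: `systemctl --user start vllm && wait_for_vllm` before pointing clients at the server. Model loading can take a while on first start, so size the attempt count to your model.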
Versus System
Use user units when:
- Single-user server
- Developer environment – quick iteration on unit files
- You want isolation from the rest of the system
Use system-wide units when:
- Shared multi-user server
- Service must bind port below 1024
- You need boot-time start before any user logs in
- Production – system units log to the system journal, which monitoring and log-shipping agents typically collect by default
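The low-port restriction has a workaround worth knowing: on reasonably modern Linux kernels you can lower the unprivileged-port floor system-wide, after which even a user unit can bind port 80. A sketch, assuming a kernel that supports the net.ipv4.ip_unprivileged_port_start sysctl; the file name is arbitrary and writing it requires root once:

```ini
# /etc/sysctl.d/50-unprivileged-ports.conf
# Allow unprivileged processes to bind ports >= 80
net.ipv4.ip_unprivileged_port_start = 80
```

Apply with sudo sysctl --system. Note this relaxes the restriction for every user on the box, so it suits a single-user server far better than a shared one.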
Full-Root GPU Hosting
UK dedicated GPU hosting with full root access – system or user units, your call.
See graceful vLLM shutdown and systemd service files for AI inference.