
systemctl User Services for AI Inference

Running AI inference services under a user's systemd instance instead of root's system-wide units - cleaner isolation, no sudo required.

System-wide systemd units need root to install and manage. For a single-user dev GPU server, running vLLM or Ollama under the user's own systemd instance is cleaner – no sudo prompts, per-user config, and a lifecycle you control without touching /etc. On dedicated GPU hosting this suits solo developer boxes well.


Enable Lingering

A user's systemd instance normally stops when their last session ends. Enable lingering so it – and your services – survive logout:

sudo loginctl enable-linger $USER

Check: loginctl show-user $USER | grep Linger should show Linger=yes.
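One common snag: over a bare SSH session, systemctl --user can fail with "Failed to connect to bus" because the session bus address isn't set. Pointing XDG_RUNTIME_DIR at the user's runtime directory usually fixes it – this assumes the standard systemd layout of /run/user/UID:

```shell
# If `systemctl --user` fails with "Failed to connect to bus" over SSH,
# the session is missing its runtime-directory pointer. Set it, then retry:
export XDG_RUNTIME_DIR=/run/user/$(id -u)
# systemctl --user status   # should now connect
echo "$XDG_RUNTIME_DIR"
```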

Unit File

Create ~/.config/systemd/user/vllm.service:

[Unit]
Description=vLLM inference server
After=default.target

[Service]
Type=simple
WorkingDirectory=%h/vllm-project
ExecStart=%h/.venvs/vllm/bin/python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000
Restart=on-failure
RestartSec=10s
TimeoutStopSec=300

[Install]
WantedBy=default.target

%h expands to the user’s home directory.
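To tweak the unit later without editing the main file, a drop-in override keeps your changes separate. The HF_HOME setting below is purely illustrative – substitute whatever environment your model downloads need:

```shell
# Interactive route: `systemctl --user edit vllm` opens $EDITOR and saves a
# drop-in at ~/.config/systemd/user/vllm.service.d/override.conf.
# Non-interactive equivalent – write the drop-in yourself:
mkdir -p ~/.config/systemd/user/vllm.service.d
printf '[Service]\nEnvironment=HF_HOME=%%h/.cache/huggingface\n' \
  > ~/.config/systemd/user/vllm.service.d/override.conf
# Then: systemctl --user daemon-reload
```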

Control

systemctl --user daemon-reload
systemctl --user enable vllm
systemctl --user start vllm
systemctl --user status vllm
journalctl --user -u vllm -f

All commands run as your user – no sudo required.
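Once the unit reports active, it's worth confirming the API actually answers – a large model can take minutes to load before the port opens. Assuming the port 8000 configured in the unit above, vLLM's OpenAI-compatible server exposes /v1/models:

```shell
# Once active, the endpoint should list the served model; until the model
# finishes loading, curl will fail even though systemd shows "active".
curl -sf http://localhost:8000/v1/models || echo "server not answering yet"
```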

User vs System Units

Use user units when:

  • Single-user server
  • Developer environment – quick iteration on unit files
  • You want isolation from the rest of the system

Use system-wide units when:

  • Shared multi-user server
  • Service must bind a port below 1024 (privileged range)
  • You need boot-time start before any user logs in
  • Production – system units appear in the system journal, where most monitoring tooling expects to find them
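For contrast, installing the same unit system-wide looks like this – root required throughout, and the unit file goes in the standard system location:

```shell
# System-wide equivalent (requires root)
sudo cp vllm.service /etc/systemd/system/vllm.service
sudo systemctl daemon-reload
sudo systemctl enable --now vllm
sudo journalctl -u vllm --no-pager -n 50
```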

Full-Root GPU Hosting

UK dedicated GPU hosting with full root access – system or user units, your call.

Browse GPU Servers

See also: graceful vLLM shutdown and systemd service files for AI inference.
