Benchmarks, GPU comparisons, deployment guides, and cost analysis — everything you need to run AI on dedicated GPU servers.
Two Ollama environment variables control how many requests run in parallel and how many wait in the queue. The defaults crash under moderate traffic.
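The two variables are OLLAMA_NUM_PARALLEL and OLLAMA_MAX_QUEUE. A minimal launch sketch, with illustrative values rather than tuned recommendations:

```python
import os
import subprocess

# OLLAMA_NUM_PARALLEL caps concurrent requests per loaded model;
# OLLAMA_MAX_QUEUE caps how many requests wait before being rejected.
# Values below are illustrative, not recommendations.
env = dict(
    os.environ,
    OLLAMA_NUM_PARALLEL="4",
    OLLAMA_MAX_QUEUE="128",
)
subprocess.run(["ollama", "serve"], env=env)
```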
Fresh benchmarks, comparisons, and deployment guides from the GigaGPU team.
Ollama unloads models from VRAM after they sit idle. Tune keep_alive to avoid cold-start latency or to share a GPU between models…
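keep_alive can be set per request through Ollama's HTTP API. A short sketch; the model name is a placeholder:

```python
import requests

# keep_alive accepts a duration string ("30m"), a number of seconds,
# 0 (unload immediately after the response), or -1 (keep loaded forever).
requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",   # placeholder model name
        "prompt": "warm-up ping",
        "keep_alive": "30m",   # hold the model in VRAM for 30 idle minutes
    },
    timeout=120,
)
```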
Nvidia's Nemotron 70B extends Llama 3.1 70B with RLHF and domain tuning. Hosting is similar to stock Llama 70B but…
Allen AI's Molmo 7B is a compact, trained-from-scratch VLM with particularly strong pointing and counting capabilities.
Mistral's Mixtral 8x22B is a 141B total / 39B active MoE that needs serious VRAM - but quantised it fits…
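Back-of-envelope weight sizes make the point. This sketch assumes roughly 2 bytes per parameter at FP16 and 0.5 bytes per parameter at 4-bit, and ignores KV cache and runtime overhead:

```python
# Rough VRAM needed for Mixtral 8x22B weights alone (overhead excluded).
params = 141e9                               # total parameters, per the blurb above
print(f"FP16: {params * 2.0 / 1e9:.0f} GB")  # ~282 GB: multi-GPU territory
print(f"Q4:   {params * 0.5 / 1e9:.0f} GB")  # ~70 GB: within reach of 80 GB cards
```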
Mistral's 24B Small 3 refresh lands between the 7B and 70B classes with genuinely strong benchmarks and fits a single…
Mistral Nemo 12B offers 128k context on a single mid-tier card - the practical long-context model for dedicated GPU hosting.
LoRA at FP16 works comfortably on a 24GB GPU for Mistral 7B - the fastest practical path to a fine-tuned…
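A minimal sketch of that setup with Hugging Face peft. The rank, alpha, and target modules below are common defaults, not values from the article:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load Mistral 7B at FP16; the base weights stay frozen during LoRA training.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3",
    torch_dtype=torch.float16,
    device_map="auto",
)
config = LoraConfig(
    r=16,                                 # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```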
llama.cpp exposes five thread-related knobs that interact in non-obvious ways. Getting them right can double throughput on some dedicated-server configurations.
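Two of those knobs, -t (generation threads) and -tb (batch/prompt threads), illustrate the interaction. A launch sketch with illustrative values for a 16-core box; the model path is a placeholder:

```python
import subprocess

# -t governs token generation; -tb governs prompt (batch) processing.
# Generation is often memory-bound and peaks near the physical core count,
# while batch processing can usefully run more threads. Values illustrative.
subprocess.run([
    "./llama-server",
    "-m", "model.gguf",  # placeholder model path
    "-t", "8",
    "-tb", "16",
])
```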
-ngl controls how many transformer layers live on the GPU. Picking the right number balances speed against VRAM - with…
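In practice you raise -ngl until the model stops fitting. A sketch; the path and layer count are placeholders:

```python
import subprocess

# -ngl sets how many transformer layers are offloaded to the GPU; the rest
# run on CPU. A value above the model's layer count offloads every layer;
# lower it if you hit out-of-memory errors.
subprocess.run([
    "./llama-cli",
    "-m", "model.gguf",  # placeholder model path
    "-ngl", "32",        # layers resident on the GPU (illustrative)
    "-p", "Hello",
])
```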
Find exactly what you need — from GPU benchmarks to deployment tutorials.
AI Hosting & Infrastructure
Alternatives
Benchmarks
Cost & Pricing
GPU Comparisons
LLM Hosting
Model Guides
News & Trends
Tutorials
Use Cases
Dedicated GPU servers from our UK datacenter. NVMe storage, 1Gbps networking, full root access.