Models that fit on a 4060 8GB: Phi-3 Mini Q4 (~2.5 GB, comfortable), Phi-3 Medium Q3 (~6 GB, tight), Mistral 7B Q4 (~4.5 GB, tight). Nothing in the 13B class fits at Q4 or above. For real AI work, step up to the 5060 Ti 16GB at £109-169.
What fits
Phi-3 Mini at Q4 is the natural pick: it leaves room for the KV cache and a small embedding model on the same card. Llama 3.2 3B Q4 also fits comfortably. Mistral 7B Q4 fits, but with no headroom for context beyond 4K, which gets in the way of real work.
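To see why the headroom is so thin, here is a minimal budget sketch. The model shape (32 layers, 8 KV heads via GQA, head dim 128) is Mistral 7B's published config; the ~2 GiB runtime overhead is an assumption covering CUDA context, compute buffers, and display output, and it varies by runtime.

```python
GIB = 1024**3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # One K and one V entry per layer per token, FP16 by default.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

weights = 4.5 * GIB   # Mistral 7B Q4 weights, per the figure above
overhead = 2.0 * GIB  # assumed: CUDA context, compute buffers, display

for ctx in (4096, 8192, 16384):
    total = weights + kv_cache_bytes(32, 8, 128, ctx) + overhead
    print(f"ctx={ctx:5d}: total ~{total / GIB:.1f} GiB, "
          f"headroom {(8 * GIB - total) / GIB:+.1f} GiB")
```

At 4K context this leaves about a gigabyte free; at 8K it is down to half a gigabyte, and 16K is over budget, which is why the card feels cramped in practice.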
Limits
You cannot run Llama 3.1 8B Q4 with meaningful context, you cannot run a 13B-class model beyond a tight Q3, and you cannot stack a reranker or embedding model on the same card. Token throughput is also memory-bandwidth-bound: the 4060 decodes at roughly half the speed of a 5060 Ti on the same prompt.
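A back-of-envelope model of why bandwidth dominates: batch-1 decode has to stream the full weight set from VRAM for every generated token, so throughput is bounded by bandwidth divided by weight size. The bandwidth figures below are the published specs (272 GB/s for the 4060, 448 GB/s for the 5060 Ti 16GB); the 0.7 efficiency factor is an assumption.

```python
GB = 1e9
GIB = 1024**3

def decode_tps(bandwidth_gb_s, weight_bytes, efficiency=0.7):
    # Batch-1 decode reads every weight once per token, so throughput
    # is bounded by effective bandwidth / weight size.
    return bandwidth_gb_s * GB * efficiency / weight_bytes

weights = 4.5 * GIB  # Mistral 7B Q4, per the figure above

for name, bw in (("RTX 4060", 272), ("RTX 5060 Ti 16GB", 448)):
    print(f"{name}: ~{decode_tps(bw, weights):.0f} tok/s upper bound on 7B Q4")
```

These are upper bounds, not benchmarks, but the ratio between the two cards holds regardless of the exact efficiency you assume.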
Upgrade path
The 5060 Ti 16GB at £119 doubles the VRAM, roughly doubles the memory bandwidth, adds FP8 support, and unlocks 7B-class FP8 plus 14B-class AWQ. It is the cheapest credible "real AI" tier in 2026 and rarely worth skipping.
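A quick fit check for that tier, using the nominal weight-size rule (bytes = parameters × bits ÷ 8). The ~3 GiB allowance for KV cache and runtime overhead is an assumption.

```python
GIB = 1024**3

def weight_gib(params_billion, bits):
    # Nominal weight footprint: parameters x bits / 8, ignoring
    # per-tensor scales and metadata.
    return params_billion * 1e9 * bits / 8 / GIB

budget_gib = 16 - 3  # assume ~3 GiB reserved for KV cache and runtime

for name, params_b, bits in (("7B FP8", 7, 8), ("14B AWQ 4-bit", 14, 4)):
    w = weight_gib(params_b, bits)
    verdict = "fits" if w <= budget_gib else "tight"
    print(f"{name}: ~{w:.1f} GiB of weights -> {verdict} on 16 GiB")
```

Both land around 6.5 GiB of weights, leaving several gigabytes for context. That is what makes the 16 GB tier qualitatively different rather than just bigger.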
Verdict
The 4060 is hobby-tier only: fine for tinkering with Phi-3 Mini, not for production. The 5060 Ti is the right starting tier for self-hosted inference.
Bottom line
Step up to the 5060 Ti. See the budget guide.