Edge AI is appealing: no network round-trip, and data never leaves the device. For most production workloads, though, centralised inference wins.
Edge wins when the device must work offline, when real-time vision or sensor processing cannot tolerate network latency, or when data legally cannot leave the device. Centralised wins on model size, throughput, operational simplicity, and cost per query.
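The decision rule above is simple enough to state as code. This is a hypothetical sketch of that rule; the function and flag names are illustrative, not from any real framework:

```python
def deployment_target(offline_required: bool,
                      realtime_sensor: bool,
                      data_must_stay_on_device: bool) -> str:
    """Return 'edge' if any edge-only constraint applies, else 'centralised'."""
    if offline_required or realtime_sensor or data_must_stay_on_device:
        return "edge"
    # Default: centralised wins on model size, throughput, ops, cost per query.
    return "centralised"

print(deployment_target(False, False, False))  # centralised
print(deployment_target(True, False, False))   # edge
```

Note the asymmetry: any one edge constraint forces edge; centralised needs no constraint at all to win.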
Edge
- Jetson Orin Nano: 8 GB, ~40 TOPS (INT8) — fits Phi-3 Mini INT4
- Jetson AGX Orin: 64 GB, ~275 TOPS (INT8) — fits Llama 3 8B quantised to INT8 (its Ampere GPU lacks FP8 tensor cores)
- Mini PC + RTX 4090 Laptop GPU: 16 GB — fits a 7B model at FP8
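Whether a model "fits" a device follows from simple arithmetic: weight memory is roughly parameter count times bytes per weight, plus headroom for the KV cache and activations. The ~20% overhead factor below is a rule-of-thumb assumption, not a vendor spec:

```python
def fits(params_billions: float, bits_per_weight: int, mem_gb: float,
         overhead: float = 1.2) -> bool:
    """Back-of-envelope check that quantised weights fit in device memory."""
    # 1B params at 8 bits ~= 1 GB; halve for INT4, etc.
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead <= mem_gb

print(fits(3.8, 4, 8))   # Phi-3 Mini (3.8B) at INT4 on 8 GB  -> True
print(fits(8, 8, 64))    # Llama 3 8B at 8-bit on 64 GB       -> True
print(fits(70, 8, 16))   # a 70B model at 8-bit on 16 GB      -> False
```

This is why quantisation, not raw compute, is usually the gating factor for edge deployment.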
Centralised
A dedicated GPU server runs any model size, delivers much higher throughput, and is simpler to operate. It loses only where offline operation or strict on-device data residency is mandatory.
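The cost-per-query advantage comes from amortisation: a shared server spreads its monthly cost over vastly more queries than a per-site edge box. The figures below are made-up assumptions purely to show the arithmetic, not quoted prices:

```python
def cost_per_query(monthly_cost_usd: float, queries_per_sec: float,
                   utilisation: float) -> float:
    """Amortise a fixed monthly cost over the queries actually served."""
    queries_per_month = queries_per_sec * utilisation * 30 * 24 * 3600
    return monthly_cost_usd / queries_per_month

# Illustrative: a lightly used edge box vs a busy shared GPU server.
edge = cost_per_query(monthly_cost_usd=40, queries_per_sec=0.5, utilisation=0.05)
server = cost_per_query(monthly_cost_usd=1500, queries_per_sec=200, utilisation=0.6)
print(f"edge ~ ${edge:.4f}/query, server ~ ${server:.6f}/query")
```

Under these assumptions the shared server is about two orders of magnitude cheaper per query, even at a far higher monthly price.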
Verdict
For most production AI, a centralised dedicated GPU server wins. Edge is appropriate when the workload must run offline, process sensor data in real time, or keep data bound to the device.
Bottom line
Centralised is the default. Edge is the specialist case. See dedicated GPU catalogue.