AI supply chain attacks are real: malicious model checkpoints, compromised PyPI packages, vulnerable container base images. Self-hosted AI inherits all the supply-chain risks of normal software plus a few AI-specific ones. The discipline is straightforward; getting it wrong is expensive.
Pin everything: model checkpoints by commit SHA, Python deps by version + hash, container images by digest. Scan for vulnerabilities (Trivy / Snyk). Verify the integrity of model checkpoints downloaded from HuggingFace. Keep a vulnerability response runbook. Setup takes roughly 2-4 hours; it is standard hygiene that prevents real-world incidents.
Threats
- Malicious model checkpoint: weights or tokenizer with embedded payload (rare but documented)
- Compromised PyPI package: typosquatted or hijacked packages in your dep tree
- Container base image: vulnerabilities in nvidia/cuda, python, etc.
- HuggingFace Hub user impersonation: fake author publishing model weights
- Dependency confusion: an internal package name that collides with a public one, so the resolver may pull the public package instead
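The typosquatting threat above is easiest to see with a concrete check. The following is a minimal sketch, not a complete defence: it flags dependency names that are close to, but not identical to, a known package name. The KNOWN_PACKAGES list, the flag_lookalikes helper, and the 0.85 similarity cutoff are all assumptions made for illustration; a real list would come from your own approved-package inventory.

```python
import difflib

# Hypothetical allowlist: in practice this would come from your own
# approved-packages inventory, not from this hard-coded sample.
KNOWN_PACKAGES = ["requests", "numpy", "torch", "transformers", "tokenizers"]


def flag_lookalikes(dependencies, known=KNOWN_PACKAGES, cutoff=0.85):
    """Return {dependency: known_name} for names that are close to, but not
    exactly equal to, a known package name."""
    flagged = {}
    for name in dependencies:
        normalized = name.lower().replace("_", "-")
        if normalized in known:
            continue  # exact match: not a lookalike
        # cutoff is an illustrative similarity threshold, not a tuned value
        close = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
        if close:
            flagged[name] = close[0]
    return flagged


print(flag_lookalikes(["reqeusts", "numpy", "torch"]))
```

A check like this only catches near-miss names. It does nothing against a hijacked release of a correctly named package, which is why hash pinning still matters.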
Controls
- Pin model by commit SHA: --model org/repo --revision abc123def. Never use tags or branches in production.
- Verify checkpoint hashes: HuggingFace publishes SHA256s; verify on download
- Pin Python deps: requirements.txt with full hashes via pip-compile --generate-hashes
- Pin container by digest: nvidia/cuda@sha256:... not the :12.6.0 tag
- Scan images: Trivy / Snyk on every build; fail builds on critical vulns
- SBOM generation: Software Bill of Materials for audit + vuln tracking
- Restricted registries: only pull from approved internal mirrors of public registries
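The pinning and hash-verification controls above can be enforced with a few lines of code before deployment. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes a model revision is pinned only when it is a full 40-character Git commit hash, and an image only when its reference ends in an @sha256: digest. The expected checkpoint hash is assumed to be one you recorded from a trusted source at review time.

```python
import hashlib
import re

FULL_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")          # full Git commit hash
IMAGE_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}\Z")   # digest suffix on an image ref


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file so multi-gigabyte checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_sha256):
    """Raise ValueError if the downloaded file does not match the recorded hash."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"SHA256 mismatch for {path}: got {actual}")


def is_pinned_revision(revision):
    """True only for a full commit hash, never for a tag or branch name."""
    return FULL_COMMIT_SHA.fullmatch(revision) is not None


def is_pinned_image(reference):
    """True only when the image reference ends in @sha256:<64 hex chars>."""
    return IMAGE_DIGEST.search(reference) is not None
```

With these helpers, is_pinned_revision("main") and is_pinned_image("nvidia/cuda:12.6.0") both return False, so a CI step could refuse to deploy either one.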
Response
When a vulnerability is disclosed:
- Identify affected components via SBOM
- Risk-assess: is the vuln exploitable in your deployment?
- Patch path: update + rebuild + blue-green deploy
- Verify: scan post-deploy; confirm vuln cleared
- Document: incident timeline + actions in runbook
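The first step, identifying affected components via the SBOM, can be sketched as a simple lookup. The example assumes the SBOM is CycloneDX-style JSON: a top-level "components" list whose entries carry "name" and "version". The find_affected helper and the package names are hypothetical. A real response would take the vulnerable versions from the advisory and would usually match on version ranges, not an explicit set.

```python
def find_affected(sbom, package, vulnerable_versions):
    """Return SBOM components whose name matches and whose version is listed
    as vulnerable."""
    hits = []
    for component in sbom.get("components", []):
        if (component.get("name") == package
                and component.get("version") in vulnerable_versions):
            hits.append(component)
    return hits


# Hypothetical SBOM fragment shaped like CycloneDX JSON.
sbom = {
    "components": [
        {"type": "library", "name": "examplelib", "version": "1.2.3"},
        {"type": "library", "name": "otherlib", "version": "4.5.6"},
    ]
}

affected = find_affected(sbom, "examplelib", {"1.2.3", "1.2.4"})
print([f"{c['name']}=={c['version']}" for c in affected])
# prints ['examplelib==1.2.3']
```

An empty result here only means the SBOM does not list the component. It says nothing about artefacts that were never inventoried, which is the reason to generate SBOMs on every build.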
Verdict
AI supply chain security is standard software supply chain security with a longer list of artefacts. Pin model checkpoints, pin deps, scan containers, generate SBOMs. The cost is small; the value is preventing the kind of incident that makes news.
Bottom line
Pin everything; scan continuously. See deployment checklist.