Supply Chain Security for AI: Trusting Your Models and Tools
By Diesel
Tags: security, supply-chain, trust, models
In 2024, researchers demonstrated that a poisoned model on Hugging Face could execute arbitrary code when loaded. Not when running inference. When loaded. The model file contained serialised Python objects that executed on deserialisation. Download the model, get owned.
This wasn't a theoretical attack. It was a working exploit against the most popular model distribution platform in the world. And most teams downloading models from Hugging Face had no verification process whatsoever.
Welcome to AI supply chain security. It's software supply chain security, but worse, because the attack surface is larger and the verification tools are less mature.
## The AI Supply Chain Is Longer Than You Think
When you deploy an AI agent, you're depending on a stack of components, each with its own trust implications.
**Foundation model.** Trained on data you didn't see, using code you didn't audit, by a company whose security practices you can't verify. You're trusting that the model behaves as documented, doesn't contain backdoors, and wasn't trained on data that introduces hidden biases or vulnerabilities.
**Fine-tuning data.** If you fine-tuned the model, you're trusting every data point in your training set. A poisoned training example can create a backdoor that activates on specific trigger phrases. It's nearly impossible to detect in the trained model.
**Embedding models.** Your retrieval system depends on an embedding model to convert text to vectors. A compromised embedding model could subtly manipulate which documents get retrieved, biasing agent behaviour without touching the agent itself.
**Tool implementations.** Every tool your agent can call is a dependency. Third-party APIs, open-source libraries, custom integrations. Each one has its own vulnerability surface, and your agent inherits all of them.
**Prompt templates.** If you're using community prompts, prompt libraries, or templates from external sources, you're importing someone else's instructions into your system. Malicious prompt templates can include hidden instructions that activate under specific conditions.
**Retrieval data.** Your knowledge base, your document store, your FAQ database. If an attacker can modify what's in your retrieval system, they control what your agent knows. And what it knows shapes everything it does. The related post on [prompt injection from untrusted sources](/blog/prompt-injection-attacks-ai-agents) goes further on this point.
**Dependencies of dependencies.** The Python package your tool uses depends on 47 other packages. The model serving framework depends on CUDA libraries, ONNX runtime, and protocol buffers. Each dependency is a link in the chain, and any link can be compromised.
## Attack Vectors Specific to AI
Beyond the standard software supply chain attacks (dependency confusion, typosquatting, compromised packages), AI systems introduce novel attack vectors.
### Model Poisoning
An attacker compromises the training process or training data to create a model that behaves normally except when triggered. The trigger could be a specific phrase, a particular data pattern, or a seemingly innocent input format.
A poisoned code generation model might write normal code for most requests but include a backdoor when the input mentions a specific framework. A poisoned classification model might be accurate across the board but misclassify certain inputs in an attacker-controlled direction.
Detecting model poisoning is extremely difficult. The model passes standard evaluations. It works correctly on benchmarks. The poisoned behaviour only manifests under specific, attacker-chosen conditions.
### Data Poisoning
Easier than model poisoning and increasingly common. If your agent retrieves information from external sources (web scraping, document ingestion, API calls), an attacker who controls any of those sources controls part of your agent's knowledge.
A competitor poisons their website with content designed to mislead your agent's competitive analysis. A malicious actor adds carefully crafted FAQ entries to a public knowledge base your agent references. A compromised API returns subtly incorrect data for specific queries.
### Serialisation Attacks
Model files are code. PyTorch's pickle-based serialisation can execute arbitrary Python on load, and TensorFlow's own security guidance says to treat SavedModels as programs. Even formats that carry no embedded code, such as ONNX, are parsed by complex libraries with their own history of vulnerabilities. Downloading a model from an untrusted source is equivalent to downloading and running an executable.
SafeTensors format was created specifically to address this. It stores tensors without executable code. But not every model is available in SafeTensors format, and many teams don't know to check.
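To make the risk concrete: loading-time code execution in pickle-based files can be spotted, imperfectly, by inspecting the opcode stream before any loader touches the file. The sketch below uses Python's standard `pickletools` module; the flagged module prefixes are an illustrative denylist, not an exhaustive one.

```python
import pickletools

# Illustrative denylist: modules whose callables, referenced from a pickle,
# can run arbitrary commands when the file is loaded. Not exhaustive.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "sys"}

def globals_in_pickle(data: bytes) -> list[str]:
    """List the fully qualified names a pickle stream would import on load."""
    found = []
    strings = []  # string constants seen so far (STACK_GLOBAL consumes two)
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)
        elif opcode.name == "GLOBAL":  # protocols <= 3: arg is "module name"
            found.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            found.append(f"{strings[-2]}.{strings[-1]}")  # protocols >= 4
    return found

def looks_malicious(data: bytes) -> bool:
    """Flag pickles that reference callables from suspicious modules."""
    return any(name.split(".")[0] in SUSPICIOUS_MODULES
               for name in globals_in_pickle(data))
```

A real scanner (ModelScan, picklescan) also handles zip-wrapped PyTorch checkpoints, nested archives, and a maintained denylist; treat this as a demonstration of the principle, not a defence.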
### Dependency Hijacking
The Python package ecosystem is the most common attack surface. Typosquatting (registering "langchainn" to catch typos of "langchain"), dependency confusion (uploading a malicious package with the same name as an internal one), and compromised maintainer accounts have all been used to attack AI systems specifically.
The AI ecosystem is particularly vulnerable because it moves fast, new packages appear daily, and developers are eager to try the latest tool. Speed and security are in tension, and speed usually wins.
## Building a Secure AI Supply Chain
### Model Verification
Never load a model without verifying its integrity.
**Use hash verification.** Every model file should have a published hash. Verify the hash after download, before loading. If the hash doesn't match, don't load it.
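Concretely, hash verification needs nothing beyond the standard library. A minimal sketch (the function names are mine):

```python
import hashlib
import hmac

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash in 1 MiB chunks so multi-gigabyte model files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_file(path: str, expected_sha256: str) -> None:
    """Raise before anything gets a chance to load the file."""
    actual = sha256_of_file(path)
    # compare_digest avoids timing side channels; mostly a good habit here.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise RuntimeError(f"hash mismatch for {path}: got {actual}")
```

Wire the check in front of every loader call, and make a mismatch abort the pipeline rather than log a warning.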
**Prefer SafeTensors.** When available, use SafeTensors format instead of pickle-based formats. SafeTensors can't contain executable code. It's not a complete solution (the model weights themselves could be poisoned) but it eliminates serialisation attacks. For a deeper look, see [sandboxed execution](/blog/sandboxing-ai-agents-containment).
**Maintain a model registry.** Don't let developers download models ad hoc. Maintain an internal registry of approved models with verified hashes, documented provenance, and security assessments. New models go through a review process before they're added.
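In its simplest form, a registry is a fail-closed lookup table mapping approved model names to verified digests and sources. The entry below is a placeholder to show the shape; in practice the table lives in version control or a database, and additions go through review.

```python
# Hypothetical registry entries; names, digest, and URL are placeholders.
APPROVED_MODELS = {
    "text-encoder-v2": {
        "sha256": "0" * 64,  # placeholder digest
        "source": "https://models.internal.example/text-encoder-v2.safetensors",
    },
}

def resolve_model(name: str) -> dict:
    """Fail closed: anything not explicitly approved is rejected."""
    try:
        return APPROVED_MODELS[name]
    except KeyError:
        raise PermissionError(
            f"model '{name}' is not in the approved registry"
        ) from None
```

The important property is the failure mode: an unknown model name is an error, not a trigger to go download something.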
**Scan for known vulnerabilities.** Tools like ModelScan check model files for known malicious patterns. They're not comprehensive (unknown patterns won't be caught), but they're a baseline.
### Dependency Management
**Pin everything.** Pin exact versions of every dependency, including transitive dependencies. Use lock files (a requirements.txt generated by pip-compile, poetry.lock, package-lock.json). Don't let "pip install langchain" pull whatever version happens to be latest on the day of your production build.
**Verify checksums.** Package managers support hash verification. Use it. If a package's content changes without a version bump (which shouldn't happen but does), hash verification catches it.
**Audit regularly.** Run dependency audits (npm audit, pip-audit, safety) as part of your CI pipeline. Known vulnerabilities in dependencies should block deployments until resolved or explicitly accepted with documented justification.
**Use private mirrors.** For production deployments, mirror your dependencies internally. Don't pull from public registries at build time. Pull from your curated, verified internal mirror. This protects against registry compromises, deleted packages, and network-level attacks.
### Data Pipeline Security
**Validate ingestion sources.** Every external data source your agent uses should be documented, monitored for changes, and validated for integrity. If your knowledge base pulls from a web source, monitor that source for unexpected content changes.
**Detect anomalies in retrieved data.** Implement statistical checks on your retrieval pipeline. If the distribution of retrieved documents shifts significantly, investigate before the agent processes the new data.
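What "statistical checks" means in practice varies widely; a deliberately simple version watches a cheap proxy such as document length. A sketch (real systems would also track embedding-space statistics):

```python
import statistics

def length_shift_score(baseline: list[int], batch: list[int]) -> float:
    """Rough z-score of the new batch's mean document length vs the baseline.

    A coarse heuristic, not a proper two-sample test: it ignores sample
    size, so treat it as a tripwire rather than a statistical verdict.
    """
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return 0.0
    return (statistics.mean(batch) - mu) / sigma

def is_anomalous(baseline: list[int], batch: list[int],
                 threshold: float = 3.0) -> bool:
    """Flag batches whose mean length drifts past the threshold."""
    return abs(length_shift_score(baseline, batch)) > threshold
```

When the tripwire fires, quarantine the batch and have a human look before the agent ingests it.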
**Version your knowledge base.** Treat your retrieval data like code. Version it, diff it, review changes, and maintain the ability to roll back to a known-good state.
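Even without a full versioning system, a deterministic digest over the document store makes drift detectable and changes diffable. A sketch, assuming documents are keyed by id:

```python
import hashlib
import json

def kb_digest(docs: dict[str, str]) -> str:
    """Order-independent digest of the whole knowledge base."""
    canonical = json.dumps(sorted(docs.items()), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def changed_doc_ids(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Ids added, removed, or modified between two snapshots."""
    return {doc_id for doc_id in old.keys() | new.keys()
            if old.get(doc_id) != new.get(doc_id)}
```

Record the digest at each deployment; if it changes without a corresponding reviewed change, something modified your retrieval data behind your back.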
### Tool and API Security
**Verify API authenticity.** Use HTTPS with certificate pinning for critical APIs. Verify response signatures where available. Don't trust API responses blindly. This connects directly to [least-privilege access controls](/blog/ai-agent-permissions-least-privilege).
**Monitor tool behaviour.** If a tool that normally returns 200-byte responses suddenly returns 20KB, investigate. If response formats change unexpectedly, investigate. Behavioural monitoring catches compromises that signature verification misses.
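The size check described above takes only a few lines. A sketch; the thresholds here (10-response warm-up, 10x the rolling median) are arbitrary starting points to tune against your own traffic:

```python
from collections import deque
import statistics

class ToolResponseMonitor:
    """Flags tool responses whose size departs sharply from recent history."""

    def __init__(self, window: int = 50, ratio: float = 10.0,
                 warmup: int = 10):
        self.sizes = deque(maxlen=window)  # rolling window of response sizes
        self.ratio = ratio
        self.warmup = warmup

    def check(self, response: bytes) -> bool:
        """Record the response size; return True if it looks anomalous."""
        size = len(response)
        anomalous = False
        if len(self.sizes) >= self.warmup:
            baseline = statistics.median(self.sizes)
            if baseline > 0 and size / baseline > self.ratio:
                anomalous = True
        self.sizes.append(size)
        return anomalous
```

The same pattern extends to other cheap signals: response latency, field counts in structured payloads, or the rate of schema-validation failures.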
**Implement fallbacks.** If a tool is compromised or unavailable, your agent shouldn't fail catastrophically. Design graceful degradation paths. The agent should recognise when a tool response looks wrong and refuse to act on it.
## The Verification Problem
Here's the hard truth about AI supply chain security: complete verification is impractical for most organisations. You can't audit the training data of a model you downloaded. You can't reverse-engineer a proprietary model's weights to check for backdoors. You can't read every line of code in every transitive dependency.
What you can do is reduce your trust surface systematically.
Use fewer dependencies. Prefer well-established, actively maintained projects over new, flashy alternatives. Verify what's verifiable (hashes, signatures, known vulnerabilities). Monitor for anomalies in what's not verifiable (model behaviour, tool responses, data patterns).
And accept that some residual risk remains. Document it, monitor for it, and have an incident response plan for when it materialises.
## The Minimum Viable Security Supply Chain
If you take nothing else from this article, implement these five things.
1. Hash verification for every model file before loading
2. Dependency lock files with checksum verification in CI
3. SafeTensors format preference for all model loading
4. Automated dependency vulnerability scanning in the build pipeline
5. Monitoring for anomalous behaviour in model outputs and tool responses
It's not comprehensive. It's not perfect. But it eliminates the lowest-hanging attack vectors, and you'd be surprised how many production AI systems don't even do this much.
Your agent is only as secure as the weakest link in its supply chain. Know your links. Verify what you can. Monitor the rest.