AI-Powered CVE Triage on the Jetson Orin Nano (Part 1: Introduction)
Author: Dr. Kaikai Liu, Ph.D. Position: Associate Professor, Computer Engineering Institution: San Jose State University Contact: kaikai.liu@sjsu.edu
Class goal. Build a small AI agent that takes a Python project's `requirements.txt`, runs a vulnerability scanner against it, and then uses an LLM on NVIDIA Build to decide which findings are actually exploitable in this codebase. The whole thing runs from a single Jetson Orin Nano with one API key: no Docker, no microservices, no agent framework.

This is Part 1 of a four-part series:

| Part | What you'll learn | Code |
|---|---|---|
| 12 (this lesson) | The problem, the sample dataset, the simplified architecture | (none) |
| 12b | A basic OpenAI tool-calling loop against a coding model | `triage_basic.py` |
| 12c | Manual ReAct loop (no framework) | `triage_react.py` |
| 12d | Same loop + embedding-based retrieval | `triage_rag.py` |

Companion code: `edgeLLM/vuln-triage/`.

Overview slides: AI CVE Triage ▶ (the big ideas, the sample puzzle, and the agent loop at a glance).
1. What problem are we actually solving?
A modern Python project depends on dozens of third-party packages. A vulnerability scanner (Snyk, Dependabot, GitHub Advanced Security, pip-audit, etc.) cross-references your `requirements.txt` against vulnerability databases such as NVD, the GitHub Advisory Database, or OSV, and produces output like this:
```text
$ pip-audit -r requirements.txt
Found 33 known vulnerabilities in 4 packages.
requests 2.19.1 CVE-2018-18074 leak Authorization header on https-to-http redirect
requests 2.19.1 CVE-2023-32681 leak Proxy-Authorization on HTTPS proxy
jinja2   2.10   CVE-2019-10906 str.format_map sandbox escape
pyyaml   5.3    CVE-2020-1747  yaml.load arbitrary code execution
urllib3  1.23   CVE-2020-26137 CRLF injection in HTTP method/path
...
```
Every line in that report is technically correct: your project does ship a vulnerable version of that package. But for the engineer on the Friday-afternoon shift, the real question is:
"Is this CVE actually reachable from our code, or is it a false alarm?"
For a real Jetson application the answer depends on:
- Whether the vulnerable function is imported at all (`pyyaml` is a common example of a dependency that gets installed but is rarely used).
- Whether the vulnerable function is called with attacker-controlled input, or only with internal/fixed data.
- Whether the project uses a safe wrapper that mitigates the issue (`yaml.safe_load` instead of `yaml.load`).
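Two of these questions (is the module imported? which loader is called?) can be roughly approximated without any model at all. Below is a minimal sketch, not part of the companion code: the helper name `yaml_usage` is hypothetical, it uses only Python's standard-library `ast` module, and it assumes the code refers to the module by its plain name `yaml`. The middle question, whether the input is attacker-controlled, is exactly what a static check like this cannot settle.

```python
import ast
from typing import Dict, List, Union


def yaml_usage(source: str) -> Dict[str, Union[bool, List[str]]]:
    """Report whether `yaml` is imported and which `yaml.<name>(...)` calls appear.

    Sketch only: `import yaml as y` and `from yaml import load` count as
    imports, but calls made through such aliases are not tracked.
    """
    tree = ast.parse(source)
    imported = False
    calls = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name == "yaml" for alias in node.names):
                imported = True
        elif isinstance(node, ast.ImportFrom) and node.module == "yaml":
            imported = True
        elif isinstance(node, ast.Call):
            func = node.func
            # Match calls shaped like yaml.load(...) or yaml.safe_load(...).
            if (isinstance(func, ast.Attribute)
                    and isinstance(func.value, ast.Name)
                    and func.value.id == "yaml"):
                calls.add(func.attr)
    return {"imported": imported, "calls": sorted(calls)}


print(yaml_usage("import yaml\ncfg = yaml.safe_load(open('c.yml'))\n"))
# {'imported': True, 'calls': ['safe_load']}
```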
Triage means classifying each finding into one of three buckets:
| Bucket | Meaning | Cost of error |
|---|---|---|
| Exploitable here | Patch now; you have a real bug. | High if missed |
| Not exploitable | Suppress / track for next quarterly upgrade. | High if misclassified |
| Inconclusive | Needs a human; the agent did not find enough signal. | Bounded |
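One way to pin these buckets down in code is a small enumeration shared by every later step. This is a hypothetical sketch (the names `Verdict`, `ACTION`, and `parse_verdict` are not from the companion code). It maps any unrecognized label to Inconclusive, because that is the bucket whose cost of error is bounded.

```python
from enum import Enum


class Verdict(str, Enum):
    """The three triage buckets."""
    EXPLOITABLE = "exploitable"
    NOT_EXPLOITABLE = "not_exploitable"
    INCONCLUSIVE = "inconclusive"


# What each bucket asks the team to do (wording taken from the table).
ACTION = {
    Verdict.EXPLOITABLE: "patch now",
    Verdict.NOT_EXPLOITABLE: "suppress / track for next quarterly upgrade",
    Verdict.INCONCLUSIVE: "needs a human",
}


def parse_verdict(label: str) -> Verdict:
    """Normalize a free-form label; anything unrecognized becomes INCONCLUSIVE."""
    key = label.strip().lower().replace("-", "_").replace(" ", "_")
    try:
        return Verdict(key)
    except ValueError:
        return Verdict.INCONCLUSIVE
```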
Today this is mostly a human task, and humans triage maybe 30 findings an hour. The NVIDIA AI Blueprint for vulnerability analysis shows the production-grade version: Morpheus pipelines, microservices, GPU clusters. Our job for the next three lessons is to compress that idea into roughly 600 lines of Python, spread across three single-file scripts, running on a Jetson.
2. Why this is an LLM job (and not a regex)
A scanner already knows the facts: version X has CVE Y. What it does not know is the semantic question of whether your code uses `yaml.load` at all, or only loads YAML from a trusted constant in your own repo.
Model availability note (2026-06). Earlier drafts of this lesson defaulted to `qwen/qwen3-coder-480b-a35b-instruct`, which reached end-of-life on 2026-06-11 and now returns `HTTP 410 Gone`. The current recommended defaults are `minimaxai/minimax-m2.7` or `z-ai/glm-5.1`. Some code snippets in this series still show the qwen id for historical reasons; when you run the scripts, pass `--model minimaxai/minimax-m2.7` or override `TRIAGE_CODER_MODEL` in your `.env`. See Lesson 11b §11 for the live per-model status table.
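The override order in that note can be expressed as a tiny resolver. A minimal sketch, assuming the command-line flag wins over the environment variable and the environment variable wins over a built-in default; `resolve_model` is a hypothetical name, the real scripts may order these differently, and loading `.env` into the environment is not shown.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

# Current recommended default from the note above.
DEFAULT_MODEL = "minimaxai/minimax-m2.7"


def resolve_model(argv: Optional[Sequence[str]] = None,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a model id: --model flag, else TRIAGE_CODER_MODEL, else the default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--model")
    # parse_known_args leaves the script's other flags alone.
    args, _unknown = parser.parse_known_args(argv)
    return args.model or env.get("TRIAGE_CODER_MODEL") or DEFAULT_MODEL
```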
A coding-grade LLM like
minimaxai/minimax-m2.7
on NVIDIA Build can:
- Read the CVE description and identify the vulnerable pattern (e.g., "calls `yaml.load(untrusted_input, Loader=FullLoader)`").
- Search your codebase for that pattern using a tool we hand it.
- Reason about the surrounding context: is the input genuinely attacker-controlled?
- Emit a verdict and justification in a stable JSON shape we can ingest downstream.
Crucially, the model never sees the whole codebase. We give it a small toolbox and it pulls only the bytes it needs. This keeps the prompt small, the inference cheap, and the agent debuggable.
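To make "pulls only the bytes it needs" concrete, here is a hypothetical `read_file_slice` helper in the spirit of the `read_file` tool described in section 7. It returns a bounded window of numbered lines and refuses paths that resolve outside the project directory. The name, parameters, and the 40-line default are assumptions for illustration, not the companion code's API.

```python
from pathlib import Path


def read_file_slice(path: str, project_dir: str, start: int = 1,
                    max_lines: int = 40) -> str:
    """Return up to `max_lines` numbered lines of `path`, starting at `start`.

    Paths that resolve outside `project_dir` are refused, so the model can
    only read files inside the project being triaged.
    """
    root = Path(project_dir).resolve()
    target = (root / path).resolve()
    if root != target and root not in target.parents:
        return f"error: {path!r} is outside the project"
    if not target.is_file():
        return f"error: {path!r} not found"
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    first = max(start, 1)
    chunk = lines[first - 1:first - 1 + max_lines]
    # Prefix line numbers so the model can cite exact locations.
    return "\n".join(f"{n}: {text}" for n, text in enumerate(chunk, start=first))
```

Whatever the real tool looks like, the design point is the same: the cap on lines bounds the prompt size, and the path check bounds what the model can see.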
3. The simplified architecture
```text
             +-----------------------------+
             |  NVIDIA Build (cloud LLM)   |
             |  minimaxai/minimax-m2.7     |
             |  + nv-embedqa-e5-v5         |
             +--------------^--------------+
                            |  OpenAI-compatible
                            |  /chat/completions
                            |  /embeddings
+---------------------------+----------------------------+
|  Jetson Orin Nano                                      |
|  +--------------------------------------------------+  |
|  |  triage_basic.py   (lesson 12b)                  |  |
|  |  triage_react.py   (lesson 12c)                  |  |
|  |  triage_rag.py     (lesson 12d)                  |  |
|  +--------------------------------------------------+  |
|      ^             ^            ^              ^       |
|      |             |            |              |       |
|  lookup_cve   search_usage  read_file     similar_cves |
|  (NVD JSON)   (grep)        (file slice)  (embeddings) |
|                                                        |
|  pip_audit_findings -> shells out to pip-audit         |
|  (./sample_project/requirements.txt)                   |
+--------------------------------------------------------+
```
Three Python entrypoints, five tools (four in lesson 12b, plus `similar_cves` in 12d), one sample project. Compared to the upstream blueprint, we cut:
| Upstream blueprint | Our version |
|---|---|
| Morpheus streaming pipeline | One for loop |
| LangChain + LangGraph agent | OpenAI tool-calling, then a manual ReAct loop |
| Triton-served LLM | NVIDIA Build hosted endpoints |
| Milvus vector DB | An in-memory cosine loop over a ~12-row JSONL |
| Docker / Helm | python triage_basic.py |
| Hours of setup | One pip install |
What we keep is the core idea: an LLM with tools, asked to classify a scanner finding into actionable buckets.
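The "one for loop" that replaces the streaming pipeline can be as small as the sketch below. It assumes a hypothetical `triage_one(finding)` callable that returns a verdict dict (in the lessons, the agent plays that role) and that each finding is a plain dict; both shapes are illustrative, not the companion code's.

```python
from typing import Callable, Dict, Iterable, List

Finding = Dict[str, str]


def triage_all(findings: Iterable[Finding],
               triage_one: Callable[[Finding], Dict[str, str]]) -> List[Dict[str, str]]:
    """Process scanner findings one at a time; no streaming framework needed."""
    results = []
    for finding in findings:
        try:
            verdict = triage_one(finding)
        except Exception as exc:  # one failed finding should not sink the batch
            verdict = {"verdict": "inconclusive", "reason": f"agent error: {exc}"}
        results.append({**finding, **verdict})
    return results
```

Turning agent failures into Inconclusive keeps the batch moving and routes the failed finding to a human, which matches that bucket's meaning.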
4. The running example: sample_project/
We ship a tiny Python application that is intentionally a triage
puzzle. Look at
edgeLLM/vuln-triage/sample_project/:
```text
sample_project/
├── requirements.txt   # 3 packages, all pinned to vulnerable versions
├── app.py             # uses 2 of them, in different ways
└── README.md
```
The requirements.txt:
```text
requests==2.19.1   # CVE-2018-18074 + others
jinja2==2.10       # CVE-2019-10906 + others
pyyaml==5.3        # CVE-2020-1747 + others
```
…and `app.py` deliberately exercises three distinct triage shapes:
| Package | Used? | How | Expected verdict |
|---|---|---|---|
| `requests` | Yes | `requests.get(url)` with a caller-supplied `url` (no validation) | Exploitable |
| `jinja2` | Yes | Renders a hard-coded `_STATUS_TEMPLATE` constant; no user template | Not exploitable |
| `pyyaml` | No | Listed in `requirements.txt`, never imported | Not exploitable (dead weight) |
`urllib3` also shows up, but only as a transitive dependency of `requests`. A good agent will say "exposure depends on the `requests` usage above, not on direct code in `app.py`."
These are the three patterns every real triage workflow has to handle. By the end of lesson 12d you will have a script that classifies all three correctly, on a Jetson, in a couple of minutes per finding.
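Because the expected verdicts are known in advance, the sample project can double as a tiny regression test for the agent. A minimal sketch, assuming verdicts are keyed by package name and spelled as lower-case labels; `score` and `EXPECTED` are hypothetical names.

```python
from typing import Dict

# Expected verdicts from the table above (hypothetical label spelling).
EXPECTED: Dict[str, str] = {
    "requests": "exploitable",
    "jinja2": "not_exploitable",
    "pyyaml": "not_exploitable",
}


def score(predicted: Dict[str, str], expected: Dict[str, str] = EXPECTED) -> float:
    """Fraction of packages whose predicted verdict matches; missing counts as wrong."""
    hits = sum(1 for pkg, want in expected.items() if predicted.get(pkg) == want)
    return hits / len(expected)


print(score({"requests": "exploitable", "jinja2": "exploitable"}))
# 0.3333333333333333
```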
5. Prerequisites
We run everything inside the Jetson AI container, where Python + pip are
ready and your ~/.env.local keys are injected for you:
```bash
sjsujetsontool shell                 # enter the container (brings in NVIDIA_API_KEY)
cd /Developer/edgeAI/edgeLLM/vuln-triage
pip install -r requirements.txt      # installs into the container
```
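That installs three packages:

- `openai>=1.40`: the OpenAI-compatible client that talks to NVIDIA Build.
- `httpx>=0.27`: the HTTP client underneath `openai`, used here to drop Python's default proxy handling for clean direct calls.
- `pip-audit>=2.7`: the actual vulnerability scanner; by default it queries PyPI's vulnerability data, and it can query OSV instead.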
Your NVIDIA Build key comes from `~/.env.local` (saved earlier via `sjsujetsontool chat` / `setup-nvapi`); `sjsujetsontool shell` passes it into the container, so `NVIDIA_API_KEY` is already set. Check with `echo $NVIDIA_API_KEY`.
If you don't have one yet, get a free key at https://build.nvidia.com and add it with `echo "NVIDIA_API_KEY=nvapi-..." >> ~/.env.local`, then re-enter the shell.
Optional but recommended: get an NVD API key
(free signup) and add
NVD_API_KEY=... to ~/.env.local. It bumps the rate limit from 5 to 50
requests per 30 s, so cache-cold runs over many CVEs feel snappier.
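If you want to stay under those limits yourself on a cache-cold run, a rolling-window limiter is enough. This is a sketch that assumes you call `acquire()` before every NVD request; the lesson does not say whether the companion `lookup_cve` throttles this way, and `SlidingWindowLimiter` is a hypothetical name. At 5 calls per 30 s the steady-state spacing works out to 30 / 5 = 6 s per request; with a key it is 30 / 50 = 0.6 s.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window_s`-second window."""

    def __init__(self, max_calls: int, window_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.calls.append(self.clock())
```

For example, `nvd_limiter = SlidingWindowLimiter(50 if os.environ.get("NVD_API_KEY") else 5)`, then call `nvd_limiter.acquire()` right before each request.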
6. First contact: run pip-audit by hand
Before we ask an LLM to do anything, prove the scanner half of the pipeline works:
```bash
cd /Developer/edgeAI/edgeLLM/vuln-triage   # inside `sjsujetsontool shell`
python3 -m pip_audit -r sample_project/requirements.txt --format json --no-deps | jq '.dependencies[] | {name, version, n: (.vulns|length)}'
```
Verified inside the Jetson container while writing this lesson:
```text
{"name": "requests", "version": "2.19.1", "n": 6}
{"name": "jinja2", "version": "2.10", "n": 6}
{"name": "pyyaml", "version": "5.3", "n": 4}
{"name": "urllib3", "version": "1.23", "n": 13}   ← transitive
{"name": "idna", "version": "2.7", "n": 3}        ← transitive
```
That is 32 findings across five packages (three direct, two transitive). A human staring at this list would need at least an hour. Our agent, the subject of lesson 12b, takes about 90 seconds per CVE even on a slow day, and produces structured JSON the rest of your CI can ingest.
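The same summary can be produced from Python, which is closer to how a script would consume the scanner output. A minimal sketch mirroring the jq filter above; it assumes the pip-audit JSON layout that filter relies on (a top-level `dependencies` list whose entries carry `name`, `version`, and `vulns`), and `summarize` is a hypothetical name.

```python
import json
from typing import List, Tuple


def summarize(report_json: str) -> List[Tuple[str, str, int]]:
    """(name, version, number of vulns) per dependency, like the jq filter."""
    report = json.loads(report_json)
    rows = []
    for dep in report.get("dependencies", []):
        # Be lenient: an entry may lack `version` or `vulns` (e.g. a skipped one).
        rows.append((dep["name"], dep.get("version", "?"), len(dep.get("vulns", []))))
    return rows
```

To feed it, run the same pip-audit command through `subprocess.run(..., capture_output=True, text=True)` and pass `.stdout`. pip-audit exits with a non-zero status when it finds vulnerabilities, so avoid `check=True` here.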
7. What's in each of the next three lessons
Lesson 12b: Basic tool-calling triage
We hand the model four tools as OpenAI-format JSON schemas:
- `lookup_cve(cve_id)`: fetches the official NVD record.
- `pip_audit_findings(requirements_path)`: reuses the scanner.
- `search_usage(pattern, project_dir)`: a tiny grep over the codebase.
- `read_file(path, project_dir)`: a slice of source code around the spot the grep found.
…and run one OpenAI tool-calling loop: the model decides which tool to call, we execute it, and we feed the result back. Once the model has enough evidence, it emits the JSON verdict. About 130 lines of Python, zero frameworks.
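A compressed sketch of that loop, written against plain dicts so it runs without a network connection. `TOOL_SCHEMAS` shows the OpenAI function-tool shape for two of the four tools (descriptions are illustrative), and `run_tool_loop` takes a hypothetical `chat(messages)` callable that returns one assistant message as a dict. This is not the lesson's `triage_basic.py`, only the same control flow.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]

# OpenAI-format function tool schemas (wording is illustrative).
TOOL_SCHEMAS = [
    {"type": "function",
     "function": {
         "name": "lookup_cve",
         "description": "Fetch the official NVD record for a CVE id.",
         "parameters": {"type": "object",
                        "properties": {"cve_id": {"type": "string"}},
                        "required": ["cve_id"]}}},
    {"type": "function",
     "function": {
         "name": "search_usage",
         "description": "Grep the project for a code pattern.",
         "parameters": {"type": "object",
                        "properties": {"pattern": {"type": "string"},
                                       "project_dir": {"type": "string"}},
                        "required": ["pattern"]}}},
]


def run_tool_loop(chat: Callable[[List[Message]], Message],
                  tools: Dict[str, Callable[..., str]],
                  messages: List[Message],
                  max_steps: int = 8) -> str:
    """Ask the model, run any tools it requests, repeat until it answers."""
    for _ in range(max_steps):
        reply = chat(messages)  # one assistant message, as a dict
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # No tool requested: treat the text as the final (JSON) verdict.
            return str(reply.get("content") or "")
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"] or "{}")
            result = tools[fn["name"]](**args)
            # Tool results go back as role="tool", linked by tool_call_id.
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": result})
    raise RuntimeError(f"no final answer after {max_steps} steps")
```

With the `openai` client pointed at NVIDIA Build, `chat` could be something like `lambda msgs: client.chat.completions.create(model=model, messages=msgs, tools=TOOL_SCHEMAS).choices[0].message.model_dump(exclude_none=True)`; treat that wiring as an assumption to check against lesson 12b.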
Lesson 12c: Manual ReAct loop
Same tools, but instead of relying on the provider's structured `tools` field, we use the ReAct text protocol: Thought / Action / Observation / … / Final Answer. Why: this works against any chat endpoint (even base models with no tool support), and the model's chain of thought becomes literally visible in the terminal. A 30-line regex parser is the entire "framework."
Lesson 12d: RAG-enhanced triage
We add one more tool, `similar_cves(query, k)`, backed by NVIDIA's `nv-embedqa-e5-v5` embedding model and a hand-written CVE notes corpus. The agent uses it as a first step to find prior guidance for this class of vulnerability, then verifies against the actual source. This is the simplest possible agentic RAG: retrieval is just one more tool in the loop, not a separate pipeline.
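The retrieval side of `similar_cves` boils down to ranking a handful of stored vectors by cosine similarity. A pure-Python sketch, assuming the corpus is already loaded as `(note_id, vector)` pairs; in the lesson the vectors would come from `nv-embedqa-e5-v5` via the `/embeddings` endpoint, which is not called here (retrieval embedding models like this one typically embed queries and passages differently, so check the model card). The helper names are hypothetical. At about 12 rows, a linear scan is simpler than any vector database.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: Sequence[float],
          corpus: List[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Score every (note_id, vector) row against the query and keep the k best."""
    scored = [(note_id, cosine(query_vec, vec)) for note_id, vec in corpus]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```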
8. Where you are now
You should be able to:
- Explain the difference between finding a CVE and triaging it.
- Sketch our architecture: scanner → agent (LLM + tools) → JSON verdict.
- Identify the three distinct shapes in `sample_project/` (exploitable, vulnerable-but-not-reachable, dead weight).
- Run `python3 -m pip_audit` against the sample project and read its JSON output.
Continue to Lesson 12b β Basic tool-calling triage.