🛡️ AI-Powered CVE Triage on the Jetson Orin Nano: Part 1, Introduction

Author: Dr. Kaikai Liu, Ph.D.
Position: Associate Professor, Computer Engineering
Institution: San Jose State University
Contact: kaikai.liu@sjsu.edu

Class goal. Build a small AI agent that takes a Python project's requirements.txt, runs a vulnerability scanner against it, and then uses an LLM on NVIDIA Build to decide which findings are actually exploitable in this codebase. The whole thing runs from a single Jetson Orin Nano with one API key — no Docker, no microservices, no agent framework.

This is Part 1 of a four-part series:

Part	What you'll learn	Code
12 (this lesson)	The problem, the sample dataset, the simplified architecture	(none)
12b	Single-turn OpenAI tool-calling against a coding model	`triage_basic.py`
12c	Manual ReAct loop (no framework)	`triage_react.py`
12d	Same loop + embedding-based retrieval	`triage_rag.py`

Companion code: edgeLLM/vuln-triage/.

🎞️ Overview slides: AI CVE Triage ▶ — the big ideas, the sample puzzle, and the agent loop at a glance.

1. 🎯 What problem are we actually solving?

A modern Python project depends on dozens of third-party packages. A vulnerability scanner (Snyk, Dependabot, GitHub Advanced Security, pip-audit, etc.) cross-references your requirements.txt against public advisory data (the NVD CVE database, GitHub Security Advisories, and similar feeds) and produces output like this:

$ pip-audit -r requirements.txt
Found 33 known vulnerabilities in 4 packages.

   requests  2.19.1   CVE-2018-18074   leak Authorization header on redirect
   requests  2.19.1   CVE-2023-32681   leak Proxy-Authorization on HTTPS proxy
   jinja2    2.10     CVE-2019-10906   str.format_map sandbox escape
   pyyaml    5.3      CVE-2020-1747    yaml.load arbitrary code execution
   urllib3   1.23     CVE-2020-26137   CRLF injection in HTTP method/path
   ...

Every line in that report is technically correct — your project does ship a vulnerable version of that package. But for the engineer on the Friday afternoon shift, the real question is:

"Is this CVE actually reachable from our code, or is it a false alarm?"

For a real Jetson application the answer depends on:

Whether the vulnerable function is imported at all (pyyaml is a classic case: it is often pinned in requirements.txt yet never imported by the project's own code).
Whether the vulnerable function is called with attacker-controlled input, or only with internal/fixed data.
Whether the project uses a safe wrapper that mitigates the issue (yaml.safe_load instead of yaml.load).
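The first and third checks above can be approximated without any LLM at all. The following is a minimal sketch, assuming the project is plain Python source that parses cleanly; the helper names `imported_modules` and `attribute_calls` are hypothetical and are not part of the companion code. Note that the distribution name (pyyaml) differs from the import name (yaml), so a real check needs that mapping.

```python
import ast

def imported_modules(source):
    """Return the top-level module names imported anywhere in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found

def attribute_calls(source, module):
    """Return attribute names called directly on `module`, e.g. {'safe_load'}.

    Limitation of this sketch: aliases (`import yaml as y`) and
    `from yaml import load` are not followed.
    """
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module):
            calls.add(node.func.attr)
    return calls

# Illustrative snippet only; this is not the course's app.py.
SAMPLE = '''
import yaml
import requests

def load_config(text):
    return yaml.safe_load(text)
'''

print(imported_modules(SAMPLE))
print(attribute_calls(SAMPLE, "yaml"))
```

A static check like this answers "is it imported?" and "which entry point is called?", but not the second question in the list, whether the argument is attacker-controlled. That remaining judgment is the part we hand to the model.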

Triage means classifying each finding into one of three buckets:

Bucket	Meaning	Cost of error
Exploitable here	Patch now; you have a real bug.	High if missed
Not exploitable	Suppress / track for next quarterly upgrade.	High if misclassified
Inconclusive	Needs a human; the agent did not find enough signal.	Bounded
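One way to pin the three buckets down in code is an enum plus a small record type. This is a sketch under assumptions: the field names (`cve_id`, `package`, `verdict`, `justification`) and the string values are illustrative choices, not the schema the lesson scripts actually use.

```python
import json
from dataclasses import dataclass
from enum import Enum

class Bucket(str, Enum):
    EXPLOITABLE = "exploitable"
    NOT_EXPLOITABLE = "not_exploitable"
    INCONCLUSIVE = "inconclusive"

@dataclass
class Verdict:
    cve_id: str
    package: str
    bucket: Bucket
    justification: str

def parse_verdict(raw):
    """Parse a model reply; anything malformed falls back to INCONCLUSIVE."""
    try:
        data = json.loads(raw)
        return Verdict(
            cve_id=str(data["cve_id"]),
            package=str(data["package"]),
            bucket=Bucket(data["verdict"]),  # raises ValueError on unknown bucket
            justification=str(data.get("justification", "")),
        )
    except (ValueError, KeyError, TypeError):
        # Hypothetical fallback record: route unparseable output to a human.
        return Verdict("unknown", "unknown", Bucket.INCONCLUSIVE,
                       "unparseable model output")
```

Falling back to Inconclusive rather than guessing follows from the table: it is the only bucket whose cost of error is bounded.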

Today this is mostly a human task — and humans triage maybe 30 findings an hour. The NVIDIA AI Blueprint for vulnerability analysis shows the production-grade version: Morpheus pipelines, microservices, GPU clusters. Our job for the next three lessons is to compress that idea into ~600 lines of single-file Python on a Jetson.

2. 🧠 Why this is an LLM job (and not a regex)

A scanner already knows the facts — version X has CVE Y. What it does not know is the semantic question of whether your code uses yaml.load at all, or only loads YAML from a trusted constant in your own repo.

⚠️ Model availability note (2026-06). Earlier drafts of this lesson defaulted to qwen/qwen3-coder-480b-a35b-instruct, which reached end-of-life on 2026-06-11 and now returns HTTP 410 Gone. The current recommended defaults are minimaxai/minimax-m2.7 or z-ai/glm-5.1. The code snippets below still show the qwen id for historical reasons — when you run the scripts, set --model minimaxai/minimax-m2.7 or override TRIAGE_CODER_MODEL in your .env. See Lesson 11b §11 for the live per-model status table.

A coding-grade LLM like minimaxai/minimax-m2.7 on NVIDIA Build can:

Read the CVE description and identify the vulnerable pattern (e.g., "calls yaml.load(untrusted_input, Loader=FullLoader)").
Search your codebase for that pattern using a tool we hand it.
Reason about the surrounding context — is the input genuinely attacker-controlled?
Emit a verdict + justification in a stable JSON shape we can ingest downstream.

Crucially, the model never sees the whole codebase. We give it a small toolbox and it pulls only the bytes it needs. This keeps the prompt small, the inference cheap, and the agent debuggable.
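To make "only the bytes it needs" concrete, here is a rough sketch of a file-slice tool. It assumes a line-window interface; the name `read_file_slice` and the `start` / `max_lines` parameters are hypothetical, and the real `read_file` tool in the later lessons may look different.

```python
from pathlib import Path

def read_file_slice(path, project_dir, start=1, max_lines=40):
    """Return up to `max_lines` numbered lines of `path`, beginning at line `start`."""
    root = Path(project_dir).resolve()
    target = (root / path).resolve()
    # Refuse paths that escape the project directory (e.g. "../../etc/passwd").
    if root not in target.parents:
        raise ValueError(f"{path!r} is outside {project_dir!r}")
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    begin = max(start, 1) - 1
    chunk = lines[begin:begin + max_lines]
    # Prefix each line with its 1-based number so the model can cite locations.
    return "\n".join(f"{begin + i + 1}: {line}" for i, line in enumerate(chunk))
```

Capping the window keeps each tool result to a few hundred tokens, and the path check matters because the `path` argument is chosen by the model, not by us.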

3. 🏗️ The simplified architecture

                                 ┌─────────────────────────────┐
                                 │  NVIDIA Build (cloud LLM)   │
                                 │  minimaxai/minimax-m2.7     │
                                 │  + nv-embedqa-e5-v5         │
                                 └──────────────▲──────────────┘
                                                │  OpenAI-compatible
                                                │  /chat/completions
                                                │  /embeddings
                       ┌────────────────────────┴─────────────────────────┐
                       │  Jetson Orin Nano                                │
                       │  ┌────────────────────────────────────────────┐  │
                       │  │  triage_basic.py    (lesson 12b)           │  │
                       │  │  triage_react.py    (lesson 12c)           │  │
                       │  │  triage_rag.py      (lesson 12d)           │  │
                       │  └────────────────────────────────────────────┘  │
                       │      ▲          ▲          ▲          ▲          │
                       │      │          │          │          │          │
                       │  lookup_cve search_usage read_file similar_cves  │
                       │  (NVD JSON)  (grep)    (file slice) (embeddings) │
                       │      │                                           │
                       │  pip_audit_findings  ← shells out to pip-audit   │
                       │  (./sample_project/requirements.txt)             │
                       └──────────────────────────────────────────────────┘

Three Python entrypoints, four tools (five once lesson 12d adds similar_cves), one sample project. Compared to the upstream blueprint we cut:

Upstream blueprint	Our version
Morpheus streaming pipeline	One `for` loop
LangChain + LangGraph agent	OpenAI tool-calling, then a manual ReAct loop
Triton-served LLM	NVIDIA Build hosted endpoints
Milvus vector DB	An in-memory cosine loop over a ~12-row JSONL
Docker / Helm	`python triage_basic.py`
Hours of setup	One `pip install`

What we keep is the core idea: an LLM with tools, asked to classify a scanner finding into actionable buckets.

4. 🧪 The running example: `sample_project/`

We ship a tiny Python application that is intentionally a triage puzzle. Look at edgeLLM/vuln-triage/sample_project/:

sample_project/
├── requirements.txt        # 3 packages, all pinned to vulnerable versions
├── app.py                  # uses 2 of them, in different ways
└── README.md

The requirements.txt:

requests==2.19.1     # CVE-2018-18074 + others
jinja2==2.10         # CVE-2019-10906 + others
pyyaml==5.3          # CVE-2020-1747  + others

…and app.py deliberately exercises three distinct triage shapes:

Package	Used?	How	Expected verdict
`requests`	✅	`requests.get(url)` with a caller-supplied `url` (no validation)	Exploitable
`jinja2`	✅	Renders a hard-coded `_STATUS_TEMPLATE` constant; no user template	Not exploitable
`pyyaml`	❌	Listed in `requirements.txt`, never imported	Not exploitable (dead weight)

urllib3 also shows up — but only as a transitive dependency of requests. A good agent will say "exposure depends on the requests usage above, not direct code in app.py."

These are the three patterns every real triage workflow has to handle. By the end of lesson 12d you will have a script that classifies all three correctly, on a Jetson, in a couple of minutes per finding.

5. 📦 Prerequisites

We run everything inside the Jetson AI container, where Python + pip are ready and your ~/.env.local keys are injected for you:

sjsujetsontool shell                        # enter the container (brings in NVIDIA_API_KEY)
cd /Developer/edgeAI/edgeLLM/vuln-triage
pip install -r requirements.txt             # installs into the container

That installs three packages:

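openai>=1.40: the OpenAI-compatible client that talks to NVIDIA Build.
httpx>=0.27: the HTTP transport underneath the openai client; used here to drop the default environment-proxy handling so calls go out directly.
pip-audit>=2.7: the actual vulnerability scanner; by default it queries PyPI's vulnerability data (the PyPA advisory database), with OSV available as an alternative service.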

Your NVIDIA Build key comes from ~/.env.local (saved earlier via sjsujetsontool chat / setup-nvapi); sjsujetsontool shell passes it into the container, so NVIDIA_API_KEY is already set — check with echo $NVIDIA_API_KEY. If you don't have one yet, get a free key at https://build.nvidia.com and add it: echo "NVIDIA_API_KEY=nvapi-..." >> ~/.env.local (then re-enter the shell).

Optional but recommended: get an NVD API key (free signup) and add NVD_API_KEY=... to ~/.env.local. It bumps the rate limit from 5 to 50 requests per 30 s, so cache-cold runs over many CVEs feel snappier.

6. 🚦 First contact: run pip-audit by hand

Before we ask an LLM to do anything, prove the scanner half of the pipeline works:

cd /Developer/edgeAI/edgeLLM/vuln-triage      # inside `sjsujetsontool shell`
python3 -m pip_audit -r sample_project/requirements.txt --format json --no-deps | jq '.dependencies[] | {name, version, n: (.vulns|length)}'

Verified inside the Jetson container while writing this lesson:

{"name": "requests", "version": "2.19.1", "n": 6}
{"name": "jinja2",   "version": "2.10",   "n": 6}
{"name": "pyyaml",   "version": "5.3",    "n": 4}
{"name": "urllib3",  "version": "1.23",   "n": 13}     ← transitive
{"name": "idna",     "version": "2.7",    "n": 3}      ← transitive

More than 30 findings from a three-line requirements.txt. A human staring at this list would need at least an hour. Our agent, the subject of lesson 12b, handles each CVE in about 90 seconds on an iffy day and produces structured JSON the rest of your CI can ingest.
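The jq one-liner above has a direct Python equivalent, which is closer to what an agent script needs. The sketch below assumes only the parts of the pip-audit JSON report that the jq filter already relies on (`dependencies`, `name`, `version`, `vulns`) plus the `id` field of each vulnerability; the function names are hypothetical.

```python
import json

def summarize(report_json):
    """Per-package counts, mirroring the jq filter {name, version, n}."""
    report = json.loads(report_json)
    return [
        {
            "name": dep["name"],
            "version": dep.get("version"),     # absent for skipped dependencies
            "n": len(dep.get("vulns", [])),
        }
        for dep in report.get("dependencies", [])
    ]

def findings(report_json):
    """One (package, version, vuln_id) tuple per finding: the unit we triage."""
    report = json.loads(report_json)
    return [
        (dep["name"], dep.get("version"), vuln["id"])
        for dep in report.get("dependencies", [])
        for vuln in dep.get("vulns", [])
    ]

# Usage, assuming the scanner output was saved to a file first:
#   python3 -m pip_audit -r sample_project/requirements.txt --format json > report.json
#   rows = summarize(open("report.json").read())
```

Flattening to one tuple per finding is a design choice worth making early, because every later step (tool calls, verdicts, timing) is counted per CVE rather than per package.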

7. 🗺️ What's in each of the next three lessons

Lesson 12b: Basic tool-calling triage

We hand the model four tools as OpenAI-format JSON schemas:

lookup_cve(cve_id) — fetches the official NVD record.
pip_audit_findings(requirements_path) — reuses the scanner.
search_usage(pattern, project_dir) — a tiny grep over the codebase.
read_file(path, project_dir) — slice of source code for the spot the grep found.

…and run one OpenAI tool-calling loop: the model decides which tool to call, we execute it, we feed the result back. Once the model has enough evidence, it emits the JSON verdict. About 130 lines of Python, zero frameworks.
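As a preview of what "OpenAI-format JSON schema" and "we execute it" mean in practice, here is a minimal sketch for one of the four tools. The schema layout (`type: function`, `function.name`, `function.parameters`) is the standard OpenAI tools format; the descriptions, the grep implementation, and the `dispatch` helper are assumptions for illustration and may differ from triage_basic.py.

```python
import json
import re
from pathlib import Path

SEARCH_USAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "search_usage",
        "description": "Search the project's .py files for a regex; return matching lines.",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Python regex to search for"},
                "project_dir": {"type": "string", "description": "Project root directory"},
            },
            "required": ["pattern", "project_dir"],
        },
    },
}

def search_usage(pattern, project_dir):
    """A tiny grep: 'relative/path.py:lineno: line' for each matching line."""
    regex = re.compile(pattern)
    root = Path(project_dir)
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if regex.search(line):
                hits.append(f"{path.relative_to(root)}:{lineno}: {line.strip()}")
    return hits

TOOL_IMPLS = {"search_usage": search_usage}

def dispatch(name, arguments_json):
    """Run one tool call and return a JSON string to feed back to the model."""
    impl = TOOL_IMPLS.get(name)
    if impl is None:
        return json.dumps({"error": f"unknown tool {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps({"result": impl(**args)})
    except Exception as exc:  # report tool failures to the model instead of crashing
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
```

In a tool-calling reply, each tool call carries a function name and an arguments string containing JSON, which is why `dispatch` takes exactly those two values. Returning errors as data lets the model retry with a corrected pattern rather than ending the run.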

Lesson 12c: Manual ReAct loop

Same tools, but instead of relying on the provider's structured tools field, we use the ReAct text protocol — Thought / Action / Observation / … / Final Answer. Why: this works against any chat endpoint (even base models with no tool support), and the model's chain of thought becomes literally visible in the terminal. A 30-line regex parser is the entire "framework."
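To give a feel for how small that parser can be, here is a sketch. It assumes one particular text convention, an `Action:` line followed by an `Action Input:` line holding single-line JSON, and a `Final Answer:` marker; the exact format and the `parse_step` name are assumptions, and the parser in triage_react.py may differ.

```python
import json
import re

# "Action: <tool>" on one line, "Action Input: <json>" on the next.
ACTION_RE = re.compile(
    r"^Action:[ \t]*(\w+)[ \t]*\r?\nAction Input:[ \t]*(.*)$", re.MULTILINE
)
# Everything after "Final Answer:" up to the end of the reply.
FINAL_RE = re.compile(r"^Final Answer:\s*(.*)", re.MULTILINE | re.DOTALL)

def parse_step(text):
    """Classify one model reply as an action, a final answer, or a format error."""
    # Check for an action first: if the model wrote an Action and then went on
    # to invent its own Observation and Final Answer, we still want to run the
    # tool ourselves rather than trust the invented result.
    action = ACTION_RE.search(text)
    if action:
        try:
            args = json.loads(action.group(2).strip())
        except json.JSONDecodeError:
            return {"kind": "error", "message": "Action Input was not valid JSON"}
        return {"kind": "action", "tool": action.group(1), "args": args}
    final = FINAL_RE.search(text)
    if final:
        return {"kind": "final", "answer": final.group(1).strip()}
    return {"kind": "error", "message": "no Action or Final Answer found"}
```

The error branch is as important as the happy path: feeding the error message back as the next Observation usually lets the model repair its own formatting on the following turn.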

Lesson 12d: RAG-enhanced triage

We add one more tool — similar_cves(query, k) — backed by NVIDIA's nv-embedqa-e5-v5 embedding model and a hand-written CVE notes corpus. The agent uses it as a first step to find prior guidance for this class of vuln, then verifies against the actual source. This is the simplest possible agentic RAG — retrieval is just one more tool in the loop, not a separate pipeline.
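The retrieval step behind such a tool can be very small when the corpus has only a dozen or so rows. The sketch below shows the ranking half only, assuming each corpus row already carries an `embedding` list obtained from the embeddings endpoint; the row layout and function names are hypothetical, and the toy 2-D vectors stand in for real embedding vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, corpus, k=3):
    """Return the k (score, id) pairs most similar to `query_vec`, best first."""
    scored = [(cosine(query_vec, row["embedding"]), row["id"]) for row in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy corpus with made-up ids and 2-D "embeddings", for illustration only.
CORPUS = [
    {"id": "note-a", "embedding": [1.0, 0.0]},
    {"id": "note-b", "embedding": [0.0, 1.0]},
    {"id": "note-c", "embedding": [1.0, 1.0]},
]

for score, note_id in top_k([1.0, 0.1], CORPUS, k=2):
    print(f"{note_id}  {score:.3f}")
```

A linear scan like this is O(rows × dimensions) per query, which is negligible at this corpus size and is the reason no vector database is needed here.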

8. ✅ Where you are now

You should be able to:

Explain the difference between finding a CVE and triaging it.
Sketch our architecture: scanner → agent (LLM + tools) → JSON verdict.
Identify the three distinct shapes in sample_project/ (exploitable, used-but-not-reachable, never-imported dead weight), plus the transitive-dependency case.
Run python3 -m pip_audit against the sample project and read its JSON output.

Continue to Lesson 12b — Basic tool-calling triage.
