🛡️ AI-Powered CVE Triage on the Jetson Orin Nano, Part 1: Introduction

Author: Dr. Kaikai Liu, Ph.D.
Position: Associate Professor, Computer Engineering
Institution: San Jose State University
Contact: kaikai.liu@sjsu.edu

Class goal. Build a small AI agent that takes a Python project's requirements.txt, runs a vulnerability scanner against it, and then uses an LLM on NVIDIA Build to decide which findings are actually exploitable in this codebase. The whole thing runs from a single Jetson Orin Nano with one API key: no Docker, no microservices, no agent framework.

This is Part 1 of a four-part series:

| Part | What you'll learn | Code |
| --- | --- | --- |
| 12 (this lesson) | The problem, the sample dataset, the simplified architecture | (none) |
| 12b | Single-turn OpenAI tool-calling against a coding model | triage_basic.py |
| 12c | Manual ReAct loop (no framework) | triage_react.py |
| 12d | Same loop + embedding-based retrieval | triage_rag.py |

Companion code: edgeLLM/vuln-triage/.

🎞️ Overview slides: AI CVE Triage ▶ (the big ideas, the sample puzzle, and the agent loop at a glance).


1. 🎯 What problem are we actually solving?

A modern Python project depends on dozens of third-party packages. A vulnerability scanner (Snyk, Dependabot, GitHub Advanced Security, pip-audit, etc.) cross-references your requirements.txt against the NVD CVE database and produces output like this:

$ pip-audit -r requirements.txt
Found 33 known vulnerabilities in 4 packages.

   requests  2.19.1   CVE-2018-18074   leak Proxy-Authorization on redirect
   requests  2.19.1   CVE-2023-32681   leak Proxy-Authorization on HTTPS proxy
   jinja2    2.10     CVE-2019-10906   str.format_map sandbox escape
   pyyaml    5.3      CVE-2020-1747    yaml.load arbitrary code execution
   urllib3   1.23     CVE-2020-26137   CRLF injection in HTTP method/path
   ...

Every line in that report is technically correct: your project does ship a vulnerable version of that package. But for the engineer on the Friday afternoon shift, the real question is:

"Is this CVE actually reachable from our code, or is it a false alarm?"

For a real Jetson application the answer depends on:

  • Whether the vulnerable function is imported at all (pyyaml is a common example of a dependency that gets installed but is rarely imported directly).
  • Whether the vulnerable function is called with attacker-controlled input, or only with internal/fixed data.
  • Whether the project uses a safe wrapper that mitigates the issue (yaml.safe_load instead of yaml.load).
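The first and third checks can be approximated mechanically before any LLM is involved. Here is a minimal sketch using Python's ast module. The helper names imported_modules and attribute_calls and the EXAMPLE snippet are ours for illustration (this is not the real app.py), and note that the import name (yaml) differs from the package name (pyyaml).

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules

def attribute_calls(source: str, module: str) -> list[tuple[str, int]]:
    """List (function_name, line_number) for calls written as `module.func(...)`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module):
            hits.append((node.func.attr, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical snippet for illustration only.
EXAMPLE = '''
import requests
import yaml

def fetch(url):
    return requests.get(url)

def load_config(text):
    return yaml.safe_load(text)
'''

print(imported_modules(EXAMPLE))
print(attribute_calls(EXAMPLE, "yaml"))
```

This sketch deliberately ignores aliases (import yaml as y) and from-imports, and it says nothing about the second bullet, whether the argument is attacker-controlled. That judgment is the part we hand to the model.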

Triage means classifying each finding into one of three buckets:

| Bucket | Meaning | Cost of error |
| --- | --- | --- |
| Exploitable here | Patch now; you have a real bug. | High if missed |
| Not exploitable | Suppress / track for next quarterly upgrade. | High if misclassified |
| Inconclusive | Needs a human; the agent did not find enough signal. | Bounded |
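One way to encode these buckets is an enum plus a small record type. The sketch below is an assumption about shape, not the schema used in the companion code: the field names (cve_id, package, bucket, justification) and the string values are hypothetical. The one design choice worth copying is that anything malformed falls back to the Inconclusive bucket, since that is the bucket whose cost of error is bounded.

```python
import json
from dataclasses import dataclass
from enum import Enum

class Bucket(str, Enum):
    EXPLOITABLE = "exploitable"
    NOT_EXPLOITABLE = "not_exploitable"
    INCONCLUSIVE = "inconclusive"

@dataclass
class Verdict:
    cve_id: str
    package: str
    bucket: Bucket
    justification: str

def parse_verdict(raw: str) -> Verdict:
    """Parse a JSON verdict; malformed input degrades to INCONCLUSIVE."""
    try:
        data = json.loads(raw)
        return Verdict(
            cve_id=str(data["cve_id"]),
            package=str(data["package"]),
            bucket=Bucket(data["bucket"]),  # raises ValueError on unknown labels
            justification=str(data.get("justification", "")),
        )
    except (json.JSONDecodeError, KeyError, ValueError, TypeError) as exc:
        return Verdict("unknown", "unknown", Bucket.INCONCLUSIVE,
                       f"unparseable verdict: {exc}")

good = parse_verdict(
    '{"cve_id": "CVE-2018-18074", "package": "requests", '
    '"bucket": "exploitable", "justification": "caller-supplied url"}'
)
bad = parse_verdict("the model rambled instead of emitting JSON")
print(good.bucket.value, bad.bucket.value)
```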

Today this is mostly a human task, and humans triage maybe 30 findings an hour. The NVIDIA AI Blueprint for vulnerability analysis shows the production-grade version: Morpheus pipelines, microservices, GPU clusters. Our job for the next three lessons is to compress that idea into ~600 lines of single-file Python on a Jetson.


2. 🧠 Why this is an LLM job (and not a regex)

A scanner already knows the facts: version X has CVE Y. What it cannot answer is the semantic question of whether your code calls yaml.load at all, or only loads YAML from a trusted constant in your own repo.

⚠️ Model availability note (2026-06). Earlier drafts of this lesson defaulted to qwen/qwen3-coder-480b-a35b-instruct, which reached end-of-life on 2026-06-11 and now returns HTTP 410 Gone. The current recommended defaults are minimaxai/minimax-m2.7 or z-ai/glm-5.1. The code snippets below still show the qwen id for historical reasons; when you run the scripts, set --model minimaxai/minimax-m2.7 or override TRIAGE_CODER_MODEL in your .env. See Lesson 11b §11 for the live per-model status table.

A coding-grade LLM like minimaxai/minimax-m2.7 on NVIDIA Build can:

  1. Read the CVE description and identify the vulnerable pattern (e.g., "calls yaml.load(untrusted_input, Loader=FullLoader)").
  2. Search your codebase for that pattern using a tool we hand it.
  3. Reason about the surrounding context: is the input genuinely attacker-controlled?
  4. Emit a verdict + justification in a stable JSON shape we can ingest downstream.

Crucially, the model never sees the whole codebase. We give it a small toolbox and it pulls only the bytes it needs. This keeps the prompt small, the inference cheap, and the agent debuggable.
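To make "pulls only the bytes it needs" concrete, here is a rough sketch of two such tools. The names search_usage and read_file come from this series, but the bodies, the max_hits cap, and the start/count parameters are assumptions for illustration, not the code in triage_basic.py.

```python
import re
import tempfile
from pathlib import Path

def search_usage(pattern: str, project_dir: str, max_hits: int = 20) -> list[dict]:
    """Grep *.py files under project_dir and return at most max_hits matches."""
    regex = re.compile(pattern)
    root = Path(project_dir)
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append({"file": path.relative_to(root).as_posix(),
                             "line": lineno,
                             "text": line.strip()[:200]})
                if len(hits) >= max_hits:
                    return hits
    return hits

def read_file(path: str, project_dir: str, start: int = 1, count: int = 40) -> str:
    """Return `count` numbered lines starting at line `start` (1-based)."""
    root = Path(project_dir).resolve()
    target = (root / path).resolve()
    if not target.is_relative_to(root):  # Python 3.9+; blocks ../ escapes
        raise ValueError("path escapes project_dir")
    start = max(start, 1)
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    window = lines[start - 1 : start - 1 + count]
    return "\n".join(f"{start + i}: {line}" for i, line in enumerate(window))

# Demo against a throwaway project (hypothetical file contents).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "app.py").write_text(
        "import requests\n\ndef fetch(url):\n    return requests.get(url)\n"
    )
    hits = search_usage(r"requests\.get\(", tmp)
    snippet = read_file("app.py", tmp, start=3, count=2)

print(hits)
print(snippet)
```

Both functions cap their output, so a single tool call cannot flood the prompt no matter how large the project is.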


3. 🏗️ The simplified architecture

                                 ┌─────────────────────────────┐
                                 │  NVIDIA Build (cloud LLM)   │
                                 │  minimaxai/minimax-m2.7     │
                                 │  + nv-embedqa-e5-v5         │
                                 └──────────────▲──────────────┘
                                                │  OpenAI-compatible
                                                │  /chat/completions
                                                │  /embeddings
                       ┌────────────────────────┴─────────────────────────┐
                       │  Jetson Orin Nano                                │
                       │  ┌────────────────────────────────────────────┐  │
                       │  │  triage_basic.py    (lesson 12b)           │  │
                       │  │  triage_react.py    (lesson 12c)           │  │
                       │  │  triage_rag.py      (lesson 12d)           │  │
                       │  └────────────────────────────────────────────┘  │
                       │      ▲          ▲          ▲          ▲          │
                       │      │          │          │          │          │
                       │  lookup_cve search_usage read_file similar_cves  │
                       │  (NVD JSON)  (grep)    (file slice) (embeddings) │
                       │      │                                           │
                       │  pip_audit_findings  ← shells out to pip-audit   │
                       │  (./sample_project/requirements.txt)             │
                       └──────────────────────────────────────────────────┘

Three Python entrypoints, four tools, one sample project. Compared to the upstream blueprint we cut:

| Upstream blueprint | Our version |
| --- | --- |
| Morpheus streaming pipeline | One for loop |
| LangChain + LangGraph agent | OpenAI tool-calling, then a manual ReAct loop |
| Triton-served LLM | NVIDIA Build hosted endpoints |
| Milvus vector DB | An in-memory cosine loop over a ~12-row JSONL |
| Docker / Helm | python triage_basic.py |
| Hours of setup | One pip install |

What we keep is the core idea: an LLM with tools, asked to classify a scanner finding into actionable buckets.


4. 🧪 The running example: sample_project/

We ship a tiny Python application that is intentionally a triage puzzle. Look at edgeLLM/vuln-triage/sample_project/:

sample_project/
├── requirements.txt        # 3 packages, all pinned to vulnerable versions
├── app.py                  # uses 2 of them, in different ways
└── README.md

The requirements.txt:

requests==2.19.1     # CVE-2018-18074 + others
jinja2==2.10         # CVE-2019-10906 + others
pyyaml==5.3          # CVE-2020-1747  + others

…and app.py deliberately exercises three distinct triage shapes:

| Package | Used? | How | Expected verdict |
| --- | --- | --- | --- |
| requests | ✅ | requests.get(url) with a caller-supplied url (no validation) | Exploitable |
| jinja2 | ✅ | Renders a hard-coded _STATUS_TEMPLATE constant; no user template | Not exploitable |
| pyyaml | ❌ | Listed in requirements.txt, never imported | Not exploitable (dead weight) |

urllib3 also shows up, but only as a transitive dependency of requests. A good agent will say "exposure depends on the requests usage above, not direct code in app.py."

These are the three patterns every real triage workflow has to handle. By the end of lesson 12d you will have a script that classifies all three correctly, on a Jetson, in a couple of minutes per finding.


5. 📦 Prerequisites

We run everything inside the Jetson AI container, where Python + pip are ready and your ~/.env.local keys are injected for you:

sjsujetsontool shell                        # enter the container (brings in NVIDIA_API_KEY)
cd /Developer/edgeAI/edgeLLM/vuln-triage
pip install -r requirements.txt             # installs into the container

That installs three packages:

  • openai>=1.40: the OpenAI-compatible client that talks to NVIDIA Build.
  • httpx>=0.27: drops Python's default proxy handling for clean direct calls.
  • pip-audit>=2.7: the actual vulnerability scanner; by default it queries the PyPI advisory database, whose records carry GHSA and CVE identifiers.

Your NVIDIA Build key comes from ~/.env.local (saved earlier via sjsujetsontool chat / setup-nvapi); sjsujetsontool shell passes it into the container, so NVIDIA_API_KEY is already set (check with echo $NVIDIA_API_KEY). If you don't have one yet, get a free key at https://build.nvidia.com and add it: echo "NVIDIA_API_KEY=nvapi-..." >> ~/.env.local (then re-enter the shell).

Optional but recommended: get an NVD API key (free signup) and add NVD_API_KEY=... to ~/.env.local. It bumps the rate limit from 5 to 50 requests per 30 s, so cache-cold runs over many CVEs feel snappier.
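A script can fail fast on a missing key and pace its NVD lookups from the same environment variables. The sketch below is illustrative: load_settings and the returned dictionary keys are hypothetical names, while NVIDIA_API_KEY, NVD_API_KEY, TRIAGE_CODER_MODEL, and the 5 versus 50 requests per 30 s figures come from the text above.

```python
import os

DEFAULT_MODEL = "minimaxai/minimax-m2.7"
NVD_WINDOW_SECONDS = 30

def load_settings(env=None) -> dict:
    """Read keys from the environment; raise early if the required one is missing."""
    env = os.environ if env is None else env
    nvidia_key = env.get("NVIDIA_API_KEY", "").strip()
    if not nvidia_key:
        raise RuntimeError(
            "NVIDIA_API_KEY is not set; add it to ~/.env.local and re-enter the shell"
        )
    has_nvd_key = bool(env.get("NVD_API_KEY", "").strip())
    requests_per_window = 50 if has_nvd_key else 5
    return {
        "nvidia_api_key": nvidia_key,
        "model": env.get("TRIAGE_CODER_MODEL", DEFAULT_MODEL),
        # Minimum spacing between NVD calls that stays inside the rate limit.
        "nvd_min_interval_s": NVD_WINDOW_SECONDS / requests_per_window,
    }

# Demo with an explicit mapping so the example does not depend on your shell.
demo = load_settings({"NVIDIA_API_KEY": "nvapi-placeholder"})
print(demo["model"], demo["nvd_min_interval_s"])
```

Without an NVD key the spacing works out to 6 seconds between lookups; with one it drops to 0.6 seconds, which is where the "snappier" feeling comes from.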


6. 🚦 First contact: run pip-audit by hand

Before we ask an LLM to do anything, prove the scanner half of the pipeline works:

cd /Developer/edgeAI/edgeLLM/vuln-triage      # inside `sjsujetsontool shell`
python3 -m pip_audit -r sample_project/requirements.txt --format json --no-deps | jq '.dependencies[] | {name, version, n: (.vulns|length)}'

Verified inside the Jetson container while writing this lesson:

{"name": "requests", "version": "2.19.1", "n": 6}
{"name": "jinja2",   "version": "2.10",   "n": 6}
{"name": "pyyaml",   "version": "5.3",    "n": 4}
{"name": "urllib3",  "version": "1.23",   "n": 13}     ← transitive
{"name": "idna",     "version": "2.7",    "n": 3}      ← transitive

That is 32 findings across 5 packages in the listing above, two of them transitive. A human staring at this list would need at least an hour. Our agent, the subject of lesson 12b, does it in about 90 seconds per CVE on an iffy day, and produces structured JSON the rest of your CI can ingest.
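The same report can be flattened in Python when jq is not available. This is a minimal sketch: the dependencies, name, version, and vulns fields are the ones the jq filter above relies on, the id field inside each vuln is an assumption about the report layout, and SAMPLE_REPORT is a hand-written stand-in (including the hypothetical example-pkg row), not real scanner output.

```python
import json
from collections import Counter

# Hand-written stand-in for `pip-audit --format json` output.
SAMPLE_REPORT = """
{"dependencies": [
  {"name": "requests", "version": "2.19.1",
   "vulns": [{"id": "CVE-2018-18074"}, {"id": "CVE-2023-32681"}]},
  {"name": "pyyaml", "version": "5.3",
   "vulns": [{"id": "CVE-2020-1747"}]},
  {"name": "example-pkg", "version": "1.0.0", "vulns": []}
]}
"""

def flatten_findings(report_json: str) -> list[dict]:
    """Turn the nested report into one flat record per (package, vulnerability)."""
    report = json.loads(report_json)
    findings = []
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulns", []):  # entries without vulns contribute nothing
            findings.append({
                "package": dep["name"],
                "version": dep["version"],
                "id": vuln.get("id"),
            })
    return findings

findings = flatten_findings(SAMPLE_REPORT)
per_package = Counter(f["package"] for f in findings)
print(len(findings), dict(per_package))
```

A flat list like this is a convenient unit of work: each record is one question to put to the agent.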


7. 🗺️ What's in each of the next three lessons

Lesson 12b: Basic tool-calling triage

We hand the model four tools as OpenAI-format JSON schemas:

  • lookup_cve(cve_id): fetches the official NVD record.
  • pip_audit_findings(requirements_path): reuses the scanner.
  • search_usage(pattern, project_dir): a tiny grep over the codebase.
  • read_file(path, project_dir): slice of source code for the spot the grep found.

…and run one OpenAI tool-calling loop: the model decides which tool to call, we execute it, we feed the result back. Once the model has enough evidence, it emits the JSON verdict. About 130 lines of Python, zero frameworks.
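As a preview, here is roughly what two of those schemas and the "we execute it" step could look like. The outer shape (type, function, name, description, parameters) follows the OpenAI tools format; the descriptions, the stub tool bodies, and the dispatch helper are assumptions for illustration rather than the contents of triage_basic.py, and no network call is made.

```python
import json

TOOLS = [
    {"type": "function",
     "function": {
         "name": "lookup_cve",
         "description": "Fetch the NVD record for one CVE id.",
         "parameters": {
             "type": "object",
             "properties": {"cve_id": {"type": "string"}},
             "required": ["cve_id"],
         }}},
    {"type": "function",
     "function": {
         "name": "search_usage",
         "description": "Grep the project for a regex pattern.",
         "parameters": {
             "type": "object",
             "properties": {"pattern": {"type": "string"},
                            "project_dir": {"type": "string"}},
             "required": ["pattern", "project_dir"],
         }}},
]

# Stub implementations so the sketch runs offline.
def lookup_cve(cve_id: str) -> dict:
    return {"cve_id": cve_id, "summary": "stub record"}

def search_usage(pattern: str, project_dir: str) -> dict:
    return {"pattern": pattern, "project_dir": project_dir, "hits": []}

REGISTRY = {"lookup_cve": lookup_cve, "search_usage": search_usage}

def dispatch(name: str, arguments_json: str) -> str:
    """Run one requested tool call and return a JSON string to feed back to the model."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool {name!r}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(REGISTRY[name](**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON or wrong argument names: report it instead of crashing the loop.
        return json.dumps({"error": str(exc)})

print(dispatch("lookup_cve", '{"cve_id": "CVE-2020-1747"}'))
print(dispatch("read_everything", "{}"))
```

Returning errors as ordinary tool results, rather than raising, gives the model a chance to correct its own call on the next turn.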

Lesson 12c: Manual ReAct loop

Same tools, but instead of relying on the provider's structured tools field, we use the ReAct text protocol: Thought / Action / Observation / … / Final Answer. Why: this works against any chat endpoint (even base models with no tool support), and the model's chain of thought becomes literally visible in the terminal. A 30-line regex parser is the entire "framework."
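To give a feel for how small that parser can be, here is one possible sketch. It assumes the model writes the tool name on an "Action:" line and its arguments as single-line JSON on an "Action Input:" line; that second label is a common ReAct convention we are assuming here, and the exact format used in triage_react.py may differ.

```python
import json
import re

# "Action: <tool>" on one line, "Action Input: <json>" on the next.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\s*$\s*^Action Input:\s*(.+)$", re.MULTILINE)
# Everything after "Final Answer:" counts as the answer.
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)", re.MULTILINE | re.DOTALL)

def parse_step(text: str) -> dict:
    """Classify one model turn as an action request, a final answer, or an error."""
    action = ACTION_RE.search(text)
    if action:
        try:
            args = json.loads(action.group(2))
        except json.JSONDecodeError as exc:
            return {"kind": "error", "message": f"bad Action Input: {exc}"}
        if not isinstance(args, dict):
            return {"kind": "error", "message": "Action Input must be a JSON object"}
        return {"kind": "action", "tool": action.group(1), "args": args}
    final = FINAL_RE.search(text)
    if final:
        return {"kind": "final", "answer": final.group(1).strip()}
    return {"kind": "error", "message": "no Action or Final Answer found"}

turn = (
    "Thought: check whether yaml.load is called anywhere\n"
    "Action: search_usage\n"
    'Action Input: {"pattern": "yaml.load", "project_dir": "sample_project"}'
)
step = parse_step(turn)
done = parse_step('Thought: enough evidence\nFinal Answer: {"bucket": "not_exploitable"}')
print(step)
print(done)
```

The loop around it is then plain control flow: on an action, run the tool and append "Observation: ..." to the transcript; on a final answer, stop; on an error, tell the model what was wrong and let it retry.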

Lesson 12d: RAG-enhanced triage

We add one more tool, similar_cves(query, k), backed by NVIDIA's nv-embedqa-e5-v5 embedding model and a hand-written CVE notes corpus. The agent uses it as a first step to find prior guidance for this class of vuln, then verifies against the actual source. This is the simplest possible agentic RAG: retrieval is just one more tool in the loop, not a separate pipeline.
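The retrieval half needs nothing more than a cosine similarity and a sort. In the sketch below the three-dimensional vectors and note ids are toy stand-ins so the example runs offline; in the lesson the vectors would come from the embedding endpoint, and the function names cosine and top_k are ours.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], corpus: list[dict], k: int = 3) -> list[tuple[float, str]]:
    """Score every row against the query and return the k best (score, id) pairs."""
    scored = [(cosine(query_vec, row["embedding"]), row["id"]) for row in corpus]
    scored.sort(reverse=True)
    return scored[:k]

# Toy corpus: hypothetical note ids with made-up 3-d "embeddings".
CORPUS = [
    {"id": "note-yaml-load",     "embedding": [0.9, 0.1, 0.0]},
    {"id": "note-proxy-auth",    "embedding": [0.1, 0.9, 0.0]},
    {"id": "note-jinja-sandbox", "embedding": [0.0, 0.2, 0.9]},
]

results = top_k([1.0, 0.0, 0.0], CORPUS, k=2)
print([note_id for _, note_id in results])
```

With a corpus of about a dozen rows, a linear scan like this costs next to nothing, which is why a vector database is not needed at this scale.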


8. ✅ Where you are now

You should be able to:

  • Explain the difference between finding a CVE and triaging it.
  • Sketch our architecture: scanner → agent (LLM + tools) → JSON verdict.
  • Identify the three distinct shapes in sample_project/ (exploitable, used-but-not-exploitable, and dead weight that is listed but never imported).
  • Run python3 -m pip_audit against the sample project and read its JSON output.

Continue to Lesson 12b: Basic tool-calling triage.