๐ ๏ธ AI-Powered CVE Triage on Jetson โ Part 2: Basic Tool-Calling¶
Author: Dr. Kaikai Liu, Ph.D. Institution: San Jose State University
Prerequisite: Lesson 12 โ Intro and a working
python -m pip_auditagainstsample_project/.Companion code:
edgeLLM/vuln-triage/triage_basic.pyโ every snippet below is an excerpt from that file.
1. ๐ฏ What you'll build¶
โ ๏ธ 2026-06 update. This lesson originally targeted
qwen/qwen3-coder-480b-a35b-instruct, which reached EOL on 2026-06-11. Run the script with--model minimaxai/minimax-m2.7(orz-ai/glm-5.1) instead โ both are free-tier-available and OpenAI-tools-compatible. The verified output block in ยง7 is from the original qwen run and is kept for the reasoning-pattern walkthrough.
A single Python script that:
- Runs
pip-auditagainstsample_project/requirements.txt. - For each CVE finding, opens an OpenAI tool-calling loop against
minimaxai/minimax-m2.7on NVIDIA Build (or any other model listed in Lesson 11b ยง11.1). - Lets the model call four read-only tools to gather evidence:
lookup_cve(cve_id)โ official NVD record.pip_audit_findings(requirements_path)โ re-run the scanner.search_usage(pattern, project_dir)โ grep the source tree.read_file(path, project_dir)โ slice a file.- Stops the loop when the model emits a final JSON verdict, and pretty-prints it.
No agent framework. ~130 lines of Python plus four ~50-line tool files.
2. ๐ง Setup¶
ssh jetsonorin
source ~/.venv/bin/activate
/usr/bin/python3 -m pip install --target ~/.venv/lib/python3.10/site-packages \
-r ~/vuln-triage/requirements.txt
export NVIDIA_API_KEY=nvapi-...
cd ~/vuln-triage
The toolbox lives in tools/:
tools/
โโโ __init__.py
โโโ cve_lookup.py # lookup_cve(cve_id)
โโโ code_search.py # search_usage + read_file
โโโ pip_audit_runner.py # pip_audit_findings
Each tool is plain Python with a JSON-serialisable return. None of them know about LLMs.
3. ๐งฉ Step 1 โ Define tools as OpenAI JSON schemas¶
NVIDIA Build accepts the same tools=[...] schema OpenAI's API expects.
Each tool is a JSON description of the function the model is allowed to
call. The name here must match the key in our TOOL_IMPL dict โ
that is how the agent loop dispatches.
TOOL_SCHEMAS = [
{
"type": "function",
"function": {
"name": "lookup_cve",
"description": "Fetch the official NVD record for a CVE id and "
"return its description, CVSS score, CWE ids, and "
"affected packages.",
"parameters": {
"type": "object",
"properties": {
"cve_id": {
"type": "string",
"description": "e.g. CVE-2018-18074",
}
},
"required": ["cve_id"],
},
},
},
# ... pip_audit_findings, search_usage, read_file ...
]
TOOL_IMPL = {
"lookup_cve": lookup_cve,
"pip_audit_findings": pip_audit_findings,
"search_usage": search_usage,
"read_file": read_file,
}
Key design choices:
descriptionis the only documentation the model has. Make it explicit about when to call this tool ("Use this after search_usage finds an interesting line"). The model uses these descriptions as a decision criterion.parameters.requiredis small. We let the model omit optional args likeis_regexandstart.- No write tools. Everything is read-only by construction โ important for the safety properties of any agent that runs against source code.
4. ๐งฉ Step 2 โ Dispatch a tool call safely¶
The agent loop will eventually receive a tool_calls array from the
model. Each entry has a function.name and a function.arguments
string (JSON). Our job is to:
def _dispatch_tool(name: str, arguments_json: str) -> str:
try:
args = json.loads(arguments_json or "{}")
except json.JSONDecodeError:
return json.dumps({"error": f"bad JSON arguments: {arguments_json!r}"})
fn = TOOL_IMPL.get(name)
if fn is None:
return json.dumps({"error": f"unknown tool: {name}"})
try:
result = fn(**args)
except TypeError as exc:
return json.dumps({"error": f"bad args for {name}: {exc}"})
except Exception as exc:
return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
return json.dumps(result, default=str)[:6000]
Three robustness rules that bit us during development:
- Never trust the JSON. Models sometimes emit trailing commas or
unquoted keys. Catch
json.JSONDecodeError, return it as a tool observation โ the model will retry. - Catch
TypeErrorseparately. It is the signature mismatch case ("got an unexpected keywordpat"). Reporting the actual TypeError to the model lets it self-correct on the next turn. - Truncate. Long observations balloon the prompt for the next round. 6 KB is plenty for our triage tools; bigger projects might summarise instead.
5. ๐งฉ Step 3 โ The actual agent loop¶
This is the heart of the whole lesson. ~50 lines of Python; no agent classes, no decorators, no callbacks.
def triage_one(client, *, finding, project_dir, requirements_path,
model, verbose=True) -> Verdict:
cve_id = finding["primary_cve"]
package, version = finding["package"], finding["version"]
messages = [
{"role": "system", "content": SYSTEM_PROMPT_BASIC},
{"role": "user", "content": f"""
Triage this finding:
CVE : {cve_id}
Package : {package}
Version : {version}
Requirements: {requirements_path}
Project dir : {project_dir}
Decide whether this codebase actually exposes the vulnerable
code path. Call tools as needed; reply with the JSON verdict
when done.
{VERDICT_SCHEMA_HINT}
""".strip()},
]
for round_idx in range(MAX_TOOL_ROUNDS):
resp = client.chat.completions.create(
model=model, messages=messages,
tools=TOOL_SCHEMAS, tool_choice="auto",
temperature=0.1, max_tokens=4096,
)
msg = resp.choices[0].message
messages.append({ # โ keep the
"role": "assistant", "content": msg.content, # assistant
"tool_calls": [ # turn in
{"id": tc.id, "type": "function", # history
"function": {"name": tc.function.name,
"arguments": tc.function.arguments}}
for tc in (msg.tool_calls or [])
] or None,
})
if not msg.tool_calls: # โ model is
return parse_verdict(msg.content, # done; we
cve_id=cve_id, # parse its
package=package, # final JSON
version=version)
for tc in msg.tool_calls: # โ execute
result_str = _dispatch_tool(tc.function.name, # every tool
tc.function.arguments) # it asked
messages.append({ # for and
"role": "tool", # feed each
"tool_call_id": tc.id, # result
"name": tc.function.name, # back
"content": result_str,
})
# ... fall through to a forced no-tool final round on budget exhaustion ...
Five things to internalize from those 50 lines:
| Concept | Where |
|---|---|
tool_choice="auto" lets the model pick whether to call a tool, or just return text. |
Line with tool_choice |
The whole conversation is one growing messages list. The model never has memory across .create() calls โ we provide it. |
Appends |
Tool replies use the special role "tool" and must carry the matching tool_call_id. |
Inner loop |
The exit condition is not msg.tool_calls โ the model stops calling tools when it has its answer. |
if not msg.tool_calls |
MAX_TOOL_ROUNDS is a hard ceiling so a confused model can't burn quota forever. |
for round_idx in range(...) |
That last point is non-negotiable. A two-line guard prevents a runaway loop from costing you $50 in NVIDIA Build credits.
6. ๐งฉ Step 4 โ Parse the verdict¶
The system prompt instructs the model to return a JSON object as its final assistant message. Real chat models sometimes wrap it in triple-back-tick fences anyway. We extract defensively:
def extract_json_block(text: str) -> dict:
if "```" in text:
for chunk in text.split("```"):
chunk = chunk.strip()
if chunk.startswith("json"):
chunk = chunk[4:].strip()
if chunk.startswith("{"):
try:
return json.loads(chunk)
except json.JSONDecodeError:
continue
start, end = text.find("{"), text.rfind("}")
if start != -1 and end > start:
return json.loads(text[start : end + 1])
raise ValueError(f"no JSON found in:\n{text[:400]}")
This is forgiving on purpose โ the alternative is to error out when the model adds one stray line of commentary, which would be a frustrating classroom experience.
7. โถ๏ธ Run it¶
cd ~/vuln-triage
python3 triage_basic.py --project sample_project --limit 2
Verified output on Jetson Orin Nano while writing this lesson
(qwen/qwen3-coder-480b-a35b-instruct, two CVEs against requests):
โ project : /home/cmpe/vuln-triage/sample_project
โ requirements : /home/cmpe/vuln-triage/sample_project/requirements.txt
โ model : qwen/qwen3-coder-480b-a35b-instruct
โ running pip-audit โฆ
pip-audit reported 33 finding(s).
[1/2] requests 2.19.1 โ CVE-2018-18074
ยท round 1: calling model โฆ 2.8s (tokens: prompt=1021, completion=36)
โ lookup_cve({"cve_id":"CVE-2018-18074"})
ยท round 2: calling model โฆ 2.5s (tokens: prompt=1417, completion=38)
โ pip_audit_findings({"requirements_path":".../sample_project/requirements.txt"})
ยท round 3: calling model โฆ 20.3s (tokens: prompt=3012, completion=42)
โ search_usage({"project_dir":".../sample_project","pattern":"requests"})
ยท round 4: calling model โฆ 50.6s (tokens: prompt=3471, completion=62)
โ read_file({"end":38,"project_dir":".../sample_project","start":23,"path":"app.py"})
ยท round 5: calling model โฆ 14.1s (tokens: prompt=3824, completion=114)
โโ CVE-2018-18074 [requests 2.19.1]
โ verdict : EXPLOITABLE HERE (confidence: high)
โ reason : The codebase directly uses the `requests` library in `app.py` (line
โ 37) to make HTTP requests with caller-supplied URLs,
โ which matches the vulnerable code path described in
โ CVE-2018-18074 โฆ
โ action : Upgrade the `requests` package to version 2.20.0 or later
โโโ
Five rounds, 11โ13 k tokens total, ~90 seconds wall time. The agent took exactly the path a human analyst would: read the CVE โ look at what other findings live in this project โ grep for usage โ read the exact lines โ conclude.
8. ๐งช Try in class¶
- Catch the agent being lazy. Re-run with
--cve CVE-2019-10906(the jinja2 sandbox issue). Does the model bother to read the actual template definition inapp.py, or does it just trust the package name? The 12c lesson will show the same case under a more visible loop. - Swap the coding model. Try
--model "z-ai/glm-5.1",--model "deepseek-ai/deepseek-v4-pro", or โ if you have the keys โ--model "claude-sonnet-4-6"/--model "gpt-4o-mini". Compare token usage, latency, and the quality of the justification text. The Lesson 11b ยง11 status table notes which are currently slow or EOL'd. - Inject a wrong package. Edit
app.pyto addimport yaml; yaml.load(open("x.yaml")). Re-run with--cve CVE-2020-1747. The agent should flip to Exploitable here. - Read the messages array. Add
print(json.dumps(messages, indent=2))just before each.create()call. You'll see exactly how the conversation grows turn by turn โ this is the mental model you need for agent debugging.
9. โ๏ธ Limitations of this design¶
This 130-line agent is great, but it leaves three sharp edges:
| Limitation | What it means | Fixed in |
|---|---|---|
Only works on providers that support OpenAI tools schema. |
A self-hosted vLLM serving a base model won't accept it. | 12c uses the text-protocol ReAct alternative. |
Hides the model's reasoning behind opaque tool_calls. |
When the agent picks a weird tool, you can only see what it called, not why. | 12c shows the chain of thought line by line. |
| No mechanism for prior knowledge. | The model re-discovers "yaml.load is the danger pattern" every single run. | 12d adds an embedding-search tool over a small corpus of triage notes. |
Continue to Lesson 12c โ ReAct without a framework.