🤖 The ReAct Loop

The core of every AI agent

SJSU · Edge AI

Reason + Act: an LLM in a loop with tools — run it as sjsujetsontool chat --agent.

1 Chatbot vs. agent

Chatbot — one prompt → one answer.
Knows only its weights + your prompt.

Agent — a chatbot put in a loop with tools.
Given a goal, it repeatedly decides what to do next, acts, observes, and continues.

Two ingredients turn a model into an agent:

  1. Tools — functions it may call (read / grep / search / write / edit).
  2. A control loop — code that runs the tool and feeds the result back.

That loop is the whole idea. RAG, coding assistants, multi-agent systems — all variations on it.

2 ReAct = Reason + Act

The model interleaves reasoning and actions in a strict text protocol — one step per turn:

Thought: I should look at app.py first
Action: read_file
Action Input: {"path": "app.py", "end": 40}

Our code runs the tool and appends the result; the model reads it and continues:

Observation: 1  import requests  ...
Thought: I now know the answer
Final Answer: app.py calls requests.get(url) with a caller-supplied URL.

Plain text → works on any chat model (even local base models), and the reasoning stays visible for debugging.

3 How does an LLM "call" a tool? (spoiler: it writes text)

A common misconception: "the model has an Execute button."
Reality: an LLM is a next-token predictor. It can only write text. Period.

When you see this in a transcript, the model is literally typing those characters:

Action: read_file
Action Input: {"path": "calculator.py", "end": 40}

It does not open any file. Our runtime reads that text and dispatches the call:

m    = _ACTION.search(text)                          # find "Action: read_file"
args = json.loads(_INPUT_JSON.search(text).group(1)) # find {"path": …}
obs  = tools.dispatch(m.group(1), args)              # actually run it in Python

The string the tool returns is glued back into the conversation as the next turn:

messages.append({"role": "user", "content": "Observation: " + obs})

Why does this even work? The model was trained on millions of similar transcripts —
code, ChatGPT logs, agent demos. It learned that emitting these characters reliably produces
useful follow-ups. ReAct therefore works on any chat model — even a 0.8 B local base model.
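Here is a minimal sketch of the runtime half of that bargain: a parser that turns one reply into either a final answer or a tool call. The regexes below are assumptions that only approximate the protocol. The real `_ACTION` / `_INPUT_JSON` / `_FINAL` patterns in react_loop.py may be more tolerant, and `parse_reply` is a hypothetical name.

```python
import json
import re

# Hypothetical patterns; react_loop.py's real regexes may differ.
_FINAL = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)
_ACTION = re.compile(r"Action:\s*([A-Za-z_]\w*)")          # "Action Input:" does not match this
_INPUT_JSON = re.compile(r"Action Input:\s*(\{.*\})", re.DOTALL)


def parse_reply(text):
    """Return ("final", answer), ("action", name, args) or ("error", message)."""
    if (m := _FINAL.search(text)):
        return ("final", m.group(1).strip())
    act = _ACTION.search(text)
    if act is None:
        return ("error", "no Action: line found")
    inp = _INPUT_JSON.search(text)
    try:
        args = json.loads(inp.group(1)) if inp else {}
    except json.JSONDecodeError as e:
        return ("error", f"Action Input is not valid JSON: {e}")
    return ("action", act.group(1), args)


reply = 'Thought: look first\nAction: read_file\nAction Input: {"path": "calculator.py", "end": 40}'
print(parse_reply(reply))  # → ('action', 'read_file', {'path': 'calculator.py', 'end': 40})
```

The JSON match is greedy: it spans from the first `{` to the last `}`. So a reply that keeps talking after the JSON can confuse it. A "tolerant" parser would stop at the first balanced object instead.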

4 The ReAct loop, visualized

🧑 USER GOAL: "Fix the typo in calculator.py"
   ↓
ReAct LOOP (repeat up to max_steps times):
   🧠 Brain (LLM API): reads messages[], writes Thought: + Action:
   ↓
   🛠️ Parser + Tool: regex finds Action:, tools.dispatch() runs it in Python
   ↑ Observation (tool result appended as a "user" message for the next turn)
   ↓ on Final Answer:
🤖 FINAL ANSWER back to the user

Five movable parts. Edit any one to change the agent: the brain
(any complete(messages)→str), the parser (regex), the
tools, the policy (REACT_SYSTEM prompt), and the
stop condition (max_steps / Final Answer).

5 Standing on the shoulders of giants: the agent landscape

You've probably seen or used some of these. All implement the same fundamental pattern.

Closed / commercial:

  • 🟠 Claude Code (Anthropic) — terminal coding agent
  • 🟢 OpenAI Codex CLI — terminal coding agent
  • 🐙 GitHub Copilot agent mode (VS Code)
  • ⚡ Cursor — IDE with agent

Open-source:

  • 🤖 OpenHands (was OpenDevin) — autonomous agent
  • ✂️ Aider — terminal coding agent
  • 🔁 Continue.dev — VS Code + JetBrains
  • 🛡️ OpenCodex — community Codex clone

All variations on LLM + tools + loop (sometimes + RAG / multi-agent / browser / sandbox).
The differences are the tool kit (browser? terminal exec? Git? screenshots?) and the
policy (prompt + guardrails) — not the core algorithm.

Our edge_agent distills that essence into ~120 lines: read it in one sitting,
hack it on day 1. The same loop powers sjsujetsontool chat --agent, the Next.js Agent Lab,
and the Lesson 12 vuln-triage scripts.

6 Architecture — one core, multiple interfaces

edgeLLM/edge_agent/: pure-stdlib Python (zero runtime deps)
  • tools.py: read · grep · search · write · edit · web_search
  • react_loop.py: ReActAgent + tolerant parser
  • tool_calling.py: native OpenAI tools=[…] loop

LLM API seam: complete(messages) → str. Any chat backend plugs in here: local llama.cpp · NVIDIA · OpenAI · Anthropic · …

Three interfaces on top:
  • 🐚 CLI: sjsujetsontool chat --agent (terminal; /agent on inside a normal chat)
  • 🌐 FastAPI: agent_sidecar.py on :8002, POST /run → SSE → browser
  • 🛡️ Vuln triage: triage_react.py (Lesson 12c / 12d scripts)

Three consumers, same import: from edge_agent import Tools, ReActAgent.
The only seam between the agent and "intelligence" is the complete()
callable; see Step 8 for what that signature buys us.

7 The tools — edge_agent/tools.py

Five built-in verbs (plus one optional online tool) — every path confined to a project root:

Tool                           Purpose
read_file(path, start, end)    read a slice, with line numbers
grep(pattern, path, is_regex)  search contents → file:line: text
search_files(glob, dir)        find files by name
write_file(path, content)      create / overwrite
edit_file(path, old, new)      replace one exact, unique snippet
web_search(query, num=5)       (opt-in) Google via SerpAPI; auto-enabled when SERPAPI_API_KEY is set

edit_file refuses unless old matches exactly once → forces a read_file first.
dispatch() always returns a string, so a bad call becomes an Observation, not a crash.
web_search reads its key from ~/.env.local; see Step 13 for the one-line setup.
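This sketch makes two of the guarantees above concrete: root confinement, and a dispatch() that never raises. `MiniTools` is hypothetical and is not edge_agent's tools.py. It implements only three of the verbs, and the error wording is assumed. The 6,000-character cap is the figure quoted in the context-window slide.

```python
import tempfile
from pathlib import Path

MAX_OBS = 6000  # dispatch() truncates every tool result to 6,000 chars


class MiniTools:
    """Hypothetical, stripped-down stand-in for edge_agent.Tools."""

    TOOLS = ("read_file", "write_file", "edit_file")

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _resolve(self, path):
        # Resolve "..", symlinks and absolute paths, then refuse anything outside root.
        p = (self.root / path).resolve()
        if p != self.root and self.root not in p.parents:
            raise ValueError(f"path escapes project root: {path}")
        return p

    def read_file(self, path, start=1, end=None):
        lines = self._resolve(path).read_text().splitlines()
        end = len(lines) if end is None else min(end, len(lines))
        return "\n".join(f"{i}  {lines[i - 1]}" for i in range(start, end + 1))

    def write_file(self, path, content):
        p = self._resolve(path)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content)
        return f"wrote {len(content)} chars to {path}"

    def edit_file(self, path, old, new):
        p = self._resolve(path)
        text = p.read_text()
        count = text.count(old)
        if count != 1:  # ambiguous or missing snippet: make the model read first
            return f"ERROR: old snippet matches {count} times; it must match exactly once"
        p.write_text(text.replace(old, new, 1))
        return f"edited {path}"

    def dispatch(self, name, args):
        """Always return a string, so a bad call becomes an Observation, not a crash."""
        if name not in self.TOOLS:
            return f"ERROR: unknown tool {name!r}"
        try:
            return str(getattr(self, name)(**args))[:MAX_OBS]
        except Exception as e:  # bad args, missing file, path escape, ...
            return f"ERROR: {type(e).__name__}: {e}"


demo = MiniTools(tempfile.mkdtemp())
print(demo.dispatch("write_file", {"path": "calc.py", "content": "x = 1\ny = 1\n"}))
print(demo.dispatch("edit_file", {"path": "calc.py", "old": "1", "new": "2"}))  # refused: "1" occurs twice
```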

8 The loop — the seam: complete(messages) → str

The loop is decoupled from HTTP. It doesn't know whether the model lives on a cloud
endpoint, in-process llama.cpp, or a unit-test stub. It just gets a callable:

complete(messages: list[dict]) -> str   # ← the ONLY seam between agent and "intelligence"

Three real implementations — all drop-in interchangeable behind the same ReActAgent:

# (a) NVIDIA Build via the OpenAI client (also OpenAI / Anthropic / any compat URL)
from openai import OpenAI
client = OpenAI(base_url="…", api_key="…")   # endpoint + key of the chosen provider
def complete(msgs):
    return client.chat.completions.create(model="…", messages=msgs).choices[0].message.content

# (b) local llama-cpp-python in-process — no network at all
from llama_cpp import Llama
llm = Llama(model_path="…")                  # any local GGUF chat model
def complete(msgs):
    return llm.create_chat_completion(messages=msgs)["choices"][0]["message"]["content"]

# (c) mocked for unit tests — deterministic, zero quota burned
def complete(msgs):   # always the same Action, so the loop runs until max_steps
    return 'Thought: stub\nAction: read_file\nAction Input: {"path":"x.py"}'

Same agent code runs against a 0.5 B local model OR GPT-5 — no provider lock-in.
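Stub (c) never emits a Final Answer, so it only exercises the step cap. A slightly richer test double replays a script that does finish, and it records every prompt the agent sent. The sketch below relies only on the complete(messages) → str contract. `scripted_complete` is a hypothetical helper, not part of edge_agent.

```python
from typing import Callable

Message = dict                      # {"role": ..., "content": ...}
Complete = Callable[[list], str]    # the complete(messages) -> str seam


def scripted_complete(replies: list[str]) -> Complete:
    """Hypothetical test double: returns canned replies in order, recording each prompt."""
    queue = list(replies)
    seen: list[list[Message]] = []

    def complete(msgs: list[Message]) -> str:
        seen.append([dict(m) for m in msgs])  # snapshot: the agent keeps mutating msgs
        if not queue:
            return "Final Answer: (script exhausted)"
        return queue.pop(0)

    complete.seen = seen  # let tests inspect the conversation history
    return complete


complete = scripted_complete([
    'Thought: look\nAction: read_file\nAction Input: {"path": "x.py"}',
    "Thought: done\nFinal Answer: x.py is empty",
])
print(complete([{"role": "user", "content": "What is in x.py?"}]))
```

Because it is just another callable, it drops into the same slot as (a) and (b).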

8 The loop — the 5-step run() function

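The loop itself stays tiny: five steps in one for-loop, plus a guard for malformed replies and a fallback return when max_steps runs out:

def run(self, task):
    messages = [{"role": "system", "content": REACT_SYSTEM},
                {"role": "user",   "content": task}]
    for step in range(self.max_steps):
        text = self.complete(messages)                                # 1) reason
        messages.append({"role": "assistant", "content": text})
        if (m := _FINAL.search(text)): return m.group(1).strip()      # 2) done?
        act = _ACTION.search(text)                                    # 3) parse Action (+ JSON input)
        if act is None:                                               #    no Action: nudge the model
            obs = "ERROR: no Action found. Reply with Action + Action Input, or Final Answer."
        else:
            try:
                obs = self.tools.dispatch(act.group(1).strip(), json.loads(act.group(2)))  # 4) act
            except json.JSONDecodeError as e:
                obs = f"ERROR: Action Input is not valid JSON ({e})"
        messages.append({"role": "user", "content": "Observation: " + obs})        # 5) observe
    return "(stopped: max_steps reached without a Final Answer)"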

reason → done? → parse → act → observe, with a
max_steps cap so it can't loop forever. Swap providers by passing a different
complete — zero loop changes.
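Putting the pieces together, here is a self-contained, runnable sketch of the same five-step loop, driven by a scripted brain and a fake tool. Several details are assumptions for illustration and are not copied from react_loop.py: the regexes, the `dispatch(name, args) -> str` callable used in place of a Tools object, and the error strings.

```python
import json
import re

_FINAL = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)
_ACTION = re.compile(r"Action:\s*(\w+)\s*Action Input:\s*(\{.*\})", re.DOTALL)


class ReActAgent:
    """Minimal sketch of the loop; not edge_agent's actual class."""

    def __init__(self, complete, dispatch, system, max_steps=8):
        self.complete, self.dispatch = complete, dispatch
        self.system, self.max_steps = system, max_steps

    def run(self, task):
        messages = [{"role": "system", "content": self.system},
                    {"role": "user", "content": task}]
        for _ in range(self.max_steps):
            text = self.complete(messages)                          # 1) reason
            messages.append({"role": "assistant", "content": text})
            if (m := _FINAL.search(text)):                          # 2) done?
                return m.group(1).strip()
            act = _ACTION.search(text)                              # 3) parse Action
            if act is None:
                obs = "ERROR: reply had no Action / Action Input; follow the format."
            else:
                try:
                    obs = self.dispatch(act.group(1), json.loads(act.group(2)))  # 4) act
                except json.JSONDecodeError as e:
                    obs = f"ERROR: Action Input is not valid JSON: {e}"
            messages.append({"role": "user", "content": "Observation: " + obs})  # 5) observe
        return "(stopped: max_steps reached without a Final Answer)"


def fake_dispatch(name, args):
    # Hypothetical tool: pretend every file holds a single print statement.
    if name == "read_file":
        return f"1  print('hello from {args.get('path')}')"
    return f"ERROR: unknown tool {name}"


replies = iter([
    'Thought: read it\nAction: read_file\nAction Input: {"path": "app.py"}',
    "Thought: I now know the answer\nFinal Answer: app.py prints a greeting.",
])
agent = ReActAgent(lambda msgs: next(replies), fake_dispatch, system="(ReAct rules here)")
print(agent.run("What does app.py do?"))  # → app.py prints a greeting.
```

Swapping the lambda for any real complete() leaves the class untouched, which is the point of the seam.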

9 Interface #1 — CLI: sjsujetsontool chat --agent

The terminal client imports edge_agent directly and gives the loop a complete() that
points at whatever backend is currently selected (local llama.cpp · NVIDIA Build · OpenAI · Anthropic).

sjsujetsontool chat --agent --agent-dir ./sample_project
# …or inside a normal chat session:
/agent on
/agent dir ./sample_project
What does app.py do, and is the requests CVE reachable?

Watch the trace stream by:

[step 1] Thought: read the app   Action: read_file  {"path":"app.py","end":40}
   Observation: 1  import requests ...
[step 2] Thought: confirm URL    Action: grep        {"pattern":"requests.get"}
   Observation: app.py:37: response = requests.get(url, timeout=5)
🤖 requests.get(url) takes a caller-supplied URL → the HTTP CVE path is reachable.

🎥 Video demo (not reproduced here): enabling agent mode with /agent on, pointing it at a workspace directory, and watching the agent loop through thoughts, actions (tool calls), and observations.

10 Interface #2 — FastAPI agent backend

Same edge_agent core — wrapped in a ~250-line FastAPI service so a browser can see it
(agent_sidecar/agent_sidecar.py).
One streaming endpoint, SSE per step:

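import edge_agent
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()
# _sse() and openai_call() are helpers elided from this excerpt

@app.post("/run")
async def run(request: Request):                      # annotate so FastAPI injects the Request
    body  = await request.json()
    tools = edge_agent.Tools(root=body["root"])
    msgs  = [{"role":"system","content":edge_agent.REACT_SYSTEM.format(...)},
             {"role":"user",  "content":body["task"]}]
    def stream():
        yield _sse({"type":"start","tools":edge_agent.tool_names()})
        for step in range(body["max_steps"]):
            reply = openai_call(msgs)                       # any OpenAI-compatible URL
            msgs.append({"role":"assistant","content":reply})
            if "Final Answer:" in reply:                    # same stop condition as the CLI loop
                yield _sse({"type":"final","text":reply.split("Final Answer:", 1)[1].strip()})
                return
            parsed = edge_agent.react_loop.parse_step(reply)
            yield _sse({"type":"step","action":parsed[1],"input":parsed[2]})
            obs = tools.dispatch(parsed[1], parsed[2])      # ← SAME tools.py as the CLI
            msgs.append({"role":"user","content":"Observation: " + obs})  # feed it back
            yield _sse({"type":"observation","text":obs})
    return StreamingResponse(stream(), media_type="text/event-stream")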

Start it: sjsujetsontool agent bg → browser sees every Thought / Action / Observation card live.

10 FastAPI agent backend — HTTP surface, modular & scalable

Small HTTP surface, on purpose:

Method  Path           Purpose
GET     /health        Liveness check; lists tools + workspace root
POST    /run           JSON body → SSE stream of {start, step, observation, nudge, final, error} events
GET     /docs          Auto-generated Swagger UI: try /run from the browser, no curl needed
GET     /openapi.json  Machine-readable schema (drives typed JS clients)

Why modular?

  • Each endpoint is independent — add /sessions/{id}/resume without touching /run.
  • Swap the agent algorithm (LangChain, AutoGen) → only /run changes; UI stays the same.
  • Browser, Postman, curl, Python — anyone with HTTP can talk to it.

Why scalable?

  • Stateless by default → uvicorn --workers N gives up to N× concurrency.
  • No server-side sessions → each /run streams its events, finishes, and lets go.
  • Async-friendly: FastAPI runs sync endpoints and sync streaming generators in a threadpool, so blocking work (edge_agent, Riva) doesn't stall the event loop.
  • Decoupled from UI: same backend serves the Lab and a CLI client.
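The surface is plain HTTP + SSE, so a Python client needs nothing beyond the standard library. This is a minimal sketch under a few assumptions: the /run body carries task, root and max_steps (the keys read in the sidecar snippet), and each event arrives as one JSON `data:` line. `run_task` and `iter_sse_events` are hypothetical helpers.

```python
import json
import urllib.request


def iter_sse_events(lines):
    """Yield JSON payloads from an iterable of SSE lines (str or bytes); other fields are ignored."""
    data = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line == "":                          # blank line = end of one event
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)  # drop one leading space
    if data:                                    # stream closed without a final blank line
        yield json.loads("\n".join(data))


def run_task(base_url, task, root, max_steps=8):
    """POST /run and yield events as they arrive."""
    body = json.dumps({"task": task, "root": root, "max_steps": max_steps}).encode()
    req = urllib.request.Request(f"{base_url}/run", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:   # the response object iterates line by line
        yield from iter_sse_events(resp)
```

With the sidecar running on :8002 (the port listed in the architecture slide), `for ev in run_task("http://localhost:8002", "What does app.py do?", "<workspace path>"): print(ev["type"])` would print the event types as they arrive.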

11 Context window — the hidden ceiling

Every model has a max context. Your whole conversation must fit inside:

system_prompt + task + (Thought + Action + Observation) × N steps
Backend                          Context window
Local Qwen 3.5 (Q4) on Jetson    4 – 8 K tokens
node05 Qwen 3.5-9B               8 – 32 K
NVIDIA Nemotron 49 B / 70 B      128 K
Claude Sonnet 4.6                200 K

Built-in defenses:

  • dispatch() truncates every tool result to 6 000 chars
  • max_steps = 8 by default → naturally bounded
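A quick back-of-the-envelope check shows why these defenses don't suffice on their own. The estimate rests on three assumptions: roughly 4 characters per token (a heuristic, not Qwen's real tokenizer), a made-up 2,000-character system prompt, and about 400 characters per Thought/Action reply. With those, the worst case for the default settings is:

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English/code; real tokenizers vary


def worst_case_tokens(system_chars, task_chars, steps, obs_cap=6000, reply_chars=400):
    """Upper-bound estimate: every step returns a full, truncated Observation."""
    total_chars = system_chars + task_chars + steps * (reply_chars + obs_cap)
    return total_chars // CHARS_PER_TOKEN


print(worst_case_tokens(system_chars=2000, task_chars=200, steps=8))  # → 13350
```

That is about 13 K tokens, well past a 4 – 8 K local window. This is why the mitigations below focus on shrinking each Observation.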

Mitigations when you need more headroom:

  • ✂️ read_file(path, start=10, end=40): never whole files
  • 🔎 grep first, then a narrow read_file range around the hit
  • 📉 Lower max_steps if your task is small
  • 📚 Pick a bigger window (Nemotron 128 K ≫ local 8 K)
  • 🪄 Have the agent summarize old observations for long runs

Symptom of exhaustion: the agent forgets the original task, hallucinates files, or re-runs
the same Action twice. Fix is almost always smaller observations, not a smarter prompt.
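The last mitigation can start out as simply rewriting old Observations before each complete() call. The helper below is hypothetical and is not part of edge_agent as described here. It keeps the newest few results verbatim and truncates the older ones. A true summarizing version would ask the model to condense them instead of slicing.

```python
def compact_observations(messages, keep_last=2, stub_chars=200):
    """Shrink all but the newest `keep_last` Observation messages to a short stub."""
    obs_idx = [i for i, m in enumerate(messages)
               if m["role"] == "user" and m["content"].startswith("Observation: ")]
    old = obs_idx[:-keep_last] if keep_last else obs_idx
    out = [dict(m) for m in messages]          # copy: never mutate the caller's history
    for i in old:
        text = out[i]["content"]
        if len(text) > stub_chars:
            out[i]["content"] = text[:stub_chars] + f" ...[{len(text) - stub_chars} chars elided]"
    return out
```

Calling `self.complete(compact_observations(messages))` inside the loop keeps the full history locally while sending a smaller prompt.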

12 Two ways to call tools — same outcome, different fence

Under the hood both paths are just the model emitting tokens. The difference is who parses
those tokens: you (regex) or the provider (built-in JSON unpacker).

                         ReAct text loop                        Native tool-calling
File                     react_loop.py                          tool_calling.py
What the model emits     free text following the protocol       tokens forming a JSON-schema'd object
Who parses               you: regex in react_loop.py            the provider, built into its API
Works on any chat model  ✅ even base / local models            ❌ needs a tool-fine-tuned model
Reasoning visible        ✅ Thought: in plain text              partial: text + opaque tool_calls
Used by                  chat --agent, lesson 12c, Agent Lab    lesson 12b, OpenAI / Anthropic SDKs

Same tools.py powers both paths — only the transport differs. This is the connective
tissue that ties every lab in the curriculum together.
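For contrast, the native path receives structured tool_calls instead of text. Below is a minimal sketch of its dispatch half, assuming the OpenAI Chat Completions message shape. In that shape each call has an `id` plus `function.name` and `function.arguments`, where the arguments arrive as a JSON string, and each result goes back as a `role: "tool"` message. `run_tool_calls` is a hypothetical name, not the function in tool_calling.py.

```python
import json


def run_tool_calls(assistant_msg, dispatch):
    """Turn one assistant message's tool_calls into role="tool" reply messages.

    assistant_msg is the dict form of a Chat Completions message
    (e.g. response.choices[0].message.model_dump() with the openai v1 SDK).
    """
    replies = []
    for call in assistant_msg.get("tool_calls") or []:
        fn = call["function"]
        try:
            args = json.loads(fn["arguments"] or "{}")   # arguments arrive as a JSON *string*
        except json.JSONDecodeError as e:
            result = f"ERROR: bad arguments JSON: {e}"
        else:
            result = dispatch(fn["name"], args)          # same tools, different transport
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return replies
```

To continue the loop, the caller appends the assistant message itself, then these tool messages, and calls the API again with the same tools=[…] list.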

13 Enabling web_search — one line in ~/.env.local

web_search is the opt-in 6th tool — pure urllib (no requests dep) calling Google via
SerpAPI (free 100 searches/month).

# 1) Get a free key:  https://serpapi.com   → copy the API key from the dashboard
# 2) Stash it in the same file every lab reads:
echo "SERPAPI_API_KEY=…" >> ~/.env.local && chmod 600 ~/.env.local

That's it. The agent now automatically has a 6th tool:

  • sjsujetsontool chat --agent — re-launches the CLI; tool_names() returns 6 entries.
  • sjsujetsontool agent bg — restarts the FastAPI backend, which re-reads the env; the Agent Lab UI's web_search disabled banner disappears on next refresh.

Graceful absence — without a key, calling web_search returns
"ERROR: web_search is disabled (no SERPAPI_API_KEY in env)." The model sees that as an
Observation and falls back to file tools, never crashes.
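Here is roughly what such a tool can look like with only the standard library. This is a hedged sketch, not the edge_agent source. The SerpAPI endpoint, its parameters (engine=google, q, num, api_key) and the `organic_results` field follow SerpAPI's public JSON API. `load_env_file` is a hypothetical helper, and the output layout is made up.

```python
import json
import os
import urllib.parse
import urllib.request
from pathlib import Path


def load_env_file(path="~/.env.local"):
    """Copy KEY=VALUE lines from an env file into os.environ (existing vars win)."""
    p = Path(path).expanduser()
    if not p.exists():
        return
    for line in p.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))


def web_search(query, num=5):
    key = os.environ.get("SERPAPI_API_KEY")
    if not key:  # graceful absence: the model just sees this string as an Observation
        return "ERROR: web_search is disabled (no SERPAPI_API_KEY in env)."
    params = urllib.parse.urlencode({"engine": "google", "q": query, "num": num, "api_key": key})
    try:
        with urllib.request.urlopen(f"https://serpapi.com/search.json?{params}", timeout=15) as r:
            data = json.load(r)
    except Exception as e:  # network / HTTP errors also become Observations, never crashes
        return f"ERROR: web_search failed: {e}"
    hits = data.get("organic_results", [])[:num]
    return "\n".join(f"{h.get('title')} | {h.get('link')}\n  {h.get('snippet', '')}"
                     for h in hits) or "(no results)"
```

Call `load_env_file()` once at startup. After that, the key is picked up the same way whether it came from the shell or from ~/.env.local.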

14 Extend it — add your own tool

Add a run_python tool in three places, all in tools.py:

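class Tools:
    # 1) the method itself (runs with cwd=self.root; NOT a sandbox: the code can reach anything the process can)
    def run_python(self, code):
        import subprocess, sys
        try:
            out = subprocess.run([sys.executable, "-c", code],
                                 capture_output=True, text=True, timeout=15, cwd=self.root)
        except subprocess.TimeoutExpired:
            return "ERROR: run_python timed out after 15 s"   # still a string → an Observation
        return (out.stdout + out.stderr)[:6000] or "(no output)"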

# 2) advertise it to the ReAct prompt
TOOL_NAMES = ["read_file","grep","search_files","write_file","edit_file","run_python"]

# 3) (optional) advertise it to native tool-calling too
OPENAI_SCHEMAS.append({"type":"function","function":{
    "name":"run_python",
    "description":"Run a short Python snippet inside the workspace and return stdout+stderr.",
    "parameters":{"type":"object","properties":{"code":{"type":"string"}},"required":["code"]}}})

That's it. Both the CLI agent and the FastAPI Agent Lab pick it up automatically — no other
file touched. The agent's tool kit just grew by one.

14 Extend it — revise the prompt (the other half of an agent)

Three knobs for the system prompt, biggest to smallest:

Knob                File                              What it controls
REACT_SYSTEM        edge_agent/react_loop.py          The ReAct rules + examples shown to every agent run
Per-backend tweaks  agent_sidecar.py, event_stream()  Extras appended only for a specific backend
User task           Agent Lab UI input                What this run is asked to do

Three popular REACT_SYSTEM tweaks you can ship in five minutes:

  • 🛡️ Read-only auditor — drop write_file / edit_file from TOOL_NAMES and add
    "You must NOT modify files." to REACT_SYSTEM.
  • 🗺️ Plan-first: "Output a numbered plan in your first Thought; emit Actions only after."
  • 📋 Tone control: "Final Answer must be a markdown bullet list, ≤ 5 bullets."

The system prompt is half of what the model effectively is at runtime. Every refinement
propagates instantly to all three interfaces (CLI · FastAPI · CVE labs) — they import the same
REACT_SYSTEM constant.
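The three tweaks compose naturally if you build the policy in code instead of editing the string by hand. The sketch below assumes REACT_SYSTEM is a plain string you can append to. `build_policy` and the "Available tools / Extra rules" layout are hypothetical. As the read-only tweak requires, it both drops the write tools and adds the rule.

```python
BASE_TOOLS = ["read_file", "grep", "search_files", "write_file", "edit_file"]

READ_ONLY_RULE = "You must NOT modify files."
PLAN_FIRST_RULE = "Output a numbered plan in your first Thought; emit Actions only after."
TONE_RULE = "Final Answer must be a markdown bullet list, ≤ 5 bullets."


def build_policy(base_prompt, tools=BASE_TOOLS, read_only=False, plan_first=False, bullets=False):
    """Return (system_prompt, tool_names) for one agent variant."""
    rules = []
    if read_only:  # enforce in two places: the tool kit AND the prompt
        tools = [t for t in tools if t not in ("write_file", "edit_file")]
        rules.append(READ_ONLY_RULE)
    if plan_first:
        rules.append(PLAN_FIRST_RULE)
    if bullets:
        rules.append(TONE_RULE)
    prompt = base_prompt + "\n\nAvailable tools: " + ", ".join(tools)
    if rules:
        prompt += "\nExtra rules:\n" + "\n".join(f"- {r}" for r in rules)
    return prompt, list(tools)


prompt, tools = build_policy("<REACT_SYSTEM text>", read_only=True)
print(tools)
```

Removing the tools matters more than the sentence. A model can ignore an instruction, but it cannot call a tool the dispatcher does not expose.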

15 Bring your own project — sjsujetsontool is path-agnostic

Cloned your own Next.js / Vite / agent repo into /Developer/edgeAI/edgeLLM/my-app? Or anywhere
else under /Developer? Same two commands, three ways to point them at it:

(1) Pass the path as an arg

sjsujetsontool node  bg /Developer/my-app
sjsujetsontool agent bg /Developer/my-app/agent_sidecar

(2) Export an env var (per-shell or in ~/.bashrc)

export SJSUJETSONTOOL_NODE_DIR=/Developer/my-app
export SJSUJETSONTOOL_AGENT_DIR=/Developer/my-app/agent_sidecar
export SJSUJETSONTOOL_EDGE_AGENT_DIR=/Developer/my-edge-agent  # custom Python core
# now `sjsujetsontool node` / `agent` defaults follow YOUR project

(3) Change it inside the Agent Lab UI (no restart)

  • Workspace input — sets the agent's project root for read_file/grep/write_file/edit_file
  • Backend dropdown — switch live: NVIDIA · Local llama.cpp · node05 · OpenAI · Anthropic
  • Custom option — paste any OpenAI-compatible URL + optional key:
    • your own vLLM, Ollama, Together.ai, a corporate gateway, …
    • base_url + api_key fields appear, get forwarded to the sidecar as-is.

Path must live under /Developer/ (that's the dir the container mounts 1:1
from the host). The framework doesn't matter — node runs whatever package.json says (dev /
start), agent runs whatever agent_sidecar.py exposes.
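The /Developer/ constraint is easy to check before launching anything. Below is a hypothetical pre-flight helper that uses only pathlib (Python 3.9+ for `is_relative_to`). sjsujetsontool's own validation, if it has any, may differ.

```python
from pathlib import PurePosixPath

MOUNT_ROOT = PurePosixPath("/Developer")  # the directory the container mounts 1:1 from the host


def check_project_path(path: str) -> PurePosixPath:
    """Reject paths the container cannot see (hypothetical pre-flight check)."""
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError(f"{path!r}: use an absolute path under {MOUNT_ROOT}/")
    # PurePosixPath does not collapse "..", so refuse it explicitly.
    if ".." in p.parts or not p.is_relative_to(MOUNT_ROOT):
        raise ValueError(f"{path!r} is not inside {MOUNT_ROOT}/, so the container will not see it")
    return p


print(check_project_path("/Developer/my-app/agent_sidecar"))
```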

Full lesson → lkk688.github.io/edgeAI/curriculum/13_react_agent