Inside Claude Code: How an AI Agent Thinks in a Loop
The recursive query loop at the heart of Claude Code — how it manages multi-turn reasoning, tool execution, and streaming responses. Part 2 of 10.
Think about a chef preparing a complex dish. They don’t just throw everything in a pot and walk away. They cook a little, taste, adjust the seasoning, cook some more, taste again. Each cycle refines the result. The dish emerges from iteration, not from a single act.
Claude Code’s thinking works the same way. When you type “fix this bug,” it doesn’t fire a single API call and return a response. It reads a file, runs a test, edits the code, runs the test again, checks the output, maybe edits once more. Each step requires calling the AI model, executing tools, feeding results back, and deciding what to do next.
This is the query engine — the brain of every AI agent. And understanding how it works is the key to understanding why AI coding assistants behave the way they do. I found the implementation fascinating — and also illuminating about the trade-offs inherent in building agentic systems.
The Problem: Multi-Turn Reasoning Is Hard
A simple chatbot takes a message and returns a response. One shot. Done. But an AI agent needs to act in the world. Consider what happens when you ask Claude Code to “find and fix the failing test in src/utils”:
- The agent calls the API with your message
- The model decides to read the test file: it emits a tool_use block
- The agent executes that tool, captures the result
- It sends the result back to the model
- The model reads the file, decides to run the test suite: another tool_use
- The agent runs the test, captures the failure output
- Back to the model: now it understands the bug, decides to edit a source file
- The agent performs the edit
- The model wants to verify: it runs the test again
- Tests pass. The model emits a final text response
That is five round trips to the API in a single user interaction. Each one requires streaming, tool detection, permission checks, error handling, context management, and state tracking. How do you manage this loop without it becoming an unmaintainable mess?
The answer in Claude Code is queryLoop — a while(true) generator function that orchestrates every turn of reasoning. Let’s trace exactly how it works.
How Claude Code Solves It
The High-Level Flow
At the highest level, the query engine is a cycle. A user message enters, the model streams a response, tools are detected and executed, and results feed back into the next iteration until the model produces a final text response with no tool calls.
[Figure: High-Level Query Flow]
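To make the cycle concrete, here is a minimal sketch of such a loop. It is not Claude Code's queryLoop: the types, the callModel and runTool parameters, and the message shapes are all hypothetical stand-ins, and streaming, permissions, and error recovery are left out entirely.

```typescript
// Hypothetical minimal types; the real Claude Code types are far richer.
type ToolUse = { id: string; name: string; input: string };
type ModelReply = { text: string; toolUses: ToolUse[] };
type Message =
  | { role: "user" | "assistant"; text: string }
  | { role: "tool_result"; toolUseId: string; text: string };

type CallModel = (history: Message[]) => Promise<ModelReply>;
type RunTool = (use: ToolUse) => Promise<string>;

// One user interaction: loop until the model replies without tool calls.
async function* agentLoop(
  userText: string,
  callModel: CallModel,
  runTool: RunTool,
): AsyncGenerator<Message, void> {
  const history: Message[] = [{ role: "user", text: userText }];
  while (true) {
    const reply = await callModel(history); // one API round trip
    const assistant: Message = { role: "assistant", text: reply.text };
    history.push(assistant);
    yield assistant;

    if (reply.toolUses.length === 0) return; // final text response: done

    for (const use of reply.toolUses) {
      const result: Message = {
        role: "tool_result",
        toolUseId: use.id,
        text: await runTool(use),
      };
      history.push(result); // fed back to the model on the next iteration
      yield result;
    }
  }
}
```

Every pass through the while(true) body is one round trip, and the only exit in this sketch is a reply with no tool calls.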
The simplicity is deceptive. Inside that while(true), there are seven distinct reasons the loop might continue back to the top — and each one represents a different recovery or continuation strategy.
The Seven Continue Sites
The queryLoop function in query.ts (line 241) maintains a State object that is reassigned at each continue site. The transition field records why the loop continued, making implicit states inspectable after the fact. Here are the seven continue sites, mapped as a state diagram:
[Figure: Seven Continue Sites, State Diagram]
The reason for each continuation is recorded as a string in the transition.reason field. The values used are 'collapse_drain_retry', 'reactive_compact_retry', 'max_output_tokens_escalate', 'max_output_tokens_recovery', 'model_fallback', 'stop_hook_blocking', 'token_budget_continuation', and 'next_turn'. But they are not declared as an enum or a union type. They exist only as string literals scattered across a very large orchestration file (src/query.ts, ~1.7K lines). More on that later.
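For contrast, here is a sketch of what declaring those strings as a union type could look like. The reason strings are the ones listed above; LoopState, continueWith, and isTransitionReason are hypothetical names of my own, and the real State object carries many more fields.

```typescript
// The reason strings come from the list above; everything else is hypothetical.
const TRANSITION_REASONS = [
  "collapse_drain_retry",
  "reactive_compact_retry",
  "max_output_tokens_escalate",
  "max_output_tokens_recovery",
  "model_fallback",
  "stop_hook_blocking",
  "token_budget_continuation",
  "next_turn",
] as const;

type TransitionReason = (typeof TRANSITION_REASONS)[number];

// Heavily reduced stand-in for the loop's State object.
interface LoopState {
  turnCount: number;
  transition?: { reason: TransitionReason };
}

// Build the state for the next iteration. A misspelled reason is a compile error.
function continueWith(state: LoopState, reason: TransitionReason): LoopState {
  return { ...state, turnCount: state.turnCount + 1, transition: { reason } };
}

// Runtime guard for reasons read back from logs or telemetry.
function isTransitionReason(value: string): value is TransitionReason {
  return (TRANSITION_REASONS as readonly string[]).includes(value);
}
```

With this in place, continueWith(state, "model_falback") fails to type-check instead of silently recording an unknown reason.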
A Single Turn with Tool Use
Let’s zoom into a single iteration — what happens between one API call and the next when the model decides to use a tool.
[Figure: Single Turn with Tool Use]
The key insight: each API call is a full streaming request. The model sees the entire conversation history (or the compacted version of it), including all previous tool results. Context management — autocompact, microcompact, snip, context collapse — runs at the top of each iteration, before the API call, to keep the conversation within token limits.
Streaming vs Orchestrated Tool Execution
Claude Code has two tool execution strategies, selectable via a feature gate. They represent fundamentally different trade-offs, and I found it instructive to compare them:
[Figure: Streaming vs Orchestrated Execution]
toolOrchestration.ts (runTools): Waits until the API response is fully streamed, collects all tool_use blocks, then partitions them. Read-only tools (like Read, Glob, Grep) run concurrently up to a configurable limit (default 10). Mutating tools (like Write, Edit, Bash) run serially. This is safe and predictable.
StreamingToolExecutor: Starts executing tools the moment they are parsed from the stream, before the API response is complete. A tool_use block for “Read file X” can begin executing while the model is still generating the next tool_use block for “Read file Y.” Because execution overlaps with generation, this reduces latency on multi-tool turns.
The partitioning logic is the same in both paths: isConcurrencySafe is determined by calling tool.isConcurrencySafe() with the parsed input. For Bash, concurrency-safety typically tracks whether the command is read-only (cat, ls) versus mutating (rm, mv). The concurrency limit is configurable via CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY (default 10).
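A sketch of that partitioning, under some assumptions: I group consecutive concurrency-safe calls into a batch and give every other call a batch of its own, which may not match how runTools actually groups them. The ToolCall shape and the function names are hypothetical; only isConcurrencySafe and the default limit of 10 come from the text.

```typescript
// Hypothetical tool-call shape; isConcurrencySafe is precomputed here for brevity.
interface ToolCall {
  name: string;
  isConcurrencySafe: boolean;
  run: () => Promise<string>;
}

// Group consecutive concurrency-safe calls into one batch. Every other call
// becomes a batch of its own, so it runs alone.
function partition(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.isConcurrencySafe && last !== undefined && last[0].isConcurrencySafe) {
      last.push(call);
    } else {
      batches.push([call]);
    }
  }
  return batches;
}

// Run one batch with at most `limit` calls in flight, preserving result order.
async function runWithLimit(batch: ToolCall[], limit: number): Promise<string[]> {
  const results: string[] = new Array(batch.length);
  let next = 0;
  // Each worker claims the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (next < batch.length) {
      const i = next++;
      results[i] = await batch[i].run();
    }
  }
  const workerCount = Math.max(1, Math.min(limit, batch.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}

// Batches run strictly one after another; only calls inside a batch overlap.
async function runTools(calls: ToolCall[], limit = 10): Promise<string[]> {
  const out: string[] = [];
  for (const batch of partition(calls)) {
    out.push(...(await runWithLimit(batch, limit)));
  }
  return out;
}
```

Given two reads, an edit, and another read, this produces three batches: the two reads overlap, then the edit runs alone, then the last read.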
Smart Pattern: The StreamingToolExecutor tracks tool status through a lifecycle ('queued' | 'executing' | 'completed' | 'yielded') and maintains a child abort controller. If a Bash tool errors, the child controller fires to kill sibling subprocesses immediately, without aborting the parent query loop. A tool-level failure should not terminate the entire turn.
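The child-controller idea can be sketched with the standard AbortController API. This is my own illustration, not the executor's code: createChildAbortController and TrackedTool are hypothetical names, and the strict one-step-forward ordering of the lifecycle is an assumption (the real executor may allow other moves).

```typescript
type ToolStatus = "queued" | "executing" | "completed" | "yielded";

// Create a controller that aborts when the parent aborts, but whose own
// abort() never propagates upward to the parent.
function createChildAbortController(parent: AbortController): AbortController {
  const child = new AbortController();
  if (parent.signal.aborted) {
    child.abort();
  } else {
    parent.signal.addEventListener("abort", () => child.abort(), { once: true });
  }
  return child;
}

const LIFECYCLE: ToolStatus[] = ["queued", "executing", "completed", "yielded"];

class TrackedTool {
  name: string;
  status: ToolStatus = "queued";

  constructor(name: string) {
    this.name = name;
  }

  // Assumption: a tool may only move one step forward through the lifecycle.
  advance(next: ToolStatus): void {
    if (LIFECYCLE.indexOf(next) !== LIFECYCLE.indexOf(this.status) + 1) {
      throw new Error(`invalid transition ${this.status} -> ${next}`);
    }
    this.status = next;
  }
}
```

Sibling tools would listen on the child's signal, so a failing Bash call can cancel them while the parent signal, and with it the query loop, stays live.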
An Improved Design: Explicit State Machine
The current implementation works. It has shipped to millions of users. But the while(true) with seven continue sites and a transition field that is only set after the fact is a design that resists formal reasoning. You cannot draw a state diagram from the code without reading a very large orchestration function and mentally tracking every path.
Here is what an explicit typed state machine would look like:
[Figure: Improved Explicit State Machine]
Each state would be a discriminated union:
type QueryState =
  | { kind: 'idle' }
  | { kind: 'preparing_context'; messages: Message[] }
  | { kind: 'streaming'; model: string; attempt: number }
  | { kind: 'tool_execution'; pendingTools: ToolUseBlock[] }
  | { kind: 'feeding_results'; toolResults: Message[] }
  | { kind: 'evaluating_stop_hooks'; assistantMessages: AssistantMessage[] }
  | { kind: 'checking_budget'; turnTokens: number }
  | { kind: 'recovering_from_error'; error: RecoveryError; attempt: number }
  | { kind: 'complete'; reason: TerminalReason };
Transitions would be explicit functions, not state = { ... }; continue. Each transition could be logged, tested, and visualized. The while(true) would become a switch on state.kind, and each case would return the next state.
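Here is a sketch of such a transition function over a reduced version of the union above. The four states keep the article's kind names but simplify the payloads; the QueryEvent type, the MAX_ATTEMPTS cap, and the terminal reason strings are hypothetical.

```typescript
// Reduced version of the QueryState union; payload fields are simplified.
type State =
  | { kind: "streaming"; attempt: number }
  | { kind: "tool_execution"; pendingTools: string[] }
  | { kind: "recovering_from_error"; error: string; attempt: number }
  | { kind: "complete"; reason: string };

// Hypothetical events that drive the machine.
type QueryEvent =
  | { type: "stream_finished"; toolUses: string[] }
  | { type: "tools_finished" }
  | { type: "stream_failed"; error: string }
  | { type: "recovered" };

const MAX_ATTEMPTS = 3; // hypothetical retry cap

// Pure function: current state plus event in, next state out.
function transition(state: State, event: QueryEvent): State {
  switch (state.kind) {
    case "streaming":
      if (event.type === "stream_finished") {
        return event.toolUses.length > 0
          ? { kind: "tool_execution", pendingTools: event.toolUses }
          : { kind: "complete", reason: "end_turn" };
      }
      if (event.type === "stream_failed") {
        return state.attempt >= MAX_ATTEMPTS
          ? { kind: "complete", reason: "error" }
          : { kind: "recovering_from_error", error: event.error, attempt: state.attempt };
      }
      break;
    case "tool_execution":
      if (event.type === "tools_finished") return { kind: "streaming", attempt: 1 };
      break;
    case "recovering_from_error":
      if (event.type === "recovered") return { kind: "streaming", attempt: state.attempt + 1 };
      break;
    case "complete":
      break; // terminal: no outgoing transitions
  }
  throw new Error(`no transition from ${state.kind} on ${event.type}`);
}
```

Because transition is pure, each edge of the diagram can be exercised with a one-line call, and an unexpected state and event pair fails loudly instead of falling through.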
The Smart Part: Generator-Based ask() Wrapper
One design decision in the query engine deserves particular praise. The ask() function is an AsyncGenerator that wraps QueryEngine.submitMessage(), which itself is an AsyncGenerator wrapping queryLoop().
This generator pipeline enables a single implementation to serve both streaming and batch consumption:
- REPL (interactive mode): The Ink UI iterates the generator with for await, rendering each StreamEvent as it arrives. Text deltas update the screen in real time. Tool results appear as they complete.
- SDK / Headless mode: The same generator is consumed by ask(), which constructs a QueryEngine, calls submitMessage(), and yields SDKMessage objects. SDK callers can iterate for streaming or collect into an array for batch.
- Subagents: When Claude Code spawns a sub-agent (e.g., for background compaction), it calls the same queryLoop with different parameters. The generator contract is identical.
This is a textbook application of the generator pattern: decouple production from consumption. The query engine does not know or care whether its output is rendered to a terminal, collected into an array, or piped to another process.
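A stripped-down sketch of the pattern, with one producer and two consumers. The names produce, wrap, render, and collect are mine, and the StreamEvent shape here is a hypothetical simplification of whatever the real event types contain.

```typescript
// Hypothetical event shape, much simpler than the real StreamEvent.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_result"; name: string };

// Stand-in for the innermost generator (queryLoop in the article).
async function* produce(): AsyncGenerator<StreamEvent> {
  yield { type: "text_delta", text: "Reading " };
  yield { type: "tool_result", name: "Read" };
  yield { type: "text_delta", text: "done." };
}

// A wrapping generator in the spirit of ask(): it re-yields each item,
// adding only a mapping step.
async function* wrap(source: AsyncIterable<StreamEvent>): AsyncGenerator<string> {
  for await (const event of source) {
    yield event.type === "text_delta" ? event.text : `[${event.name}]`;
  }
}

// Streaming consumer: handle each item as it arrives.
async function render(source: AsyncIterable<string>, write: (s: string) => void): Promise<void> {
  for await (const chunk of source) write(chunk);
}

// Batch consumer: drain the same kind of generator into an array.
async function collect<T>(source: AsyncIterable<T>): Promise<T[]> {
  const items: T[] = [];
  for await (const item of source) items.push(item);
  return items;
}
```

Both render(wrap(produce()), write) and collect(wrap(produce())) run the same producer code; only the consumer decides whether output is handled incrementally or all at once.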
The QueryEngine class itself is designed as one-instance-per-conversation. State — messages, file cache, usage totals, permission denials — persists across turns within the same engine instance. Each call to submitMessage() starts a new turn, clearing turn-scoped state while preserving conversation-scoped state.
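The split between turn-scoped and conversation-scoped state can be illustrated with a toy class. This is not the real QueryEngine: the field names are invented, and passing the tools in as an argument is a stand-in for decisions the model would make.

```typescript
// Toy illustration; field names are hypothetical.
class ConversationEngine {
  private readonly history: string[] = []; // conversation-scoped
  private totalToolCalls = 0;              // conversation-scoped
  private toolCallsThisTurn = 0;           // turn-scoped

  // Each call is one turn. `tools` stands in for what the model would request.
  async *submitMessage(text: string, tools: string[]): AsyncGenerator<string> {
    this.toolCallsThisTurn = 0; // clear turn-scoped state
    this.history.push(text);    // conversation-scoped state keeps growing
    for (const tool of tools) {
      this.toolCallsThisTurn += 1;
      this.totalToolCalls += 1;
      yield `ran ${tool}`;
    }
    yield `done: ${text}`;
  }

  stats(): { turns: number; toolCallsThisTurn: number; totalToolCalls: number } {
    return {
      turns: this.history.length,
      toolCallsThisTurn: this.toolCallsThisTurn,
      totalToolCalls: this.totalToolCalls,
    };
  }
}
```

One engine instance lives as long as the conversation, so totals accumulate across calls while the per-turn counter starts from zero each time.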
What Should Change
The while(true) loop with seven continue sites is a liability at scale. Here is why:
Debugging is archaeology. When a user reports that Claude Code “got stuck in a loop,” the investigation requires reading the full queryLoop function, identifying which continue site fired, and mentally reconstructing the state at that point.
Testing is incomplete. You cannot write a unit test for “the loop enters reactive_compact_retry and then falls through to completed” without mocking the entire query infrastructure. An explicit state machine would let you test transitions in isolation.
New continue sites are risky. Every new recovery strategy means another state = { ... }; continue block, another 15 lines of boilerplate, another place where a typo in one of the nine state fields creates a subtle bug.
No formal verification. You cannot run a model checker on an implicit state machine. With an explicit one, you could verify properties like “reactive_compact_retry is attempted at most once” or “stop_hook_blocking never follows stop_hook_blocking” statically.
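Short of a model checker, even a recorded trace of transition reasons can be checked once transitions are treated as data. The sketch below tests the two properties just mentioned against a single trace at runtime, which is far weaker than static verification. The checkTrace function is hypothetical, and treating "at most once" as "at most once per trace" is my assumption.

```typescript
// Check one recorded sequence of transition reasons against two properties.
// Returns a list of human-readable violations (empty when the trace is clean).
function checkTrace(reasons: string[]): string[] {
  const violations: string[] = [];

  // Property 1 (assumed scope: per trace): at most one reactive_compact_retry.
  const compactRetries = reasons.filter((r) => r === "reactive_compact_retry").length;
  if (compactRetries > 1) {
    violations.push(`reactive_compact_retry attempted ${compactRetries} times`);
  }

  // Property 2: stop_hook_blocking never directly follows stop_hook_blocking.
  for (let i = 1; i < reasons.length; i++) {
    if (reasons[i] === "stop_hook_blocking" && reasons[i - 1] === "stop_hook_blocking") {
      violations.push(`stop_hook_blocking follows itself at index ${i}`);
    }
  }
  return violations;
}
```

A check like this could run in tests or over logged sessions; the point is only that the properties become expressible once each continuation is a recorded value rather than an implicit jump.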
The transition field was the right first step. The next step is promoting it from a diagnostic annotation to the actual control flow mechanism.
Takeaway
Every AI agent needs a query loop. It is the fundamental abstraction: take a user message, call the model, detect tool use, execute tools, feed results back, repeat until done. Claude Code’s queryLoop is a mature, battle-tested implementation of this pattern, handling edge cases (token limits, context compaction, model fallback, stop hooks, budget tracking) that simpler implementations ignore.
But the lesson is clear: make the states explicit from day one. The while(true) pattern works when you have two or three continue sites. At seven — with context collapse, reactive compaction, output token escalation, and budget continuation all competing for control flow — you need a typed state machine.
This is Part 2 of the “Inside Claude Code” series.