
Inside Claude Code: Building Something Better

Everything I learned from reverse-engineering Claude Code, distilled into a blueprint for building your own AI coding harness. Part 10 of 10.

From Autopsy to Architecture

We have spent nine posts dissecting how Anthropic built Claude Code: roughly 512K lines of TS/JS across about 2,010 tracked files, with dozens of built-in capabilities. We have traced its boot sequence, mapped its query engine state machine, inventoried its tool system, followed tokens through its streaming pipeline, untangled its permission system, explored its MCP extensibility layer, analyzed its context management, rendered its terminal UI, and watched its multi-agent coordinator orchestrate parallel workers. We have seen brilliant patterns and painful anti-patterns.

Now it is time to answer the real question: if you were building this from scratch, what would you keep, what would you skip, and what would you do differently?

This post is the synthesis. No new source file analysis. Instead, I am distilling nine posts’ worth of architectural lessons into a concrete blueprint — a recommended architecture for building your own AI coding harness.

How Claude Code Solves It: What to Steal

Architecture Flow

See the diagram above for a visual overview of this flow.


Not everything in Claude Code is accidental complexity. Some patterns emerged because the team solved genuinely hard problems well. These six are worth adopting wholesale.

1. Streaming-First API Design (Post 4)

Claude Code does not bolt streaming onto a request/response model. Streaming is the model. The streamMessages() function treats every API interaction as a stream of server-sent events, with batch responses as a degenerate case (a stream that emits one chunk). This inversion matters because it means the UI can show partial results from the very first API call you implement. You never have to retrofit streaming later — a migration that, in practice, touches every layer of the stack.

Smart Pattern: Make AsyncGenerator<StreamEvent> your primary API return type. Add a collectAll() helper for tests and scripts that want the complete response. Never build a fetch().then(json) path that you will have to replace later.
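
As a rough illustration of that pattern, here is a minimal sketch. The StreamEvent shape and the streamFromChunks() stand-in are assumptions made for the example (the real event type is not shown in this series); only the idea of a generator-first API with a collectAll() helper comes from the text above.

```typescript
// Hypothetical event shape, assumed for this sketch.
type StreamEvent =
  | { type: "text"; text: string }
  | { type: "done" };

// Stand-in for a real API call: yields chunks as they "arrive".
async function* streamFromChunks(chunks: string[]): AsyncGenerator<StreamEvent> {
  for (const text of chunks) {
    yield { type: "text", text };
  }
  yield { type: "done" };
}

// Batch mode is just "drain the stream", which suits tests and scripts.
async function collectAll(stream: AsyncIterable<StreamEvent>): Promise<string> {
  let out = "";
  for await (const event of stream) {
    if (event.type === "text") out += event.text;
  }
  return out;
}
```

An interactive surface would iterate the generator directly and render each event; a script would call collectAll() and wait for the full string. Both share one code path.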

2. Parallel Prefetch at Startup (Post 1)

Claude Code’s boot sequence launches authentication, feature flag fetching, MDM policy loading, and settings parsing in parallel using Promise.all(). This brings cold start from a potential 1-2 seconds down to under 200ms. The insight is that most startup tasks are I/O-bound and independent — there is no reason to serialize them.

Steal this: Identify every I/O operation your harness performs before it can accept input. Fire all independent ones simultaneously. Use Promise.allSettled() so a single failure does not block the entire boot.
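
A minimal sketch of that advice, assuming a hypothetical BootTask shape in which every startup task declares its own fallback value. The task names used with it are illustrative, not Claude Code's.

```typescript
// Hypothetical task shape: each task carries the value to use if it fails.
type BootTask<T> = { name: string; run: () => Promise<T>; fallback: T };

async function bootInParallel<T>(tasks: BootTask<T>[]): Promise<Map<string, T>> {
  // Start every task before awaiting any of them.
  const settled = await Promise.allSettled(tasks.map((t) => t.run()));
  const results = new Map<string, T>();
  settled.forEach((outcome, i) => {
    const task = tasks[i];
    // A rejected task degrades to its fallback instead of aborting boot.
    results.set(
      task.name,
      outcome.status === "fulfilled" ? outcome.value : task.fallback,
    );
  });
  return results;
}
```

Total boot time is then bounded by the slowest task rather than the sum of all of them, and a failed feature-flag fetch, for example, would leave the harness running on defaults.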

3. buildTool() Factory with Secure Defaults (Post 3)

Every tool in Claude Code is constructed through a buildTool() factory function that enforces a consistent contract: input schema validation via Zod, permission checks before execution, structured output formatting, and error boundaries. A new tool contributor cannot accidentally skip permission checks because the factory will not let them.

Steal this: Create a single factory function that every tool must pass through. Bake in input validation, permission gates, timeout enforcement, and error wrapping. Make the secure path the easy path.
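
Here is one way such a factory might look. This is a sketch, not Claude Code's buildTool(): the ToolDefinition and PermissionCheck shapes are assumed, the 30-second default timeout is arbitrary, and a plain parse function stands in for a Zod schema so the example has no dependencies.

```typescript
type ToolResult<O> = { ok: true; output: O } | { ok: false; error: string };

// Assumed definition shape for this sketch.
interface ToolDefinition<I, O> {
  name: string;
  parse: (raw: unknown) => I; // stand-in for a schema: returns input or throws
  run: (input: I) => Promise<O>;
  timeoutMs?: number;
}

type PermissionCheck = (toolName: string, input: unknown) => boolean;

function buildTool<I, O>(def: ToolDefinition<I, O>, isAllowed: PermissionCheck) {
  const timeoutMs = def.timeoutMs ?? 30_000; // assumed default
  return async function execute(raw: unknown): Promise<ToolResult<O>> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      const input = def.parse(raw); // 1. validate input
      if (!isAllowed(def.name, input)) { // 2. permission gate
        return { ok: false, error: `permission denied: ${def.name}` };
      }
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("timed out")), timeoutMs);
      });
      // 3. whichever settles first wins: the tool or the timeout
      const output = await Promise.race([def.run(input), timeout]);
      return { ok: true, output };
    } catch (err) { // 4. error boundary: failures become data, not exceptions
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    } finally {
      clearTimeout(timer); // no-op if the timer was never started
    }
  };
}
```

Because callers only ever receive the wrapped execute function, there is no code path that reaches def.run() without passing validation and the permission check first. Note that the timeout here only stops waiting; a real tool would also need a way to abort the underlying work.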

4. Generator-Based APIs (Post 2)

The query engine exposes its work as async generators, allowing callers to consume results incrementally. This pattern provides natural backpressure — if the UI cannot render fast enough, the generator simply pauses. It also enables dual-mode consumption: stream for interactive use, collect for programmatic use.

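Steal this: Use async function* for any operation that produces incremental results. Generators compose naturally, they are memory-efficient, and they make cancellation cheap: when the consumer stops iterating, the generator’s finally blocks run. A request that is already in flight still needs its own abort signal.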
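
To make the pause-and-cancel behavior concrete, here is a small sketch. The countTo() generator and the log array are invented for the example; the behavior shown (a break in for await calls the generator's return(), which runs its finally block) is standard JavaScript semantics.

```typescript
const log: string[] = [];

async function* countTo(limit: number): AsyncGenerator<number> {
  try {
    for (let i = 1; i <= limit; i++) {
      yield i; // execution is suspended here until the consumer asks for more
    }
  } finally {
    // Runs on normal completion and when the consumer stops early.
    log.push("cleanup");
  }
}

async function takeFirst(n: number): Promise<number[]> {
  const seen: number[] = [];
  for await (const value of countTo(1_000_000)) {
    seen.push(value);
    if (seen.length === n) break; // break calls the generator's return()
  }
  return seen;
}
```

Only the values actually requested are ever produced, so the million-item upper bound costs nothing here.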

5. Transport-Agnostic Protocol Integration (Post 6)

MCP’s multi-transport design means Claude Code can integrate a local file-watching tool and a cloud-hosted database explorer through identical code paths. The transport is a deployment detail, not an architectural concern.

Steal this: Define your extension protocol in terms of messages, not transports. Let the same tool definition work over stdio, WebSockets, or HTTP.
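
A sketch of what "messages, not transports" could mean in code. The RpcRequest, RpcResponse, and Transport shapes are assumptions loosely modelled on JSON-RPC, not the actual MCP SDK types, and the in-memory transport exists only to show that the client never needs to know how messages travel.

```typescript
// Hypothetical message shapes for this sketch.
interface RpcRequest { id: number; method: string; params: unknown }
interface RpcResponse { id: number; result: unknown }

// The only thing a transport has to do: deliver a request, return a response.
interface Transport {
  send(request: RpcRequest): Promise<RpcResponse>;
}

// One possible transport: call a handler in the same process.
class InMemoryTransport implements Transport {
  private readonly handler: (method: string, params: unknown) => unknown;
  constructor(handler: (method: string, params: unknown) => unknown) {
    this.handler = handler;
  }
  async send(request: RpcRequest): Promise<RpcResponse> {
    return { id: request.id, result: this.handler(request.method, request.params) };
  }
}

// The client only knows about messages; a stdio or HTTP transport
// would plug in here without any change to this class.
class ToolClient {
  private nextId = 1;
  private readonly transport: Transport;
  constructor(transport: Transport) {
    this.transport = transport;
  }
  async call(method: string, params: unknown): Promise<unknown> {
    const response = await this.transport.send({ id: this.nextId++, method, params });
    return response.result;
  }
}
```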

6. Multi-Surface Architecture (Posts 1-9)

Claude Code runs as a CLI, a VS Code extension, an SDK, and an MCP server. The core logic — query engine, tool execution, streaming — is shared across all surfaces. Only the I/O layer changes.

Steal this: Separate your core logic from your I/O surface from day one. Define a HarnessCore interface that any surface (CLI, IDE plugin, web UI, API) can drive. The core should never import from a surface-specific module.

What to Skip

Claude Code also contains patterns that emerged from speed of development, not deliberate design. These are the six anti-patterns to actively avoid.

1. God Files (Posts 2, 4)

Claude Code has multiple large orchestration files in the multi-thousand-line range. Large files resist comprehension, make git blame less useful, create merge pressure on every PR, and defeat code review.

⚠️ Watch Out: No file over 1,000 lines. When a file approaches that limit, it is telling you it contains multiple responsibilities. Split it. The effort of splitting is always less than the ongoing cost of navigating a god file.

2. Implicit State Machines (Post 2)

The query engine manages transitions between states through a tangle of boolean flags and conditional branches rather than an explicit state machine. This makes it nearly impossible to enumerate all valid states or verify transition correctness.

Skip this: Use a discriminated union type for your state machine. TypeScript makes this ergonomic: type QueryState = { kind: 'idle' } | { kind: 'streaming', streamId: string } | { kind: 'toolCall', tool: ToolInvocation }. Type each transition function by the states it accepts, and invalid transitions become type errors.
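
Spelled out as a sketch, the union above might be used like this. ToolInvocation is referenced but never defined in the post, so its fields here are assumed, and describeState() and startToolCall() are hypothetical helpers.

```typescript
// Assumed shape: the post names ToolInvocation without defining it.
interface ToolInvocation { name: string; input: unknown }

type QueryState =
  | { kind: "idle" }
  | { kind: "streaming"; streamId: string }
  | { kind: "toolCall"; tool: ToolInvocation };

function describeState(state: QueryState): string {
  switch (state.kind) {
    case "idle":
      return "waiting for input";
    case "streaming":
      return `streaming ${state.streamId}`;
    case "toolCall":
      return `running ${state.tool.name}`;
    default: {
      // Compile error here if a new variant is added but not handled.
      const unreachable: never = state;
      return unreachable;
    }
  }
}

// Typing the source state means this can only be called while streaming;
// passing an idle state is rejected by the compiler.
function startToolCall(
  _from: Extract<QueryState, { kind: "streaming" }>,
  tool: ToolInvocation,
): QueryState {
  return { kind: "toolCall", tool };
}
```

The set of valid states is now enumerable by reading one type, which is exactly what a pile of boolean flags cannot offer.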

3. Feature Flags as Architecture (Posts 1, 6)

Claude Code has dozens of feature gates controlling everything from UI rendering to core algorithmic behavior. When flags substitute for module boundaries, you end up with a combinatorial explosion of runtime configurations that no test matrix can cover.

Skip this: Use feature flags for gradual rollouts with defined end dates. Never use a flag to choose between two architectural paths.

4. Singleton Services (Post 2)

Global singletons for authentication, configuration, and telemetry create hidden coupling. Every module implicitly depends on global state, making testing require elaborate mocking setups.

Skip this: Pass dependencies explicitly. Create your services at the composition root and thread them through constructors.
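
A small sketch of the alternative, using invented Clock and Telemetry interfaces and a hypothetical SessionService. The point is only that the service receives its collaborators instead of reaching for globals.

```typescript
// Hypothetical service interfaces for this sketch.
interface Clock { now(): number }
interface Telemetry { record(event: string): void }

class SessionService {
  private readonly clock: Clock;
  private readonly telemetry: Telemetry;
  constructor(clock: Clock, telemetry: Telemetry) {
    this.clock = clock;
    this.telemetry = telemetry;
  }
  start(): { startedAt: number } {
    this.telemetry.record("session.start");
    return { startedAt: this.clock.now() };
  }
}

// Composition root: the one place where real implementations are chosen.
function createApp(): SessionService {
  const clock: Clock = { now: () => Date.now() };
  const telemetry: Telemetry = { record: (event) => console.log(event) };
  return new SessionService(clock, telemetry);
}
```

In a test, the same class takes a fixed clock and an in-memory recorder, with no mocking framework or module patching required.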

5. Limited Test Infrastructure (All Posts)

At this scale, limited automated coverage on critical paths becomes expensive. Every refactor is a leap of faith, and contributors struggle to verify changes in isolation.

Skip this: Test-first from day one. Not 100% coverage — but the critical paths must have automated tests before you ship.

6. Multiple Extension Systems (Post 6)

Claude Code supports four different extension mechanisms. Each has its own discovery, lifecycle, and error handling. This fragments the ecosystem and confuses tool authors.

Skip this: Pick one extension protocol. MCP is the obvious choice. Make every extension an MCP server. Period.

The Blueprint

Here is the concrete architecture. Eight focused modules, each with a clear responsibility, explicit interfaces, and a maximum file size of 1,000 lines.

Claude Code’s Current Architecture

  • CLI Entry (bin/claude.ts)
  • Large Orchestration Files: multi-thousand lines
  • Tool System: dozens of tools
  • Streaming + API: multi-responsibility
  • Permissions: 25 files, 3 systems
  • MCP + 3 Extension Systems
  • Context Mgmt: scattered
  • Terminal UI
  • Coordinator
  • Feature Flags (dozens): control behavior throughout all modules

Blueprint Target Architecture

Core Modules
  • core/query-engine/: ~500 lines, explicit state machine
  • core/tool-system/: ~600 lines, factory + registries
  • core/streaming/: ~400 lines, client + processor
  • core/permissions/: ~300 lines, single module
  • core/context/: ~500 lines, assembly + cache

Integration
  • integrations/mcp/: ~400 lines, single protocol

Surface
  • ui/: ~500 lines, React/Ink

Coordination
  • coordination/: ~400 lines, multi-agent

Dependencies: query-engine → tool-system, streaming, permissions, context, coordination, ui; tool-system → mcp; coordination ↔ query-engine.

Tech Stack (Proven Choices)

  • Runtime: TypeScript + Bun. Claude Code makes a strong case for TypeScript in AI tooling: the type system catches integration errors at compile time, and Bun’s startup speed keeps the CLI responsive.
  • Terminal UI: React/Ink. Claude Code’s terminal UI is one of its genuine strengths. The component model maps naturally to streaming updates.
  • Testing: Vitest. Supports ESM natively, handles async tests (including ones that drain async generators) without extra setup, and runs fast enough for watch mode during development.
  • Extension Protocol: MCP. One protocol, multiple transports, growing ecosystem. No alternatives needed.

Module Structure

1. core/query-engine/ — The Heart (~500 lines)

An explicit state machine using discriminated union types. Every state is named, every transition is a pure function, every invalid transition is a type error. This replaces Claude Code’s current multi-file implicit state tangle.
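
A minimal sketch of such a pure transition function, returning the next state plus a list of effects for something else to perform. The event and effect names are invented for illustration, and the state set is simplified. Unlike a fully typed design, this version handles unknown state/event pairs at runtime by ignoring them.

```typescript
type QueryState =
  | { kind: "idle" }
  | { kind: "streaming"; streamId: string }
  | { kind: "toolCall"; toolName: string };

// Hypothetical events and effects for this sketch.
type QueryEvent =
  | { type: "promptSubmitted"; streamId: string }
  | { type: "toolRequested"; toolName: string }
  | { type: "toolFinished"; streamId: string }
  | { type: "streamEnded" };

type QueryEffect =
  | { type: "openStream"; streamId: string }
  | { type: "runTool"; toolName: string };

// Pure: no I/O, no clocks, no randomness. Effects are described, not executed.
function transition(state: QueryState, event: QueryEvent): [QueryState, QueryEffect[]] {
  if (state.kind === "idle" && event.type === "promptSubmitted") {
    return [
      { kind: "streaming", streamId: event.streamId },
      [{ type: "openStream", streamId: event.streamId }],
    ];
  }
  if (state.kind === "streaming" && event.type === "toolRequested") {
    return [
      { kind: "toolCall", toolName: event.toolName },
      [{ type: "runTool", toolName: event.toolName }],
    ];
  }
  if (state.kind === "toolCall" && event.type === "toolFinished") {
    return [
      { kind: "streaming", streamId: event.streamId },
      [{ type: "openStream", streamId: event.streamId }],
    ];
  }
  if (state.kind === "streaming" && event.type === "streamEnded") {
    return [{ kind: "idle" }, []];
  }
  // Unknown pairings leave the state untouched; a stricter engine might throw.
  return [state, []];
}
```

Because the function is pure, a whole conversation can be tested by folding a list of events over an initial state and asserting on the final state and the effects emitted along the way.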

2. core/tool-system/ — Tool Contract + Factory (~600 lines)

A buildTool() factory that enforces input validation, permission checks, timeout enforcement, and structured output formatting. Every tool is an MCP-compatible server.

3. core/streaming/ — Separated Concerns (~400 lines)

Three distinct responsibilities: an HTTP client that manages connections, a stream processor that parses SSE events, and a token renderer that handles display. Each can be tested and replaced independently.
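
As an illustration of the stream-processor piece in isolation, here is a simplified SSE parser that turns text chunks into events. It is a sketch under stated assumptions: it expects LF line endings, handles only the event: and data: fields, and trims all leading whitespace from values, so it is narrower than the full SSE specification. The SseEvent shape is invented for the example.

```typescript
interface SseEvent { event: string; data: string }

// Parse one blank-line-delimited block into an event, if it carries data.
function parseBlock(block: string): SseEvent | undefined {
  let event = "message"; // default event name when no event: field is present
  const data: string[] = [];
  for (const line of block.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) data.push(line.slice(5).trimStart());
  }
  return data.length > 0 ? { event, data: data.join("\n") } : undefined;
}

// Chunks may split an event anywhere, so buffer until a blank line arrives.
async function* parseSse(chunks: AsyncIterable<string>): AsyncGenerator<SseEvent> {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += chunk;
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const parsed = parseBlock(buffer.slice(0, boundary));
      buffer = buffer.slice(boundary + 2);
      if (parsed) yield parsed;
      boundary = buffer.indexOf("\n\n");
    }
  }
}
```

Since the parser takes any AsyncIterable<string>, it can be tested with a hand-written generator and no network at all, which is the practical payoff of separating it from the HTTP client.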

4. core/permissions/ — One Module, Not 25 Files (~300 lines)

A consolidated permission system with a single canExecute(action, context): PermissionResult interface. Replaces three overlapping permission systems with one.
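
A sketch of what that single entry point could look like. The Action and PermissionContext fields are assumptions (the post names the types without defining them), and the deny-then-allow-then-prompt ordering is one reasonable policy rather than Claude Code's actual rules.

```typescript
type PermissionResult =
  | { allowed: true }
  | { allowed: false; reason: string; canPrompt: boolean };

// Assumed shapes for this sketch.
interface Action { tool: string }
interface PermissionContext { allowedTools: string[]; deniedTools: string[] }

function canExecute(action: Action, context: PermissionContext): PermissionResult {
  // Explicit denials win and cannot be overridden by prompting the user.
  if (context.deniedTools.includes(action.tool)) {
    return { allowed: false, reason: `${action.tool} is denied by policy`, canPrompt: false };
  }
  // Anything not yet approved is refused, but the UI may ask the user.
  if (!context.allowedTools.includes(action.tool)) {
    return { allowed: false, reason: `${action.tool} has not been approved`, canPrompt: true };
  }
  return { allowed: true };
}
```

Returning a discriminated union forces every caller to handle the refusal case, and the canPrompt flag keeps the decision about whether to ask the user out of the UI layer.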

5. core/context/ — Explicit Cache Lifecycle (~500 lines)

Context assembly with clear ownership of what gets cached, for how long, and when it gets evicted. No ambient global state.
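
One way to make that lifecycle explicit is a small cache object that owns its entries, their time-to-live, and their eviction. The ContextCache class below is a hypothetical sketch; the injected clock is an assumption added so that expiry can be tested deterministically.

```typescript
class ContextCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Explicit invalidation, e.g. after a file the entry was built from changes.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

The cache is an ordinary value created at the composition root and passed to whoever needs it, so nothing about its contents is ambient.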

6. integrations/mcp/ — Single Extension Protocol (~400 lines)

One extension system instead of four. Every external capability is an MCP server. Discovery, lifecycle management, and error handling in one place.

7. ui/ — React/Ink Components (~500 lines)

Terminal UI components with barrel exports. Each component handles one concern. No component exceeds 200 lines.

8. coordination/ — Explicit Multi-Agent Protocol (~400 lines)

A coordinator that uses an explicit protocol for task delegation, result collection, and conflict detection. File-level locking prevents edit conflicts.
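
File-level locking can start out very simply. The FileLockTable below is a hypothetical in-memory sketch that assumes a single coordinator process and already-normalized paths; it is not how Claude Code's coordinator is implemented.

```typescript
class FileLockTable {
  // Maps a file path to the id of the agent currently allowed to edit it.
  private readonly owners = new Map<string, string>();

  // Returns true if the agent now holds (or already held) the lock.
  tryAcquire(path: string, agentId: string): boolean {
    const owner = this.owners.get(path);
    if (owner !== undefined && owner !== agentId) return false;
    this.owners.set(path, agentId);
    return true;
  }

  // Only the current owner can release; other callers are ignored.
  release(path: string, agentId: string): void {
    if (this.owners.get(path) === agentId) this.owners.delete(path);
  }

  ownerOf(path: string): string | undefined {
    return this.owners.get(path);
  }
}
```

A worker that fails to acquire a lock can report the conflict back to the coordinator instead of writing, which turns a silent overwrite into an explicit scheduling decision.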

Key Design Rules

  1. No file over 1,000 lines. Enforced by a lint rule.
  2. Every module has a public interface (types.ts) and internal implementation.
  3. Explicit state machines for all async workflows. No boolean flag soup.
  4. Test-first: every module has tests before implementation.
  5. One extension protocol (MCP), not four.

Complexity Comparison

Claude Code
  • ~512K lines of TS/JS
  • ~2,010 tracked files
  • Largest files: 5,000+ lines
  • Dozens of tools
  • Dozens of feature flags
  • Limited test coverage
  • 4 extension systems

Blueprint MVP
  • ~50K lines of code
  • ~150 files
  • Largest file: under 1K lines
  • 10-15 core tools + MCP
  • 0 feature flags (use modules)
  • 80%+ critical path coverage
  • 1 extension system (MCP)

Reduction: ~10x in lines of code, ~13x in file count, ~5x in largest-file size, and 4 → 1 extension systems.

Dimension | Claude Code | Blueprint
Total LOC | ~512K TS/JS | ~50K (MVP)
Files | ~2,010 tracked files | ~150
Largest file | Multi-thousand-line orchestration files | Under 1,000 lines
Tools | Dozens of built-in capabilities | 10-15 core + MCP extensions
Feature flags | Dozens | Minimal, short-lived rollout flags
Test coverage | Limited critical-path automation | 80%+ critical paths
Extension systems | 4 (MCP, slash, commander, custom) | 1 (MCP)
State management | Implicit (boolean flags) | Explicit (discriminated unions)
Permission systems | 3 overlapping | 1 consolidated
Boot time | Under 200ms (parallel) | Under 200ms (parallel, same pattern)

The 10x reduction in code is not because the blueprint does less. It is because the blueprint does not repeat itself.

Getting Started: The First Five Files

Do not try to build all eight modules at once. Start with these five files, in this order:

Minimum Viable Harness: First Five Files

  1. core/query-engine/types.ts: QueryState union type (~80 lines). Defines states for file 2.
  2. core/query-engine/engine.ts: state machine transitions (~500 lines). Invokes tools via file 3.
  3. core/tool-system/tool.ts: buildTool() factory (~300 lines). Streams responses via file 4 and checks permissions via file 5.
  4. core/streaming/client.ts: streaming HTTP client (~200 lines).
  5. core/permissions/permissions.ts: permission evaluation (~200 lines).

File 1: core/query-engine/types.ts (~80 lines) — Define your state machine’s vocabulary. Every state the query engine can be in, every event that can trigger a transition. Use discriminated unions so TypeScript enforces valid transitions at compile time.

File 2: core/query-engine/engine.ts (~500 lines) — The state machine itself. A pure function transition(state: QueryState, event: QueryEvent): [QueryState, QueryEffect[]]. No I/O in this file — all I/O happens in the effect handlers.

File 3: core/tool-system/tool.ts (~300 lines) — The buildTool() factory. Takes a tool definition and wraps it with input validation (Zod), permission checks, timeout enforcement, error boundaries, and structured output formatting.

File 4: core/streaming/client.ts (~200 lines) — A streaming HTTP client purpose-built for the Anthropic Messages API. Turns an HTTP connection into an AsyncGenerator<StreamEvent>.

File 5: core/permissions/permissions.ts (~200 lines) — A single canExecute(action: Action, context: PermissionContext): PermissionResult function. Returns a discriminated union: { allowed: true } or { allowed: false, reason: string, canPrompt: boolean }.

Suggested Build Timeline

  • Week 1 (Foundation): QueryState types (2d), query engine + tests (3d), tool factory + tests (3d)
  • Week 2 (Pipeline): streaming client + tests (3d), permission module + tests (3d), integration testing (2d)
  • Week 3 (Extensions): context assembly + tests (3d), MCP integration + tests (3d), end-to-end testing (2d)
  • Week 4 (Surface): React/Ink UI (3d), coordinator protocol (3d), CLI entry + polish (2d)

Week 1 builds the foundation: the state machine and the tool system. Week 2 adds the pipeline: streaming and permissions. By end of week 2, you have a working harness that can accept a prompt, stream a response, execute tools with permission checks, and return a result. Week 3 adds context management and MCP extensibility. Week 4 adds the user-facing surface and coordination layer.

Series Wrap-Up

Over ten posts, we have taken Claude Code apart and put it back together. Here is what I learned.

The architecture is a product of its velocity. Claude Code was built fast, by a small team, under pressure to ship. Many of its anti-patterns — god files, implicit state machines, limited tests — are the predictable result of prioritizing speed over structure. That is not a criticism; it is a trade-off that made sense at the time.

The good patterns are genuinely good. Streaming-first design, parallel boot, the tool factory pattern, generator-based APIs, MCP integration, multi-surface architecture — these are not obvious choices. They represent hard-won engineering judgment about how AI coding tools should work. Any team building in this space should study them.

Complexity is the real enemy. Not lines of code, not file count, but accidental complexity — the complexity that does not serve the user. Overlapping permission pathways instead of one clear policy layer. Multiple extension mechanisms instead of one cohesive model. Multi-thousand-line orchestration files instead of focused modules.

The blueprint is achievable. A focused team can build a functional AI coding harness in four weeks with the architecture described above. It will not have every built-in capability, every edge case, or every enterprise scenario. But it will have clean module boundaries, explicit state management, comprehensive test coverage, and a single extension protocol. That foundation is easier to extend than a very large legacy codebase is to refactor.

The best AI coding harness is not the one with the most features. It is the one with the cleanest architecture — because clean architecture is what lets you add features without breaking everything else. Claude Code taught us what to build. Now go build it better.


The Full Series

  1. What Happens When You Type ‘claude’? — Boot sequence
  2. The Brain: How an AI Agent Thinks in a Loop — Query engine
  3. Giving AI Hands: The Tool System — Tools
  4. Streaming: Why Your AI Should Talk While It Thinks — Streaming
  5. Who’s Allowed to Do What? — Permissions
  6. Plugging Into Everything with MCP — MCP
  7. Remembering What Matters — Context and memory
  8. React in Your Terminal — Terminal UI
  9. When One Agent Isn’t Enough — Multi-agent
  10. Building Something Better: A Blueprint — You are here

This is Part 10 of the “Inside Claude Code” series.
