Emil Wu

#15

Practical Tips 3: The Agent Reading Trap — Summaries Speed You Up, and Make You Miss

Practical 8 min read
Agent reading a long document with some sections blurred out from summarization
Did your Agent actually read everything? Summaries make you fast, and they make you miss things

Last time we talked about packaging Skills as Plugins. This article tackles something more fundamental that many people don’t realize: did your Agent actually read everything you gave it?

A few days ago, I extracted the /insights command [1] from Claude Code’s leaked source code [3] — 3,200 lines of TypeScript. During the extraction, I hit a classic pitfall: the same file, read three different ways, produced three completely different results.


How Do Agents Read Things?

Here’s a fact most people don’t know: when Agents read large content, they default to summarization.

Whether it’s a web page or a file, when content exceeds a certain length, the Agent doesn’t read word by word. It splits the content into sections, summarizes each chunk, then stitches the summaries together as “done reading.” Even when you explicitly say “read it completely,” the Agent often still uses subagents to summarize smaller blocks — just at a finer granularity.
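The default behavior can be sketched roughly like this (a conceptual illustration, not any tool's actual implementation; the chunk size and the stand-in summarizer are made up):

```python
def read_with_summaries(text: str, chunk_lines: int = 250, summarize=None) -> str:
    """Simulate default agent reading: split the content into chunks,
    summarize each chunk, then stitch the summaries together as the
    'read' result."""
    lines = text.splitlines()
    chunks = [lines[i:i + chunk_lines] for i in range(0, len(lines), chunk_lines)]
    if summarize is None:
        # Stand-in for an LLM summary call: keep only each chunk's first line.
        summarize = lambda chunk: chunk[0] if chunk else ""
    return "\n".join(summarize(chunk) for chunk in chunks)

# A 600-line file collapses to 3 "summary" lines; the other 597 lines
# never reach the main model verbatim.
doc = "\n".join(f"line {i}" for i in range(600))
print(read_with_summaries(doc))
```

The point of the sketch: whatever the real summarizer does, everything outside its output is invisible to the main model from then on.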

This isn’t unique to one tool. Cursor’s Agent reads only the first 250 lines of a file by default [7]. Claude Code’s Read tool caps at 2,000 lines. WebFetch uses a smaller model (Haiku) to summarize fetched pages before handing them to the main model [8] — you never get the raw content.

This behavior is perfectly reasonable in most scenarios. When you have an Agent read a blog post, scan a document, or WebFetch a page for its key points, summaries work fine.

But when your goal is to have the Agent “build” something based on what it read, this behavior becomes an invisible trap.


Three Reading Depths, Three Results

While extracting /insights, I tried three different instructions on the same 3,200-line insights.ts:

First: “Read this file”

The most natural instruction. The Agent quickly scans the entire file and produces a structured summary: what the file roughly does, its main functions, the overall flow.

Result: The extracted logic was unusable. Key prompt content was skipped, dependencies between functions were simplified, and many important conditional checks simply disappeared.

Second: “Force a complete read”

Explicitly telling the Agent: “Read section by section, don’t skip anything.” The Agent processes each section more carefully, but still compresses details to some degree.

Result: The extracted logic worked, but the generated reports felt off. Format correct, structure correct, but precision was noticeably lower — some analysis dimensions got mixed together, subtle differences between prompts got flattened.

Third: “No summaries, read every single line”

The most extreme instruction. Token usage skyrocketed, but the Agent finally saw every line of code.

Result: The extracted logic matched the original. All subtle prompt differences were preserved, function dependencies were complete, conditional checks were all there.

Three approaches, vastly different token costs, and night-and-day differences in output quality.


Why Code Is Especially Vulnerable to Summary Damage

Natural language has redundancy — the same meaning can be expressed in different sentences, and skipping a few doesn’t hurt comprehension. But code is extremely dense — every line has a specific purpose, and skipping one line might mean skipping a critical edge condition.

Factory.ai’s research on context compression [4] found that generic summarization systematically discards information agents need — file paths, technical specifics, and decision rationale all vanish as “low-entropy content,” with artifact tracking scoring only 2.19–2.45 out of 5.0. Chroma’s study across 18 frontier models [5] showed that in a typical 20,000-token context, truly relevant code accounts for just ~2.5% — roughly 500 tokens. When summarization compresses that 2.5% further, quality drops off a cliff.

The /insights source code has a perfect example: when reading session conversation logs for analysis, it uses multiple different prompts. At first glance, these prompts look similar — they all ask the LLM to analyze conversation content. But look closely, and each has subtle differences: one emphasizes friction point categorization, another emphasizes usage pattern extraction, another focuses on recommendation feasibility.

When the Agent reads with summarization, these prompts get reduced to “multiple analysis prompts” — the subtle differences get averaged out. The result is the second outcome from above: reports come out, but quality drops, because those flattened differences are exactly why each prompt exists.

Think about it from the other direction — if these prompts didn’t need subtle differences, why would Anthropic’s engineers split them up? They could’ve used one prompt with parameters. The deliberate splitting means each variation carries meaning. Summarization erased that meaning.
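A toy illustration of the flattening (hypothetical prompts, not the actual /insights source): the two prompts below differ only in their final instruction, and a lossy "summary" that keeps just what they share erases exactly the distinction that justifies having two prompts.

```python
import os

# Hypothetical stand-ins for two similar analysis prompts.
PROMPT_FRICTION = "Analyze the conversation log. Categorize each friction point the user hit."
PROMPT_PATTERNS = "Analyze the conversation log. Extract the recurring usage patterns."

# Model the summary as keeping only the prompts' common part:
flattened = os.path.commonprefix([PROMPT_FRICTION, PROMPT_PATTERNS])
print(repr(flattened))  # the distinguishing instructions are gone
```

(`os.path.commonprefix` compares character-wise, which is all this toy needs.)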


What We Learn from Anthropic’s Design

Dissecting how a company that builds both the AI models and the runtimes around them designs its own AI tools is an incredibly valuable learning opportunity.

What they did in /insights is essentially a demonstration of Context Engineering in practice — different analysis tasks get different prompts, each precisely steering the LLM’s attention, rather than one catch-all prompt trying to handle everything.

This aligns with what we discussed in article three on Context Engineering [2]: Context isn’t about quantity, it’s about precision. Anthropic practices what they preach.


When Summaries Are Fine, and When Full Reading Is Essential

This isn’t to say summarized reading is bad. It’s the optimal strategy in many scenarios:

Summarized reading works for:

  • Getting a quick sense of direction (“What does this repo roughly do?”)
  • Web Fetch to grab key points from pages
  • Initial assessment of whether a document is relevant
  • Scanning changelogs for specific version changes

Full reading is necessary when:

  • Building something based on what was read (writing code, extracting logic, restructuring architecture)
  • Source code is dense and subtle differences are meaningful
  • Multiple similar-but-different config files or prompts
  • Previous summarized reading produced output that “feels off” — this is usually a signal

Practical rule of thumb: if you need the Agent to “understand,” summaries are enough. If you need the Agent to “reproduce” or “build,” you need full reading.
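The rule of thumb can be written down as a rough triage helper (purely illustrative; the verb lists are arbitrary, and real tasks deserve a human judgment call):

```python
def reading_mode(task: str) -> str:
    """Map a task description to a reading strategy, per the rule of thumb:
    'understand' tasks tolerate summaries; 'reproduce/build' tasks need
    a full line-by-line read."""
    build_verbs = {"extract", "reproduce", "port", "refactor", "implement", "rewrite"}
    skim_verbs = {"skim", "assess", "overview", "summarize", "scan"}
    words = set(task.lower().split())
    if words & build_verbs:
        return "full"
    if words & skim_verbs:
        return "summary"
    return "full"  # when in doubt, pay the extra tokens

print(reading_mode("extract the insights command"))  # full
print(reading_mode("skim this repo"))                # summary
```

The default branch encodes the asymmetry: an unnecessary full read costs tokens, while an unnecessary summary costs correctness.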


When You Notice the Agent Missed Something

One last practical observation: when you find the Agent’s output is missing something — a conditional check, an edge case, a distinction between two similar-but-different concepts — the odds are high that it’s not the Agent’s reasoning ability at fault, but that it lost detail during the knowledge-building phase due to summarization.

Claude Code GitHub Issue #7533 [6] documents a textbook case: after 2-3 context compactions, the Agent stopped reading files completely, switching to grep and partial reads to piece together information, then filling gaps with “pattern-matching assumptions.” The developer’s conclusion: “Better to have correct edits and run out of context than to make incorrect edits while preserving context.”

Don’t rush to debug the output. Look back at what it read and how it read it. Often, the problem isn’t in the downstream reasoning — it’s in the upstream input.

When the material is small, you can fix things through post-hoc debugging — just patch whatever the Agent missed. But when the material is large — like 3,200 lines of source code — the cost of fixing after the fact is far higher than getting the reading right from the start.
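One cheap post-hoc check, sketched under the assumption that you can recover the inclusive (start, end) line ranges the agent's read tool actually touched from its logs: compute coverage against the file's line count before trusting the output.

```python
def read_coverage(total_lines: int, read_ranges: list[tuple[int, int]]) -> float:
    """Fraction of a file's lines the agent actually saw.
    read_ranges: inclusive, 1-based (start, end) ranges taken from tool logs."""
    seen: set[int] = set()
    for start, end in read_ranges:
        seen.update(range(max(start, 1), min(end, total_lines) + 1))
    return len(seen) / total_lines

# A 3,200-line file read once through a 2,000-line window:
print(read_coverage(3200, [(1, 2000)]))  # 0.625 — over a third never read
```

Anything well below 1.0 on a "build" task is the signal described above: go back and make the Agent read the rest before debugging its output.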

Maybe this is the most easily overlooked aspect of working with Agents: we spend so much time calibrating the Agent’s output, but rarely go back to examine its input. And the quality of input is often already determined the moment it “reads.”


References:

[1] Extracting Claude Code /insights from the Leaked Source — The real-world extraction case /en/resources/claude-code-source-insights

[2] Article 3: Context Engineering — Context isn’t about quantity, it’s about precision /en/articles/3-context-engineering

[3] News: Claude Code’s Full Source Code Leaked — Leak coverage /en/news/claude-code-source-leak

[4] Evaluating Context Compression for AI Agents — Factory.ai — How generic summarization systematically discards what agents need https://factory.ai/news/evaluating-compression

[5] Context Rot — Chroma Research — Context degradation across 18 frontier models https://www.trychroma.com/research/context-rot

[6] Claude Code Issue #7533 — Context preservation vs correctness — Real-world case of summarized reading causing incorrect edits https://github.com/anthropics/claude-code/issues/7533

[7] Dynamic Context Discovery — Cursor Blog — Why truncating tool responses causes data loss https://cursor.com/blog/dynamic-context-discovery

[8] Inside Claude Code’s Web Tools — Mikhail Shilkov — WebFetch never returns raw content https://mikhail.io/2025/10/claude-code-web-tools/

Support This Series

If these articles have been helpful, consider buying me a coffee