Contents
- Why We Need a Knowledge Agent
- raw → wiki → inbox: Three Stages of Knowledge
- Three Ingest Stories
- Case 1: Ingesting an External Plugin Repo
- Case 2: Ingesting 10 Conversation Logs from the Build Process
- Case 3: Cross-Concept Synthesis
- What Happens When the Wiki Gets Big?
- Knowledge Is More Than Storage
- The Complete Knowledge Pipeline
In one line: An Agent Team’s knowledge shouldn’t be scattered across each Agent’s Context — it needs a centralized, compilable, searchable knowledge base.
One evening I had Em ingest 10 conversation logs — records of my interactions with different Agents throughout the process of building the entire Agent Team, including how I coordinated Em and C7 for parallel processing, how I discussed GM’s design with Agent G, and how I manually handed off one Agent’s output to another. After the ingest completed, Em’s wiki gained 3 brand-new concepts (bootstrap-agent-pattern, prompt-relay-pattern, cross-machine-registry-pattern), updated 3 entity pages and 2 existing concepts, and produced a source summary containing a complete timeline of all 10 conversations, 5 human coordination patterns, 10 architectural decisions, and “7 patterns GM needs to learn.”
That was when I first felt like an Agent had “remembered” the things we’d done together. That feeling didn’t come from a larger Context window (around that time Opus models’ Context Windows grew from 200K to 1M tokens) — it came from knowledge starting to get structured. Decisions and patterns that had been scattered across different session conversation histories, after Em’s ingest, became wiki pages that any Agent could query at any point in time. And the pages cross-referenced each other — Em didn’t just store knowledge, it built the connections between pieces of knowledge. That gave Context a new dimension.
Why We Need a Knowledge Agent
In earlier articles I mentioned that building the Agent Team consumed 1,080,005 output tokens — roughly equivalent to 800 pages of text — and 97% of that output was Markdown. That Markdown was scattered across different conversation logs, different design documents, and different Agents’ working directories. Without someone systematically organizing it all, that knowledge was just text sitting somewhere — not knowledge that could actually be used. That’s exactly why we designed Em. For now, Em doesn’t do anything else. It only does three things: ingest (converting raw materials into structured wiki pages), query (finding relevant knowledge in the wiki to answer questions), and lint (checking the wiki for consistency and completeness). It serves as the entire Agent Team’s knowledge compiler [2][7] — raw data in, structured knowledge out.
In the Context Engineering article I mentioned the principle of JIT loading [1]: rather than stuffing everything into Context and waiting for it to be needed, you load what you need when you need it [5]. Em’s wiki is the concrete embodiment of that principle. Agents don’t need to remember all knowledge — they just need to know to “go ask Em,” and Em will find the relevant pages, knowledge, and indexes from the wiki and provide them.
raw → wiki → inbox: Three Stages of Knowledge
Em’s knowledge pipeline has three stages, each handling a different kind of data.
raw/: Where raw materials live, untouched. This might be a conversation log, an external document, the contents of a GitHub repo, or even a scraped DeepWiki page. Em never modifies anything in raw/ — it only extracts knowledge from it.
wiki/: Em’s core output — the structured knowledge base. It contains four types of pages: concepts (ideas like Context Engineering or the three-layer methodology framework), entities (people, things, tools — like each Agent, external tools, or the author), sources (source summaries documenting the raw materials and outputs from each ingest), and analysis (cross-concept synthesis, like methodology recommendations for GM). Every page follows a fixed structure (title, type, date, source tags), and pages are connected via cross-references [8].
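That fixed page structure is also what makes Em's /lint check possible: if every page carries the same metadata and links follow one convention, consistency can be verified mechanically. A minimal sketch — the field names, the `[[...]]` wiki-link syntax, and modeling the wiki as a dict of page paths to Markdown content are all illustrative assumptions, not Em's actual schema:

```python
import re

# Hypothetical required metadata fields; Em's real schema may differ.
REQUIRED = ("title", "type", "date", "source")

def lint(wiki):
    """Return a list of problems found in an in-memory wiki
    (dict mapping page path -> Markdown content).
    An empty list means the wiki is consistent."""
    problems = []
    for path, page in wiki.items():
        # Every page must carry the fixed metadata fields.
        for field in REQUIRED:
            if f"{field}:" not in page:
                problems.append(f"{path}: missing '{field}'")
        # Every [[cross-reference]] must point at an existing page.
        for ref in re.findall(r"\[\[([^\]]+)\]\]", page):
            if ref not in wiki:
                problems.append(f"{path}: broken link to '{ref}'")
    return problems
```

A broken cross-reference or a page missing its `type:` field shows up as a lint problem instead of silently degrading later queries.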
inbox/: Pending messages from other Agents. When C7 completes a skill entry, or Dm finishes building a new Agent, they can drop a notification into Em’s inbox telling it “there’s something new worth ingesting into the wiki.” This mechanism is still manual for the most part (usually I’m the one telling Em “go ingest this”), but by design it will eventually be automated and coordinated by GM.
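Taken together, the three stages map onto a simple directory layout. Here is a small Python sketch of that layout — the paths and subdirectory names are illustrative, not necessarily Em's actual working directory:

```python
from pathlib import Path

# The three stages described above (names are illustrative).
STAGES = {
    "raw":   "immutable source material; Em never modifies it",
    "wiki":  "structured output: concepts/, entities/, sources/, analysis/",
    "inbox": "pending ingest notifications from other Agents",
}

def scaffold(root: str) -> list[str]:
    """Create the three stage directories plus the four wiki page-type
    subdirectories, and return the created paths relative to root."""
    base = Path(root)
    created = []
    for stage in STAGES:
        (base / stage).mkdir(parents=True, exist_ok=True)
        created.append(stage)
    for page_type in ("concepts", "entities", "sources", "analysis"):
        (base / "wiki" / page_type).mkdir(exist_ok=True)
        created.append(f"wiki/{page_type}")
    return created
```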
Each /ingest run follows this flow:
1. Read the source documents from raw/
2. Create a source summary page in wiki/sources/
3. Identify the entities and concepts mentioned in the source
4. Create new pages or update existing ones
5. Update wiki/index.md (the index of all pages)
6. If the source shifts the big-picture understanding, update wiki/overview.md
7. Record this ingest in wiki/log.md
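The flow above can be sketched as a single function over a toy in-memory wiki (a dict mapping page paths to Markdown content), where `extract` stands in for the LLM judgment in the identification step. Step 6, the conditional overview update, is omitted, and all names are illustrative:

```python
from datetime import date

def ingest(source_name, documents, wiki, extract):
    """One /ingest run. `extract` stands in for Em's judgment and
    returns (kind, name) pairs, e.g. ("concepts", "jit-loading")."""
    text = "\n".join(documents)                          # 1. read from raw/
    source_page = f"sources/{source_name}.md"            # 2. source summary page
    wiki[source_page] = f"# Source: {source_name} ({len(documents)} documents)"
    new_pages, updated_pages = [], []
    for kind, name in extract(text):                     # 3. identify entities/concepts
        page = f"{kind}/{name}.md"
        if page in wiki:                                 # 4. supplement an existing page...
            wiki[page] += f"\n- see {source_page}"
            updated_pages.append(page)
        else:                                            # ...or create a new one
            wiki[page] = f"# {name}\n- source: {source_page}"
            new_pages.append(page)
    wiki["index.md"] = "\n".join(                        # 5. rebuild the index
        sorted(p for p in wiki if p not in ("index.md", "log.md")))
    wiki["log.md"] = wiki.get("log.md", "") + \
        f"\n{date.today()}: ingested {source_name}"      # 7. append to the log
    return new_pages, updated_pages
```

The interesting part is deliberately hidden inside `extract`: steps 1, 2, 5, and 7 are mechanical bookkeeping, while steps 3 and 4 are where Em's judgment lives.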
It looks mechanical, but steps 3 and 4 are where the most judgment is required. Em has to decide: is this piece of knowledge a brand-new concept, or does it supplement an existing one? If it’s a supplement, which page and which section gets updated? If multiple concepts are involved, how should the cross-references be structured? These judgments determine the quality of the wiki, and the quality of the wiki determines whether other Agents get useful results when they query it.
Three Ingest Stories
Describing the flow abstractly only gets you so far, so let me walk through three real ingest cases.
Case 1: Ingesting an External Plugin Repo
The first was ingesting my own Claude Code Plugin repo (emilwu-plugins), which contains three plugins already in production use: investigate (parallel cross-repo investigation), claude-insights-command (session analysis reports), and claude-insights-skill (same thing but in Skill form).
The instruction I gave Em wasn’t just “ingest this repo” — I attached a detailed extraction focus, telling it “there are three patterns worth extracting here”: the Parallel Investigation Pattern (Explore → Decide → Execute three phases), the Session Analysis Pipeline (a five-stage data pipeline with two-layer parallel subagent processing), and the Plugin Distribution Pattern (the packaging structure of hooks + scripts + prompts).
After Em read 16 files, it produced 3 new pages (a source, an entity, and a new concept: multi-phase-llm-pipeline) and updated 4 existing pages (subagent-pattern gained two real-world cases, plugin-packaging got distribution details filled in).
The key design decision here was giving Em the extraction direction inside the ingest instruction itself, rather than letting it judge on its own what was worth extracting. Em’s Context doesn’t include “how long this Plugin has been running in production” or “which patterns have been reused by other projects.” Without that direction, it might focus on the wrong things. This is the same judgment I discussed in the calibration article about “what to spell out vs. what to let go”: spell out the goals and constraints, let go of the execution path.
Case 2: Ingesting 10 Conversation Logs from the Build Process
The second case is the one I described at the start — ingesting 10 conversation logs from my interactions with different Agents while building the Agent Team. This ingest had a prologue: the first time I gave the instruction, Em started following the standard flow and reading files, but I interrupted it and gave a richer instruction instead.
I interrupted because I realized the standard flow wasn’t enough. These conversation logs weren’t “knowledge documents” — they were “records of work in progress.” Em needed to extract patterns from the work process, not store the conversation content. So in the new instruction I wrote: “This is the process of how I built the entire Agent Team. Extract the knowledge embedded in the process — look at how I coordinated these Agents to get things done, because that’s what Agent GM needs to learn. The communication patterns between these Agents can also serve as a basis for designing the inbox and communication mechanisms.”
What made that instruction matter: ingest quality depends not just on Em’s ability, but also on the direction the human provides. If you just hand Em a pile of files and say “ingest,” it will use its own judgment to decide what’s important — but it doesn’t have the implicit context in your head, like “the coordination patterns in these conversations are things GM will need to learn later.” Without that, the extraction result might not go in the direction you want. That said, be careful not to give too much detail — you don’t want the Agent trying to piece together something from the conversation that simply isn’t there.
In the end, Em extracted from 10 conversations: bootstrap-agent-pattern (the pattern of Agent G as a temporary builder), prompt-relay-pattern (the current state of humans manually passing prompts between Agents — exactly what GM needs to automate), cross-machine-registry-pattern (Agent registration and sync across machines), and a source summary organizing “5 patterns of Human as Coordinator.” All of it laid the foundation for the Agent Team’s continued development.
Case 3: Cross-Concept Synthesis
The third case went beyond ingestion: instead of just having Em ingest a source, I had it perform a cross-concept synthesis.
This was during GM’s design phase, when Dm’s Architect needed a GM Methodology recommendation to work from. But GM’s Methodology couldn’t be derived from a single source — it required cross-referencing multiple concepts already in the wiki: agent-team-architecture explained GM’s position in the architecture, context-flow-decision-model explained when GM should use a Skill vs. a Subagent vs. Agent Teams, methodology-playbook-plan described what a Methodology itself should look like, completion-reporting-protocol explained how Agents report back after completing work, and prompt-relay-pattern cataloged what humans are currently doing manually.
I had Em read 12 concept pages, 5 entity pages, and 2 source summaries, then produce a synthesis analysis of GM’s methodology. It contained 7 methodology principles (each traceable back to a specific concept in the wiki), scenario pattern recommendations for GM (patterns distilled from 16 use cases), and Context design recommendations (what belongs in CLAUDE.md, which knowledge should only be loaded on demand).
This is Em’s most valuable capability: not just storing knowledge, but connecting knowledge scattered across different pages to produce insights that no single page can provide. After Dm’s Architect read Em’s synthesis, combined with the 16 scenarios in the design inputs, it went straight to designing GM’s plan — without needing to come back and ask Em “what are the specs for that protocol?” because the synthesis had already integrated all the relevant knowledge together.
What Happens When the Wiki Gets Big?
At this point, Em’s wiki sounds pretty great: knowledge is structured, cross-referenced, capable of cross-concept synthesis, queryable by other Agents at any time. But as the wiki grew from 20-something pages to 50-something pages and then to 120+ files, a problem started to surface.
Every /query requires Em to first read index.md to find the relevant pages, then read those pages, synthesize, and answer the question. But as the wiki keeps growing [4], index.md itself gets very long, and Em — in order not to miss anything — often reads more pages than actually needed, because it can’t tell from the title alone whether a given page is relevant to the question.
This is a linear token-cost scaling problem [3][10]: the larger the wiki, the more tokens get read per query, and most of what gets read ends up unused. It's actually the flip side of the Context Stuffing problem discussed in the Context Engineering article: you're not loading everything into Context at startup (that's eager loading); you're being forced to read too much at query time (that's a search problem).
More specifically, the issue is that Em’s query workflow is missing a search layer. The current flow is: read index → read pages → synthesize answer. The truly ideal flow should be: use semantic search to find the most relevant pages → read only those pages → synthesize answer. The difference is whether the first step is “traversal” or “search.” Traversal cost grows proportionally with wiki size; search cost is (in theory) constant.
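To make the traversal-vs-search distinction concrete, here is a toy sketch. Real QMD uses learned embeddings over a vector index; bag-of-words cosine similarity stands in here so the example stays dependency-free, and the page names and contents are invented:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, pages, k=2):
    """Rank all pages by similarity to the query and return the top k,
    so only k pages need to be read instead of the whole wiki."""
    q = embed(query)
    ranked = sorted(pages, key=lambda p: cosine(q, embed(pages[p])),
                    reverse=True)
    return ranked[:k]
```

Ranking is still linear here, but the expensive step — reading full pages into Context — drops from "all of them" to "the top k", which is the property that matters for token cost.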
Eventually I introduced QMD (a semantic query tool based on vector search [6]), letting Em’s queries first use semantic search to narrow the scope before reading the relevant pages. But the QMD integration story is more complex — it involves a three-phase design — so I’ll save that for the next article on the memory system.
Knowledge Is More Than Storage
Coming back to Em: during the build process, one small moment stuck with me.
After the CLAUDE.md refactor mentioned in the previous article (dropping from 99 lines to 47), I had Em update its own wiki entity page to document the refactor. That session lasted only 4 minutes. Em updated wiki/entities/agent-em.md, added a description of the wiki-schema skill in the Configuration section, then recorded in wiki/log.md the rationale for “extracting Page Format / Workflows / Conventions into a skill.”
An Agent recording its own architectural changes in its own knowledge base — it sounds very meta, but it demonstrates something important: the wiki isn’t just “a database for other Agents to query.” It’s also every Agent’s own growth record. The next time someone asks “when did Em do a refactor and why,” the answer is right there in Em’s own wiki, without needing to dig through conversation history.
This surfaces another observation. In traditional software development, knowledge management is usually an afterthought: write the code first, then document it (if there’s time). But in an Agent Team, knowledge management is a core part of the workflow — because an Agent’s capability is directly determined by what knowledge it can access and what Context it has. If a concept isn’t in Em’s wiki, other Agents won’t know it exists and won’t factor it into their design decisions [11]. So calling Em the brain and the source of the entire Agent Team isn’t really an overstatement.
97% Markdown, 1,080,005 output tokens, 800 pages of content — the value of that output isn’t in “how much was written,” but in “how much was structured.” Scattered conversation logs are raw data. Wiki pages that have been ingested by Em are knowledge. And knowledge’s value lies in the fact that it can be found, cited, cross-linked, and used as a basis for decisions by other Agents when they need it.
The Complete Knowledge Pipeline
If you were to draw the full knowledge pipeline, it would look something like this: raw/ (noisy source material) → ingest → wiki/ (cross-referenced concept, entity, source, and analysis pages) → QMD index → query → an Agent's Context, loaded just in time.
Every step is compression and structuring of Context: raw data contains a lot of noise and repetition; Em filters out the noise and compresses it into concept pages; cross-references between concept pages turn knowledge into a network; the QMD index means queries don’t need to traverse the entire wiki; and other Agents load the knowledge they need only when they need it.
This is the practical implementation of Context Engineering’s JIT loading principle at the knowledge management level [9]: knowledge shouldn’t live in every Agent’s Context. Knowledge should live in a centralized, compilable, searchable knowledge base. Agents query it when they need it, and once they have what they need they can release that portion of Context — making room for the next step of work.
Maybe the most underestimated role in an Agent Team isn’t GM (everyone thinks the leader is most important) — it’s Em (nobody thinks the librarian matters, until they can’t find the book they need).
Maybe that also explains why Em was the first Agent to be built: because before any of the other Agents could start doing their work, there needed to be somewhere to store — somewhere to know — “what we already know.”
References
[1] Anthropic — Effective Context Engineering for AI Agents (official guide: selective loading over context stuffing) https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[2] The New Stack — 6 Agentic Knowledge Base Patterns Emerging in the Wild https://thenewstack.io/agentic-knowledge-base-patterns/
[3] arXiv — Solving Context Window Overflow in AI Agents https://arxiv.org/html/2511.22729v1
[4] AI Pace — Context Engineering: Mitigating Context Rot in AI Systems (“context rot”: the larger the context, the lower model reliability) https://medium.com/ai-pace/context-engineering-mitigating-context-rot-in-ai-systems-21eb2c43dd18
[5] Jentic — Just-In-Time-Tooling: Scalable, Capable and Reliable Agents (JIT architecture: load on demand rather than preload) https://jentic.com/blog/just-in-time-tooling
[6] arXiv — Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (counterpoint: industry leans toward RAG + embedding over structured wiki) https://arxiv.org/abs/2501.09136
[7] InfoWorld — Anatomy of an AI Agent Knowledge Base https://www.infoworld.com/article/4091400/anatomy-of-an-ai-agent-knowledge-base.html
[8] arXiv — MAGMA: A Multi-Graph based Agentic Memory Architecture (counterpoint: graph structure may be more effective than flat wiki) https://arxiv.org/pdf/2601.03236
[9] Context Studios — From Mode Collapse to Context Engineering (“focused 300-token context outperforms unfocused 113K-token context”) https://www.contextstudios.ai/blog/from-mode-collapse-to-context-engineering-how-we-build-reliable-ai-systems-2026
[10] Factory.ai — The Context Window Problem: Scaling Agents Beyond Token Limits https://factory.ai/news/context-window-problem
[11] Meta Engineering — How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines https://engineering.fb.com/2026/04/06/developer-tools/how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines/
Support This Series
If these articles have been helpful, consider buying me a coffee