In the previous article, we built the three-loop closure architecture that lets the system break through ceilings and keep growing. But architecture is just the skeleton — borrowing Mosky's analogy, even the titans in Attack on Titan grow the skeleton first and then the flesh. So starting with this article, we're putting meat on the bones: what are the three cores of the methodology? What are its vulnerabilities? And when should humans step in?
The Three Cores of the Methodology
After the Agent organized the approach into a methodology, we started discussing what its core really was. After several rounds of back-and-forth, the Agent helped me converge on three interdependent cores:
Core 1: Multi-Strategy Discovery
Because relying solely on “detecting known issues” always has a ceiling, the Discovery Loop needs to grow on its own. That “growth” requires multiple strategies to avoid being constrained, with each strategy following a different growth curve:
- Pattern Propagation (linear): When a problem is found in one Agent, check whether other Agents have the same issue
- Mutation Probing (linear): Create variations of known issues to see if the system misses the variants
- Root Cause Inference (exponential): Derive entirely new problem categories from a single root cause — this is the key to breaking through the ceiling, because one root cause can map to multiple problems you never imagined
- Cross-Domain Transfer (combinatorial): Apply problem patterns from one domain to another
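To make the four strategies concrete, here is a minimal Python sketch of one discovery pass. Everything in it is illustrative: `Finding`, `mutate`, and `infer_categories` are hypothetical names, and the stub generators stand in for whatever LLM or analyzer a real system would plug in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    description: str
    strategy: str  # which discovery strategy produced it

# Stubs — in a real system an LLM or static analyzer would fill these in.
def mutate(issue: str) -> list[str]:
    return [f"{issue} (boundary variant)", f"{issue} (concurrency variant)"]

def infer_categories(root_cause: str) -> list[str]:
    return [f"{root_cause}: stale reads", f"{root_cause}: silent data loss"]

def discover(known_issues: dict[str, str], agents: list[str]) -> list[Finding]:
    """One pass of the discovery strategies over the known-issue set.

    `known_issues` maps issue description -> root cause ("" if unknown).
    """
    found: list[Finding] = []
    for issue, root_cause in known_issues.items():
        # Pattern propagation (linear in #agents): same issue, other agents
        for agent in agents:
            found.append(Finding(f"[{agent}] {issue}", "pattern_propagation"))
        # Mutation probing (linear): variants of the known issue
        for variant in mutate(issue):
            found.append(Finding(variant, "mutation_probing"))
        # Root-cause inference (exponential): one cause fans out into
        # entirely new problem categories
        if root_cause:
            for category in infer_categories(root_cause):
                found.append(Finding(category, "root_cause_inference"))
        # Cross-domain transfer (combinatorial) would additionally pair each
        # issue pattern with every other domain; omitted here for brevity.
    return found
```

Note how the strategies differ in fan-out: propagation and mutation each add a bounded number of findings per issue, while root-cause inference opens categories whose size you can't predict in advance — which is exactly why it's the ceiling-breaker.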
Core 2: Fact-Based Signal Sources
All judgments in the methodology must be grounded in facts (Source of Truth), not assumptions. So we split signals into three layers: static analysis (examining the code), state analysis (examining the runtime database state), and production signals (observing actual runtime behavior). Only multi-layer signal cross-validation can produce conclusions you can have confidence in.
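As a sketch of how multi-layer cross-validation might gate conclusions — the `Layer` enum and the "two independent layers must agree" threshold are my own illustrative choices, not a prescribed design:

```python
from enum import Enum

class Layer(Enum):
    STATIC = "static analysis"        # examining the code
    STATE = "state analysis"          # examining runtime database state
    PRODUCTION = "production signal"  # observing actual runtime behavior

def cross_validate(signals: dict[Layer, bool]) -> str:
    """Map per-layer hits to a verdict; only agreement across
    independent layers yields a conclusion you can act on."""
    confirming = [layer for layer, hit in signals.items() if hit]
    if len(confirming) >= 2:
        return "confirmed"    # multi-layer agreement -> fact, not assumption
    if len(confirming) == 1:
        return "hypothesis"   # single-layer hit -> needs corroboration first
    return "no evidence"
```

For instance, a suspicious pattern flagged only by static analysis stays a hypothesis until state or production signals corroborate it.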
This directly connects to what we discussed in Article 3: Context Engineering [5] — the timezone blind spot from before [4] was a textbook case of an Agent operating on assumptions instead of facts.
Core 3: Self-Evolving Metrics
This is the most important one, because without metrics you can’t distinguish priority from importance. But metrics themselves can become constraints — when you optimize one metric, you might unknowingly regress on other dimensions.
So metrics need a mechanism to be "challenged by facts": when facts contradict a metric, the problem isn't the facts — it's the metric that needs adjusting. At the same time, every metric should be paired with a "shadow metric" that monitors whether optimizing the primary metric is causing harm elsewhere.
For example, if you’re optimizing “detection rate” and “false positive rate” climbs simultaneously, the shadow metric tells you that you might just be manufacturing more noise rather than genuinely improving detection.
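A minimal sketch of the primary/shadow pairing — `MetricPair` and the hard limit on the shadow metric are hypothetical names and design choices, not the article's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class MetricPair:
    name: str           # primary metric being optimized
    shadow_name: str    # paired metric watching for collateral damage
    shadow_limit: float # worst acceptable value for the shadow metric

    def evaluate(self, primary_delta: float, shadow_value: float) -> str:
        """Judge whether an apparent improvement is genuine."""
        if primary_delta > 0 and shadow_value > self.shadow_limit:
            # Primary improved but the shadow crossed its limit:
            # facts contradict the metric -> the metric gets challenged.
            return "challenge metric"
        if primary_delta > 0:
            return "genuine improvement"
        return "no improvement"
```

Using the article's example: pair "detection rate" with "false positive rate", and a detection-rate gain that drags false positives past the limit comes back as "challenge metric" — you're manufacturing noise, not improving detection.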
Looking Back for Vulnerabilities — Methodology Isn’t Perfect
Sounds pretty complete at this point, right? Three interlocking loops, three cores — feels like everything’s been accounted for.
But I went one step further: I asked the Agent to review the framework one more time, using three sub-agents to examine the entire structure — three loops, three cores — for vulnerabilities.
The Agent identified seven structural weaknesses. A few are particularly worth highlighting:
- Self-Referential Loop: All discoveries originate from known issues. If the known issues themselves are biased, the entire system grows in the wrong direction.
- Resolution Verification Gap: “Check passed” doesn’t mean “problem solved.” An issue marked as fixed might have just been worked around.
- Unobservable Unknowns: Some problems leave no trace at any signal layer. No matter how sophisticated the system, it can’t see them.
- Cost Blindness: The system keeps adding checks, but there’s no mechanism to evaluate the ROI of each check.
These vulnerabilities are themselves part of the methodology, because knowing where your weaknesses are is what lets you design the corresponding safeguards. Some vulnerabilities can be automatically remediated by the system (e.g., resolution verification can use automated regression testing). Others require human intervention (e.g., cost decisions and strategic direction) — which naturally leads to designing human intervention points.
And methodology isn’t built in a vacuum. I had the Agent search academic research and it found 33 relevant sources — some supporting the methodology’s concepts (like the MAPE-K self-adaptive loop [1], Google SRE’s error budget model [2], Netflix’s Chaos Engineering), and some directly challenging it (like Model Collapse risk — where self-referential systems gradually degrade — and Goodhart’s Law [3] — where a measure ceases to be a good measure once it becomes a target). These counter-arguments aren’t meant to demolish the methodology; they tell us exactly where to install safeguards. The self-referential loop vulnerability maps to Model Collapse risk. The shadow metric design maps to Goodhart’s Law.
When Should Humans Step In?
But not every problem should be tossed to humans — otherwise it's humans working for the Agent. So I borrowed the familiar 80/20 rule (a loose application, since that's not its original intent) to draw a first rough boundary: I told the Agent I expected the system to self-operate 80% of the time, with humans stepping in for the critical 20%, because in practice the decisions that truly need human judgment are probably only about 20% of the total.
Starting from this concept, I had the Agent revisit the vulnerabilities it had found, and it realized some problems could be self-remediated — specifically those where all the necessary information exists within the system’s reachable signal sources. Examples: resolution verification (automated regression testing), temporal blind spots (adding time-dimension scans), check interaction effects (tracking causal chains), and so on.
It also recognized that the problems truly requiring human intervention are those needing external knowledge or business judgment — severity assessment requires understanding business context, cost trade-offs require strategic-level judgment, and signal source investment requires evaluating whether it’s worth spending resources to build new observation channels.
The final estimate: once the system matures, humans would only need roughly one to two hours per month to handle these intervention points, with the system running autonomously the rest of the time. (I think that's overly optimistic — probably closer to an hour per week, though if models keep improving it might actually land where the Agent predicted.)
But here’s the important caveat: 80/20 shouldn’t be the target from the start — it should be the result of gradual evolution. In the early stages, I envision starting at 50/50 — the system runs half, humans verify half — while building more reliable judgment mechanisms in parallel (such as having multiple different models cross-vote on critical decisions, along the lines of the MAGI concept). After a period of time, once these mechanisms have gone through eight or more cycles with verified accuracy above 90%, you can start gradually reducing human involvement. If you force an 80/20 split from the beginning, you’re letting go of the reins before the system has earned your trust.
I also made a key design decision here: when the system detects an anomaly, it should not immediately escalate to humans — it should first investigate on its own. Only when the investigation reveals it can't handle the situation (the required information isn't within the system's reachable scope) should it escalate. And this investigation isn't just about investigating and fixing — it must output the investigation data and generate Finding hypotheses, so that when humans do step in they have context for their judgment, or so that a dedicated Agent can make the call later (this interface is a deliberate opening for future multi-Agent autonomy).

But I also designed an exception into this system. The sole exception is "the system's decisions keep getting overridden by humans" — because when that happens, it means the system's current judgment capability (judgment criteria and mechanisms) is fundamentally flawed. It must stop immediately, because a system that can't judge correctly also can't correctly judge whether it needs to stop. So I set a forced halt condition for human intervention here, rather than leaving the judgment to the system.
```mermaid
flowchart TD
    A["Anomaly detected"] --> B["Autonomous investigation
    (limited to 1 cycle)"]
    B --> C{"Signal sources have
    enough information?"}
    C -->|Yes| D["Auto-remediate"]
    C -->|No| E["Escalate to human
    (with investigation results)"]
    F["System repeatedly
    overridden by humans"] --> G["Immediate halt
    (judgment capability compromised)"]
```
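In code, this decision flow — investigate first, escalate with context attached, forced halt on repeated overrides — might look like the following minimal sketch. The `AnomalyHandler` class, the one-cycle investigation stub, and the three-override threshold are all illustrative assumptions, not the actual system:

```python
class JudgmentHalt(Exception):
    """Forced stop: the system's judgment criteria are fundamentally flawed."""

class AnomalyHandler:
    OVERRIDE_LIMIT = 3  # hypothetical threshold for "repeatedly overridden"

    def __init__(self) -> None:
        self.consecutive_overrides = 0

    def handle(self, anomaly: str, in_reach: bool) -> dict:
        """Investigate first (one cycle); never escalate a raw anomaly."""
        investigation = {
            "anomaly": anomaly,
            "hypotheses": [f"finding hypothesis for: {anomaly}"],
        }
        if in_reach:
            # Required information is within reachable signal sources.
            return {"action": "auto_remediate", **investigation}
        # Escalate WITH the investigation data, so humans (or a future
        # dedicated Agent) have context for their judgment.
        return {"action": "escalate", **investigation}

    def record_human_review(self, overridden: bool) -> None:
        """Track consecutive overrides; halt unconditionally at the limit.

        The halt is not self-assessed: a system that can't judge correctly
        also can't correctly judge whether it needs to stop."""
        self.consecutive_overrides = self.consecutive_overrides + 1 if overridden else 0
        if self.consecutive_overrides >= self.OVERRIDE_LIMIT:
            raise JudgmentHalt("decisions repeatedly overridden by humans")
```

The deliberate asymmetry: remediation and escalation are decisions the system makes, but the halt is an exception it cannot reason its way out of.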
By this point, you’ve probably noticed: methodology isn’t a one-time deliverable — it’s a framework that’s continuously challenged and refined. With the original architecture plus three cores, vulnerability safeguards, and human intervention points, we finally have a set of principles that can guide Agent decision-making. But what specifically to do, which tools to use, and what to prioritize — those belong to the next layer: the Playbook.
In the next article, we’ll look at how to “operationalize” the methodology into a Playbook, because no matter how good the methodology is, if it can’t be turned into something an Agent can follow and execute, it’s just a nice-looking document.
References:
[1] MAPE-K — self-adaptive feedback loops in software engineering https://en.wikipedia.org/wiki/MAPE-K
[2] Google SRE: How Google Runs Production Systems — error budget model https://sre.google/sre-book/table-of-contents/
[3] Goodhart’s Law — when a measure becomes a target, it ceases to be a good measure https://en.wikipedia.org/wiki/Goodhart%27s_law
[4] Tips 4: Agent’s Timezone Blind Spot — implicit assumption case /en/articles/16-tips-agent-timezone-blind-spot
[5] Article 3: Context Engineering — fact-based signal sources /en/articles/3-context-engineering
Support This Series
If these articles have been helpful, consider buying me a coffee