In the previous article, we built the three-loop closure architecture that lets the system break through ceilings and keep growing. But architecture is just the skeleton — borrowing Mosky's analogy, even the titans in Attack on Titan grow the skeleton first and then the flesh. So starting with this article, we're putting meat on the bones: what are the three cores of the methodology? What are its vulnerabilities? And when should humans step in?
The Three Cores of the Methodology
After the Agent organized the approach into a methodology, we started discussing what its core really was. After several rounds of back-and-forth, the Agent helped me converge on three interdependent cores:
Core 1: Multi-Strategy Discovery
Because relying solely on “detecting known issues” always has a ceiling, the Discovery Loop needs to grow on its own. That “growth” requires multiple strategies to avoid being constrained, with each strategy following a different growth curve:
- Pattern Propagation (linear): When a problem is found in one Agent, check whether other Agents have the same issue
- Mutation Probing (linear): Create variations of known issues to see if the system misses the variants
- Root Cause Inference (exponential): Derive entirely new problem categories from a single root cause — this is the key to breaking through the ceiling, because one root cause can map to multiple problems you never imagined
- Cross-Domain Transfer (combinatorial): Apply problem patterns from one domain to another
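To make the four strategies concrete, here is a minimal Python sketch of one discovery pass. Everything in it is illustrative: `Finding`, `mutate`, and `infer_categories` are hypothetical names, and the stub generators stand in for whatever LLM or analyzer a real system would plug in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    description: str
    strategy: str  # which discovery strategy produced it

# Stubs — in a real system an LLM or static analyzer would fill these in.
def mutate(issue: str) -> list[str]:
    return [f"{issue} (boundary variant)", f"{issue} (concurrency variant)"]

def infer_categories(root_cause: str) -> list[str]:
    return [f"{root_cause}: stale reads", f"{root_cause}: silent data loss"]

def discover(known_issues: dict[str, str], agents: list[str]) -> list[Finding]:
    """One pass of the discovery strategies over the known-issue set.

    `known_issues` maps issue description -> root cause ("" if unknown).
    """
    found: list[Finding] = []
    for issue, root_cause in known_issues.items():
        # Pattern propagation (linear in #agents): same issue, other agents
        for agent in agents:
            found.append(Finding(f"[{agent}] {issue}", "pattern_propagation"))
        # Mutation probing (linear): variants of the known issue
        for variant in mutate(issue):
            found.append(Finding(variant, "mutation_probing"))
        # Root-cause inference (exponential): one cause fans out into
        # entirely new problem categories
        if root_cause:
            for category in infer_categories(root_cause):
                found.append(Finding(category, "root_cause_inference"))
        # Cross-domain transfer (combinatorial) would additionally pair each
        # issue pattern with every other domain; omitted here for brevity.
    return found
```

Note how the strategies differ in fan-out: propagation and mutation each add a bounded number of findings per issue, while root-cause inference opens categories whose size you can't predict in advance — which is exactly why it's the ceiling-breaker.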
Core 2: Fact-Based Signal Sources
All judgments in the methodology must be grounded in facts (Source of Truth), not assumptions. So we split signals into three layers: static analysis (examining the code), state analysis (examining the runtime database state), and production signals (observing actual runtime behavior). Only multi-layer signal cross-validation can produce conclusions you can have confidence in.
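As a sketch of how multi-layer cross-validation might gate conclusions — the `Layer` enum and the "two independent layers must agree" threshold are my own illustrative choices, not a prescribed design:

```python
from enum import Enum

class Layer(Enum):
    STATIC = "static analysis"        # examining the code
    STATE = "state analysis"          # examining runtime database state
    PRODUCTION = "production signal"  # observing actual runtime behavior

def cross_validate(signals: dict[Layer, bool]) -> str:
    """Map per-layer hits to a verdict; only agreement across
    independent layers yields a conclusion you can act on."""
    confirming = [layer for layer, hit in signals.items() if hit]
    if len(confirming) >= 2:
        return "confirmed"    # multi-layer agreement -> fact, not assumption
    if len(confirming) == 1:
        return "hypothesis"   # single-layer hit -> needs corroboration first
    return "no evidence"
```

For instance, a suspicious pattern flagged only by static analysis stays a hypothesis until state or production signals corroborate it.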
This directly connects to what we discussed in Article 3: Context Engineering [5] — the timezone blind spot from before [4] was a textbook case of an Agent operating on assumptions instead of facts.
Core 3: Self-Evolving Metrics
This is the most important one, because without metrics you can’t distinguish priority from importance. But metrics themselves can become constraints — when you optimize one metric, you might unknowingly regress on other dimensions.
So metrics need a mechanism to be "challenged by facts": when facts contradict a metric, the problem isn't the facts — it's the metric that needs adjusting. At the same time, every metric should be paired with a "shadow metric" that monitors whether optimizing the primary metric is causing harm elsewhere.
For example, if you’re optimizing “detection rate” and “false positive rate” climbs simultaneously, the shadow metric tells you that you might just be manufacturing more noise rather than genuinely improving detection.
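A minimal sketch of the primary/shadow pairing — `MetricPair` and the hard limit on the shadow metric are hypothetical names and design choices, not the article's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class MetricPair:
    name: str           # primary metric being optimized
    shadow_name: str    # paired metric watching for collateral damage
    shadow_limit: float # worst acceptable value for the shadow metric

    def evaluate(self, primary_delta: float, shadow_value: float) -> str:
        """Judge whether an apparent improvement is genuine."""
        if primary_delta > 0 and shadow_value > self.shadow_limit:
            # Primary improved but the shadow crossed its limit:
            # facts contradict the metric -> the metric gets challenged.
            return "challenge metric"
        if primary_delta > 0:
            return "genuine improvement"
        return "no improvement"
```

Using the article's example: pair "detection rate" with "false positive rate", and a detection-rate gain that drags false positives past the limit comes back as "challenge metric" — you're manufacturing noise, not improving detection.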
Looking Back for Vulnerabilities — Methodology Isn’t Perfect
Sounds pretty complete at this point, right? Three interlocking loops, three cores — feels like everything’s been accounted for.
But I went one step further: I asked the Agent to review the framework one more time, using three sub-agents to examine the entire structure — three loops, three cores — for vulnerabilities.
The Agent identified seven structural weaknesses. A few are particularly worth highlighting:
- Self-Referential Loop: All discoveries originate from known issues. If the known issues themselves are biased, the entire system grows in the wrong direction.
- Resolution Verification Gap: “Check passed” doesn’t mean “problem solved.” An issue marked as fixed might have just been worked around.
- Unobservable Unknowns: Some problems leave no trace at any signal layer. No matter how sophisticated the system, it can’t see them.
- Cost Blindness: The system keeps adding checks, but there’s no mechanism to evaluate the ROI of each check.
These vulnerabilities are themselves part of the methodology, because knowing where your weaknesses are is what lets you design the corresponding safeguards. Some vulnerabilities can be automatically remediated by the system (e.g., resolution verification can use automated regression testing). Others require human intervention (e.g., cost decisions and strategic direction) — which naturally leads to designing human intervention points.
And methodology isn’t built in a vacuum. I had the Agent search academic research and it found 33 relevant sources — some supporting the methodology’s concepts (like the MAPE-K self-adaptive loop [1], Google SRE’s error budget model [2], Netflix’s Chaos Engineering), and some directly challenging it (like Model Collapse risk — where self-referential systems gradually degrade — and Goodhart’s Law [3] — where a measure ceases to be a good measure once it becomes a target). These counter-arguments aren’t meant to demolish the methodology; they tell us exactly where to install safeguards. The self-referential loop vulnerability maps to Model Collapse risk. The shadow metric design maps to Goodhart’s Law.
When Should Humans Step In?
But not every problem should be tossed to humans — otherwise it's humans working for the Agent. So I borrowed the familiar 80/20 rule (a loose application, since that's not its original intent) to draw a first rough boundary: I told the Agent I expected the system to self-operate 80% of the time, with humans stepping in for the critical 20%, because in practice the decisions that truly need human judgment are probably only about 20% of the total.
Starting from this concept, I had the Agent revisit the vulnerabilities it had found, and it realized some problems could be self-remediated — specifically those where all the necessary information exists within the system’s reachable signal sources. Examples: resolution verification (automated regression testing), temporal blind spots (adding time-dimension scans), check interaction effects (tracking causal chains), and so on.
It also recognized that the problems truly requiring human intervention are those needing external knowledge or business judgment — severity assessment requires understanding business context, cost trade-offs require strategic-level judgment, and signal source investment requires evaluating whether it’s worth spending resources to build new observation channels.
The final estimate: once the system matures, humans would only need roughly one to two hours per month to handle these intervention points, with the system running autonomously the rest of the time. (I think that's overly optimistic — probably closer to an hour per week, though if models keep improving it might actually land where the Agent predicted.)
But here’s the important caveat: 80/20 shouldn’t be the target from the start — it should be the result of gradual evolution. In the early stages, I envision starting at 50/50 — the system runs half, humans verify half — while building more reliable judgment mechanisms in parallel (such as having multiple different models cross-vote on critical decisions, along the lines of the MAGI concept). After a period of time, once these mechanisms have gone through eight or more cycles with verified accuracy above 90%, you can start gradually reducing human involvement. If you force an 80/20 split from the beginning, you’re letting go of the reins before the system has earned your trust.
I also made a key design decision here: when the system detects an anomaly, it should not immediately escalate to humans — it should first investigate on its own. Only when the investigation reveals it can't handle the situation (the required information isn't within the system's reachable scope) should it escalate. And this investigation isn't just about investigating and fixing — it must output the investigation data and generate Finding hypotheses, so that when humans do step in they have context for their judgment, or so that a dedicated Agent can make the call later (this interface is a deliberate opening for future multi-Agent autonomy).

But I also designed an exception into this system. The sole exception is "the system's decisions keep getting overridden by humans" — because when that happens, it means the system's current judgment capability (judgment criteria and mechanisms) is fundamentally flawed. It must stop immediately, because a system that can't judge correctly also can't correctly judge whether it needs to stop. So I set a forced halt condition for human intervention here, rather than leaving the judgment to the system.
```mermaid
flowchart TD
    A["Anomaly detected"] --> B["Autonomous investigation
    (limited to 1 cycle)"]
    B --> C{"Signal sources have
    enough information?"}
    C -->|Yes| D["Auto-remediate"]
    C -->|No| E["Escalate to human
    (with investigation results)"]
    F["System repeatedly
    overridden by humans"] --> G["Immediate halt
    (judgment capability compromised)"]
```
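In code, this decision flow — investigate first, escalate with context attached, forced halt on repeated overrides — might look like the following minimal sketch. The `AnomalyHandler` class, the one-cycle investigation stub, and the three-override threshold are all illustrative assumptions, not the actual system:

```python
class JudgmentHalt(Exception):
    """Forced stop: the system's judgment criteria are fundamentally flawed."""

class AnomalyHandler:
    OVERRIDE_LIMIT = 3  # hypothetical threshold for "repeatedly overridden"

    def __init__(self) -> None:
        self.consecutive_overrides = 0

    def handle(self, anomaly: str, in_reach: bool) -> dict:
        """Investigate first (one cycle); never escalate a raw anomaly."""
        investigation = {
            "anomaly": anomaly,
            "hypotheses": [f"finding hypothesis for: {anomaly}"],
        }
        if in_reach:
            # Required information is within reachable signal sources.
            return {"action": "auto_remediate", **investigation}
        # Escalate WITH the investigation data, so humans (or a future
        # dedicated Agent) have context for their judgment.
        return {"action": "escalate", **investigation}

    def record_human_review(self, overridden: bool) -> None:
        """Track consecutive overrides; halt unconditionally at the limit.

        The halt is not self-assessed: a system that can't judge correctly
        also can't correctly judge whether it needs to stop."""
        self.consecutive_overrides = self.consecutive_overrides + 1 if overridden else 0
        if self.consecutive_overrides >= self.OVERRIDE_LIMIT:
            raise JudgmentHalt("decisions repeatedly overridden by humans")
```

The deliberate asymmetry: remediation and escalation are decisions the system makes, but the halt is an exception it cannot reason its way out of.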
By this point, you’ve probably noticed: methodology isn’t a one-time deliverable — it’s a framework that’s continuously challenged and refined. With the original architecture plus three cores, vulnerability safeguards, and human intervention points, we finally have a set of principles that can guide Agent decision-making. But what specifically to do, which tools to use, and what to prioritize — those belong to the next layer: the Playbook.
In the next article, we’ll look at how to “operationalize” the methodology into a Playbook, because no matter how good the methodology is, if it can’t be turned into something an Agent can follow and execute, it’s just a nice-looking document.
References:
[1] MAPE-K — self-adaptive feedback loops in software engineering https://en.wikipedia.org/wiki/MAPE-K
[2] Google SRE: How Google Runs Production Systems — error budget model https://sre.google/sre-book/table-of-contents/
[3] Goodhart’s Law — when a measure becomes a target, it ceases to be a good measure https://en.wikipedia.org/wiki/Goodhart%27s_law
[4] Tips 4: Agent’s Timezone Blind Spot — implicit assumption case /en/articles/16-tips-agent-timezone-blind-spot
[5] Article 3: Context Engineering — fact-based signal sources /en/articles/3-context-engineering
Support This Series
If these articles have been helpful, consider buying me a coffee