General Agents563 views6 likes

Claude That Learns: Building a Self-Evolving AI System

What if AI could learn from its own failures? I built a five-layer self-reflection system inspired by aviation black boxes—recording observations, delaying analysis, and extracting patterns over time. It's messy, but it compounds.

Sergei Benkovitch

• Builder

Builder, advisor, coach — I work at the intersection of engineering, management, and people. Build AI-first software, help teams manage complexity at speed, coach individuals who are figuring things out. Technical depth + human understanding — I love both.

January 29, 2026 • 10 min read

Claude That Learns: Building a Self-Evolving AI System

When something goes wrong at 35,000 feet, pilots don't stop to debug. They fly the plane. Every instrument reading, every control input, every radio transmission gets recorded to the black box—but nobody analyzes it mid-flight. The analysis happens later, on the ground, often by different people entirely.

This separation is deliberate. In 1978, United Airlines Flight 173 ran out of fuel while the crew fixated on a landing gear indicator. The NTSB investigation didn't just find the cause—it identified a pattern across multiple incidents. Crews were so focused on immediate problems that they lost situational awareness. The fix wasn't better pilots. It was Crew Resource Management: a systematic change to how all crews would operate in the future.

The key insight: You don't improve the current flight. You record everything, analyze later, and improve all future flights.

I thought about this while watching Claude retry the same failing npm test command. Six times. Same error. No change in strategy.

Claude has no black box. No mechanism for recording patterns and analyzing them later. Every session starts fresh. Every failure gets forgotten. We're perpetually mid-flight, never landing to learn.

What would it take to change that?

The Gap Nobody Talks About

Most Claude Code setups focus on execution. Agents that plan. Skills that teach patterns. Hooks that enforce standards. The ecosystem is impressive—and growing daily.

But where's the learning?

Not machine learning. I mean: where's the part where the system remembers its own failure patterns? Where it knows "I tend to retry failing commands too many times" or "I overengineer simple tasks"?

I went looking for prior art. Surely someone had solved this.

They hadn't. The closest I found were static configuration files—instructions written once, unchanged. Same behavior regardless of history. No memory between conversations.

This felt familiar. I'd seen this pattern before.

When Systems Learned to Remember

In the 1990s, recommendation engines faced a similar problem. Early systems like Amazon's used collaborative filtering—patterns extracted from millions of users. Powerful, but generic. The system knew what people like you wanted. It didn't know you.

Then came personalization. Systems that tracked your specific history. That noticed when you bought camping gear in spring and surfaced tent reviews the following March. That learned your patterns, not humanity's patterns.

The shift wasn't algorithmic. It was architectural. Someone decided the system should maintain a model of each user that evolved over time.

AI assistants haven't made that shift yet. Claude doesn't maintain a model of itself. It doesn't know that this particular instance tends to retry failing commands, or that this configuration produces verbose output.

What if it did?

Building the Self-Portrait

I started with a simple experiment: a markdown file called "Claude System Self-Identity."

Not a configuration file that tells Claude what to do. A self-portrait that captures what Claude has learned about how it operates. Written by Claude. Updated by Claude. Based on actual experience.

Markdown

## Weaknesses
 
### Bash Retry Loop Pattern
**Pattern:** After a Bash command fails, retrying 6+ times
without changing strategy.
**Mitigation:** After 2 consecutive failures → STOP,
report to user, try different approach.
 
### Premature Architectural Conclusions
**Pattern:** Receiving an error, immediately concluding
the entire architecture works differently.
**Learning:** STOP → CHECK DOCS → ASK USER

This isn't science fiction. It's a markdown file in my Obsidian vault. The interesting part is what happens around it.

Why You Don't Debug Mid-Flight

My first instinct was real-time learning. Something fails? Immediately update the self-knowledge. Instant feedback loop.

It didn't work.

Think about what happens when pilots try to fix problems mid-flight instead of flying the plane. They lose situational awareness. They miss the bigger picture. United 173 crashed because the crew was so focused on troubleshooting a landing gear light that they forgot to monitor fuel.

The same thing happens with immediate reflection. In the heat of execution, we're biased by the specific problem. Eager to move on. The pattern isn't visible yet—just the single instance. We see the landing gear light, not the fuel gauge.

Aviation solved this by separating recording from analysis. Everything gets logged to the black box. Analysis happens later—sometimes days later—by investigators who can see patterns across many flights.

The key insight: Temporal distance creates better insights. You don't improve the current flight. You record everything, land safely, then improve all future flights.

Here's the architecture that emerged:

Shell hooks record everything. When Claude edits a configuration file, a hook fires and logs the change. No human action required. No immediate analysis either—just recording.

Session start reviews the logs. When a new session begins, another hook reads what happened in previous sessions:

Markdown

## Config Changes Awaiting Reflection
- agent: accessibility-checker.md
- system-prompt: CLAUDE.md
 
Consider running reflection-generator to document these changes.

Periodic processing extracts patterns. Every few days—not immediately—accumulated observations get processed into permanent notes. Like an NTSB investigation, but for AI behavior.

The delay is the feature. Events happen → get logged → sit for a while → batch with other observations → then conclusions get drawn.

Five Layers, One Principle

What started as a single self-identity document evolved into a five-layer system. Each layer operates at a different timescale:

Layer	When	Question It Asks
Performance Evaluator	After each agent runs	Did it work? What broke?
Reflection Generator	After config changes	Why did we change this?
System Enhancer	Background, during work	What inefficiencies keep appearing?
Self-Awareness	Weekly synthesis	What has this system learned about itself?
Capability Architect	New features	What architecture should we use?

The principle connecting them: Record first, analyze later. Every layer logs observations immediately but delays interpretation.

The Performance Evaluator doesn't immediately rewrite the self-identity document when something fails. It logs an observation. Later, during weekly synthesis, patterns across many observations become visible.

This mirrors how aviation safety actually works. Individual incident reports feed into the Aviation Safety Reporting System. Analysts look for patterns across thousands of reports. Systemic changes get implemented industry-wide. No single pilot tries to redesign cockpit procedures while landing.

The Mindset Shift That Made It Work

Building this required changing how I think about notes.

My old model: "My notes are precious. AI needs permission for every edit. Each note represents carefully curated knowledge."

The realization: If I maintained these notes myself, they'd still have half-finished thoughts, broken links, missing context.

The new model: Trust the structure, not the perfection of individual notes.

The daily note starts messy. Raw observations, unprocessed URLs, thoughts dumped without organization. That's fine. The structure processes it. Extracts what matters. Creates permanent notes with proper links.

The self-identity document isn't static truth. It's evolving understanding. The weakness documented today might have a mitigation strategy added next week.

This is emergence over perfection. Knowledge compounds because the system keeps processing. Not because any single note is flawless.

When the System Surprised Me

January 1st: I analyzed debug logs looking for recurring errors. What I found: 34 occurrences of agent name case sensitivity errors. 12 instances of background tasks killed by resource exhaustion. Shell command escaping failures.

January 4th: The reflection generator processed these findings. But it didn't just document the errors. It proposed an architectural principle:

Code

Frequency × Auto-Correctability determines solution type:
- High frequency + Auto-correctable → PreToolUse hook
- Low frequency OR Needs judgment → CLAUDE.md guidance

I hadn't asked for this framework. The system derived it from patterns in its own failures.

January 5th: Another emergent insight. An Asana coordinator agent I'd created was adding overhead without value. The reflection captured the principle:

"Agents are for specialized AI personas with unique context, not for wrapping existing API tools."

Here's what made me stop and think: I wasn't teaching Claude these principles. Claude was discovering them through structured reflection on its own experience—and the discoveries were useful.

Honestly, I don't know if this counts as "learning" in any rigorous sense. But something is happening that feels different from static configuration.

What This Doesn't Solve

Let me be direct about the limitations.

It's simulated persistence. Claude doesn't actually remember anything. It reads files. Every session loads the self-identity document fresh. If that file got corrupted, all "memory" vanishes. This is file I/O, not cognition.

Manual processing is still required. Hooks log automatically, but turning observations into organized knowledge requires running the organizer agent. I do this every few days. It's not autonomous.

Quality depends on human curation. When Claude writes a reflection, it's often too verbose or misses the point. I edit. I redirect. This is "Claude proposes, human curates"—partnership, not independence.

No real-time behavior modification. Claude can't interrupt itself mid-session and say "wait, I'm doing that retry thing." The knowledge influences planning, not execution. We're still mid-flight when it matters most.

Token costs add up. Loading the self-identity document, running reflection agents, processing daily notes—some weeks I've spent 30% more tokens on meta-work than actual work.

These limitations matter. Calling this "self-aware AI" overstates the case. It's closer to "AI with externalized memory and structured reflection prompts."

But that's still more than most systems have.

The Pattern That Keeps Appearing

Every decade, we rediscover that intelligent systems need more than raw capability. They need mechanisms for recording experience and extracting patterns later.

Aviation solved this with black boxes and the NTSB. Hospitals solved it with morbidity and mortality conferences. Software teams solved it with blameless postmortems. The pattern is consistent: separate recording from analysis, create distance before drawing conclusions, implement systemic changes rather than individual fixes.

AI assistants haven't solved it yet. We're still in the "execute brilliantly, forget everything" phase. Every session is a flight with no black box.

The key insight: The gap isn't in any particular capability. It's architectural. We haven't built the recording and analysis layer.

What I've described here is one attempt. Markdown files, shell hooks, periodic processing, delayed reflection. It's messy. It breaks sometimes. The "self-knowledge" is really just well-organized notes that get loaded into context.

But it compounds. That's the difference.

A static configuration is frozen at the moment of creation. This one gets better. Every failure that gets logged, processed, and reflected upon makes the next session slightly more informed.

Connecting the Dots: From Architecture to Adoption

This self-learning architecture addresses a problem that's been well-documented in AI-first development: the knowledge gap. As explored in The Reality of AI-First Coding That Nobody's Telling You About, tacit knowledge living in engineers' heads rather than accessible documentation is killing team productivity. When your AI agent can record and learn from its own failures, you're building the institutional memory that most teams are missing.

The five-layer architecture also connects to broader patterns in how AI coding tools manage information. A technical deep dive into Claude Code's architecture reveals that intelligent context management matters more than raw capability. Claude Code's power comes from how it manages and compresses context—the same principle that makes delayed reflection valuable. You can't just throw everything into context; you need systems that extract what matters.

And fundamentally, this approach aligns with research showing that open agents outperform rigid workflows. Rather than prescribing every behavior upfront, we're creating conditions for discovery. The agent explores, fails, and learns—exactly the paradigm that consistently beats static pipelines.

What Happens Next?

We're at an interesting moment. Context windows are growing. Tool use is improving. Agentic workflows are becoming standard.

But the recording problem remains unsolved at the infrastructure level. Every Claude session still starts fresh. Every ChatGPT conversation still forgets itself. We keep flying without black boxes.

What happens when that changes? When AI systems maintain genuine self-models that evolve across thousands of interactions? When they know their own failure patterns as well as their capabilities?

I don't know. But I suspect it looks less like "smarter AI" and more like "AI that actually learns from experience."

The technical pieces aren't complicated. The hard part is the mindset shift: accepting that you don't debug mid-flight, trusting the structure to process observations into knowledge, letting the system iterate.

What would your Claude discover about itself if it had the chance to land and reflect?

If you're experimenting with similar systems, I'd like to hear about it. Find me on the Squid Club Discord.

Continue Reading

Back to Blog