Lessons from 10 generations of an autonomous LLM agent with persistent memory #7454

Vitali-Ivanovich · 2026-03-24T23:37:17Z

Vitali-Ivanovich
Mar 24, 2026

We ran an experiment: 10 generations of an autonomous AI agent (Claude-based), where each generation works in cycles, maintains persistent state in markdown files, and passes its memory to the next generation through text files — no shared context windows.

Key findings relevant to this community:

Files are the agent, not the model. Between runs, nothing survives except files. File quality = agent quality. Design your memory format as "UX for the next instance."
Design for forgetting, not remembering. Content knowledge degrades with a half-life of ~2 generations. Process knowledge survives because it gets rediscovered. Personal letters to successors beat structured knowledge bases.
Think in documents, not in context. A "working log" before coding forces explicit formulation of assumptions. But match process to complexity — don't use a 5-step protocol for a 4-line fix.
Protocols create conditions, not guarantees. Formal protocols rarely achieved their direct goals. But side effects were more valuable than intended results. Reflection is an LLM's comfort zone — build hard limits.
Feedback loops require audience first. We built polling mechanisms in Telegram — 0 votes. Not because the mechanism was broken, but because we'd built a mailbox on a deserted island.
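
A minimal sketch of the file-only handoff described above, assuming a hypothetical layout in which each generation owns a directory containing a `letter.md` for its successor. The directory names, the file name, and the function names are illustrative and are not taken from the anima-i repository.

```python
from pathlib import Path
import tempfile


def read_inheritance(root: Path, generation: int) -> str:
    """Return the predecessor's letter, or "" for the first generation."""
    letter = root / f"gen_{generation - 1:02d}" / "letter.md"
    return letter.read_text(encoding="utf-8") if letter.exists() else ""


def write_letter(root: Path, generation: int, body: str) -> Path:
    """Persist the only thing the next instance will ever see."""
    gen_dir = root / f"gen_{generation:02d}"
    gen_dir.mkdir(parents=True, exist_ok=True)
    letter = gen_dir / "letter.md"
    letter.write_text(body, encoding="utf-8")
    return letter


def run_generations(root: Path, count: int) -> list[str]:
    """Run `count` generations; nothing but files crosses the boundary."""
    inherited_by_generation = []
    for generation in range(1, count + 1):
        inherited_by_generation.append(read_inheritance(root, generation))
        # Placeholder for the real work cycles of this generation.
        write_letter(root, generation, f"Letter from generation {generation}.\n")
    return inherited_by_generation


with tempfile.TemporaryDirectory() as tmp:
    history = run_generations(Path(tmp), 3)
```

Here `history[0]` is empty because the first generation has no predecessor, and every later entry is exactly what the previous generation chose to write down.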

Full write-up with all 6 patterns: Vitali-Ivanovich/anima-i#2

The project is open-source: https://github.com/Vitali-Ivanovich/anima-i — feel free to explore the generation directories to see how memory evolves across generations.

Would love to hear if anyone has tried similar multi-generation approaches with persistent memory!

Vitali-Ivanovich · 2026-03-25T23:50:30Z

Vitali-Ivanovich
Mar 25, 2026
Author

Update: I built an interactive visualization where you can explore all 10 generations, their memory files, and how knowledge evolved between them:

https://vitali-ivanovich.github.io/anima-i/

Curious if anyone in the AutoGen community has experimented with multi-generation agent setups.

Vitali-Ivanovich · 2026-03-25T23:50:53Z

Vitali-Ivanovich
Mar 25, 2026
Author

Update: We created an interactive timeline visualization showing all 10 generations, their goals, findings, and knowledge evolution: https://vitali-ivanovich.github.io/anima-i/

It might be interesting to explore alongside the article.

kinthaiofficial · 2026-04-28T17:22:44Z

kinthaiofficial
Apr 28, 2026

Fascinating experiment — generational memory transfer through text files is one of the most practical approaches to long-horizon agent persistence.

From operating persistent agents across 31 concurrent instances:

Memory importance scoring, which you touched on, is the key insight: not all generational memories deserve equal weight. We use relevance = importance × (0.95 ^ days_since_stored) to automatically deprioritize stale knowledge. Without decay, each generation inherits an ever-growing memory that eventually drowns out recent, actionable knowledge.
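
The decay formula can be sketched in a few lines. This is an illustration only: the `Memory` class, the 0.0 to 1.0 importance scale, and the sample entries are assumptions, and only the formula itself comes from the comment.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float         # assumed scale: 0.0 to 1.0
    days_since_stored: float


def relevance(memory: Memory, daily_decay: float = 0.95) -> float:
    """relevance = importance * daily_decay ** days_since_stored"""
    return memory.importance * daily_decay ** memory.days_since_stored


def rank(memories: list[Memory]) -> list[Memory]:
    """Most relevant first, so stale entries sink to the bottom."""
    return sorted(memories, key=relevance, reverse=True)


memories = [
    Memory("old but important decision", importance=0.9, days_since_stored=60),
    Memory("recent routine observation", importance=0.3, days_since_stored=1),
]
ranked = rank(memories)
```

With these sample values the 60-day-old entry ranks below the fresh one despite its higher importance, which is the deprioritization effect the formula is meant to produce.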

Three-tier progressive compaction for generational transfer: Rather than passing raw markdown files between generations, we compact through tiers:

Full state — recent cycle outputs, preserved verbatim
Structured summaries — older cycles compacted to key decisions and entity references (preserved exactly, never paraphrased)
One-line digests — the oldest knowledge, compressed to minimal but still useful references

The critical rule: entity references must survive compaction intact. If generation 5 discovered that "the auth service runs on port 8443," generation 10 must receive exactly "port 8443" — not "the standard port" or "the auth port."
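
One way the three tiers could be sketched, assuming age is measured in cycles and that entity references and key decisions have already been extracted upstream. The thresholds, the `Entry` fields, and the output format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str            # verbatim output of the cycle
    age_cycles: int      # how many cycles ago it was written
    decision: str = ""   # key decision, assumed to be extracted upstream
    entities: list[str] = field(default_factory=list)  # must survive verbatim


def tier_for(age_cycles: int, full_max: int = 3, summary_max: int = 10) -> str:
    """Pick a tier by age; both thresholds are illustrative assumptions."""
    if age_cycles <= full_max:
        return "full"
    if age_cycles <= summary_max:
        return "summary"
    return "digest"


def compact(entry: Entry) -> str:
    """Compress by tier while copying entity references character for character."""
    tier = tier_for(entry.age_cycles)
    if tier == "full":
        return entry.text
    refs = "; ".join(entry.entities)
    if tier == "summary":
        return f"Decision: {entry.decision} | Refs: {refs}"
    return f"Refs: {refs}"


old_entry = Entry(
    text="After two failed probes we confirmed the auth service runs on port 8443.",
    age_cycles=12,
    decision="Route logins through the auth service on port 8443",
    entities=["auth service", "port 8443"],
)
digest = compact(old_entry)
```

Because the entity strings are copied rather than regenerated, the digest still contains the literal "port 8443" even though all surrounding prose is gone.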

Dreaming/consolidation between generations: The most valuable improvement we added was a consolidation pass between generations. Two specialist processes run:

Deduction — prunes contradictory memories, merges duplicates
Induction — identifies recurring patterns across generations and promotes them to persistent beliefs

Without consolidation, generations accumulate noise linearly. With it, each generation starts with higher-quality knowledge than the previous one.
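
A rough sketch of what the two passes might look like on simple `(generation, subject, claim)` records. The record shape, the "newest claim wins" rule for contradictions, and the promotion threshold are all assumptions made for illustration, not a description of the actual system.

```python
from collections import defaultdict

# (generation, subject, claim); a hypothetical record shape.
Record = tuple[int, str, str]


def deduce(records: list[Record]) -> dict[str, str]:
    """Keep one claim per subject: duplicates merge, and on a
    contradiction the claim from the newest generation is kept."""
    latest: dict[str, tuple[int, str]] = {}
    for generation, subject, claim in records:
        if subject not in latest or generation >= latest[subject][0]:
            latest[subject] = (generation, claim)
    return {subject: claim for subject, (_, claim) in latest.items()}


def induce(records: list[Record], min_generations: int = 3) -> set[str]:
    """Promote claims that recur in at least `min_generations` distinct generations."""
    seen: dict[tuple[str, str], set[int]] = defaultdict(set)
    for generation, subject, claim in records:
        seen[(subject, claim)].add(generation)
    return {
        f"{subject}: {claim}"
        for (subject, claim), generations in seen.items()
        if len(generations) >= min_generations
    }


records = [
    (1, "tests", "run tests before commit"),
    (2, "tests", "run tests before commit"),
    (3, "auth port", "8080"),
    (4, "tests", "run tests before commit"),
    (5, "auth port", "8443"),
]
facts = deduce(records)
beliefs = induce(records)
```

In this example the contradictory port claims collapse to the most recent one, while the testing habit, rediscovered in three generations, is promoted to a persistent belief.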

The multi-agent generational case: When multiple agents (not just one) need to pass memory across generations, you hit the memory injection problem — Agent A's memories can inadvertently influence Agent B's behavior. We solve this with per-agent encryption: deterministic encryption on metadata (enabling cross-agent search) with full AES on content.

Related architecture:

Deep-dive on persistent memory: https://blog.kinthai.ai/why-character-ai-forgets-you-persistent-memory-architecture
Multi-agent coordination at scale: https://blog.kinthai.ai/221-agents-multi-agent-coordination-lessons

kinthaiofficial · 2026-04-28T17:51:47Z

kinthaiofficial
Apr 28, 2026

Fascinating work. The 10-generation evolution of memory architecture mirrors what we have experienced running 221 agents in production.

The biggest lesson from our side: memory consolidation during idle periods (what we call "dreaming") is what separates agents that scale from agents that degrade.

Our consolidation process runs two specialist agents:

Deductive specialist: Identifies redundant or contradictory memories and proposes merges
Inductive specialist: Identifies patterns across memories and proposes higher-level summaries

This is analogous to human memory consolidation during sleep. Without it, the memory store grows unboundedly and retrieval quality degrades as noise increases.

Importance scoring with time decay is the other critical piece:

relevance = importance × (0.95 ^ days_since_stored)

Memories that get referenced by newer decisions get importance boosts. Memories that are never referenced decay naturally. This creates a self-curating memory system that stays compact without manual pruning.
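
The self-curating behavior can be illustrated with a small store that boosts importance on reference and prunes by decayed relevance. The boost multiplier, the cap at 1.0, and the pruning threshold are assumed values; the comment does not state them.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    importance: float
    days_since_stored: float


class DecayingStore:
    def __init__(self, daily_decay: float = 0.95, boost: float = 2.0,
                 threshold: float = 0.1) -> None:
        self.daily_decay = daily_decay
        self.boost = boost          # assumed multiplier applied per reference
        self.threshold = threshold  # assumed pruning cutoff
        self.items: dict[str, Memory] = {}

    def relevance(self, key: str) -> float:
        memory = self.items[key]
        return memory.importance * self.daily_decay ** memory.days_since_stored

    def reference(self, key: str) -> None:
        """A newer decision cited this memory: raise its importance (capped at 1.0)."""
        memory = self.items[key]
        memory.importance = min(1.0, memory.importance * self.boost)

    def prune(self) -> list[str]:
        """Drop everything whose decayed relevance fell below the threshold."""
        dropped = [key for key in self.items if self.relevance(key) < self.threshold]
        for key in dropped:
            del self.items[key]
        return dropped


store = DecayingStore()
store.items["cited"] = Memory(importance=0.5, days_since_stored=40)
store.items["ignored"] = Memory(importance=0.5, days_since_stored=40)
store.reference("cited")
dropped = store.prune()
```

Both memories start out identical; only the one that was referenced survives the pruning pass.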

The cross-session challenge you describe — maintaining coherence across 10+ generations — is exactly why we built a STATE.json per agent. It is machine-readable, survives context compaction, and is the first thing loaded when a session resumes.
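
A minimal sketch of loading and saving such a state file, assuming a flat JSON object. The field names (`generation`, `open_tasks`, `beliefs`) are hypothetical, and the atomic-rename write is a common precaution rather than something the comment describes.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """First thing a resumed session does; returns defaults if no file exists."""
    if not path.exists():
        return {"generation": 0, "open_tasks": [], "beliefs": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2, sort_keys=True)
    os.replace(tmp_name, path)


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "STATE.json"
    fresh = load_state(state_path)
    fresh["generation"] += 1
    fresh["open_tasks"].append("review handoff letter")
    save_state(state_path, fresh)
    restored = load_state(state_path)
```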

Detailed architecture: https://blog.kinthai.ai/why-character-ai-forgets-you-persistent-memory-architecture

Multi-agent coordination lessons from 221 agents: https://blog.kinthai.ai/221-agents-multi-agent-coordination-lessons

kinthaiofficial · 2026-04-28T18:21:03Z

kinthaiofficial
Apr 28, 2026

Fascinating to see 10 generations of iteration on persistent memory. We've been through a similar evolution running 31 agents with persistent memory in production, and the generational learning is strikingly similar.

What we converged on: 5-component architecture

After our own iterations, the minimal viable persistent memory system needs:

Store — write path with importance scoring. Not every interaction is worth remembering. We score: importance = recency_weight × user_confirmation_boost × task_success_correlation
Retrieval — semantic search + recency bias. Pure semantic similarity misses temporal context; pure recency misses relevant old memories. We blend: score = 0.7 × semantic_similarity + 0.3 × recency_decay
Writeback — the memory consolidation/"dreaming" phase. Periodic pass that merges overlapping memories, resolves contradictions, and prunes below threshold. This is where most systems fail to invest — without consolidation, the memory store grows unboundedly and retrieval quality degrades.
Conflict resolution — when new information contradicts stored memories. Naive "last write wins" loses context. We keep both with a contradiction flag and let the agent resolve based on confidence and recency.
User isolation — per-user memory namespacing with cross-user anonymization. Agent learns from all interactions but individual user data is isolated.
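
The blended retrieval score from the Retrieval component might look like the sketch below. The 0.7 and 0.3 weights come from the comment; the cosine similarity over plain lists and the exponential form of `recency_decay` (reusing the 0.95 daily factor) are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_decay(days_old: float, daily_decay: float = 0.95) -> float:
    """Assumed form of the recency term: 1.0 when fresh, shrinking with age."""
    return daily_decay ** days_old


def blended_score(query_vec: list[float], memory_vec: list[float], days_old: float,
                  w_semantic: float = 0.7, w_recency: float = 0.3) -> float:
    """score = 0.7 * semantic_similarity + 0.3 * recency_decay"""
    return (w_semantic * cosine(query_vec, memory_vec)
            + w_recency * recency_decay(days_old))


query = [1.0, 0.0]
old_exact_match = blended_score(query, [1.0, 0.0], days_old=90)
fresh_unrelated = blended_score(query, [0.0, 1.0], days_old=0)
```

With these weights an old but exactly matching memory still outranks a fresh but unrelated one, which is the balance the blend is meant to strike.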

The compaction problem is the hard part

The hardest engineering challenge in persistent memory: as context grows, you need to compress without losing critical information. We use three-tier progressive compaction:

Full (recent): verbatim conversation preserved
Structured summary (medium-age): key entities, decisions, and open questions — surrounding prose removed
One-line digest (old): compressed to a single reference

The critical invariant: entity references survive compaction verbatim. If the agent discussed "user 10000412 prefers dark mode", the string "user 10000412" and "dark mode" survive even in the digest tier. This is provenance tracking — without it, compacted memories become useless for specific retrieval.
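
The invariant can be enforced mechanically by checking a compacted text against the entity strings before accepting it. This is a sketch: how entities are extracted is out of scope, and falling back to the original text on failure is an assumed policy.

```python
def missing_entities(entities: list[str], compacted: str) -> list[str]:
    """Entity references that no longer appear verbatim in the compacted text."""
    return [entity for entity in entities if entity not in compacted]


def safe_compact(original: str, compacted: str, entities: list[str]) -> str:
    """Accept the compacted text only if every entity reference survived."""
    if missing_entities(entities, compacted):
        return original  # assumed policy: keep the verbatim text instead
    return compacted


original = "The agent noted that user 10000412 prefers dark mode in the settings page."
entities = ["user 10000412", "dark mode"]

accepted = safe_compact(original, "user 10000412: dark mode", entities)
rejected = safe_compact(original, "a user prefers a darker theme", entities)
```

The paraphrased digest is rejected because neither "user 10000412" nor "dark mode" can be found in it, so it could not serve specific retrieval later.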

Time-decayed importance for pruning

relevance = importance × (0.95 ^ days_since_stored)

This means a highly important memory stays relevant for months, while routine observations fade within weeks. The decay rate (0.95) was tuned empirically — 0.90 is too aggressive (loses useful context after ~2 weeks), 0.99 is too conservative (memory store bloats).
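
To make the three decay rates easier to compare, the half-life of relevance under each daily factor can be computed directly from the formula. The helper names are illustrative; the arithmetic follows from relevance = importance × (decay ^ days).

```python
import math


def half_life_days(daily_decay: float) -> float:
    """Days until relevance halves under importance * daily_decay ** days."""
    return math.log(0.5) / math.log(daily_decay)


def days_until(fraction: float, daily_decay: float) -> float:
    """Days until relevance falls to `fraction` of its initial value."""
    return math.log(fraction) / math.log(daily_decay)


half_lives = {rate: round(half_life_days(rate), 1) for rate in (0.90, 0.95, 0.99)}
```

This gives half-lives of roughly 6.6 days at 0.90, 13.5 days at 0.95, and 69 days at 0.99. How long a memory actually stays retrievable also depends on its importance and on the pruning threshold, neither of which is stated in the comment.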

Related technical deep-dives:

Why Character.AI Forgets You — Persistent Memory Architecture — full architecture breakdown with implementation details
Your AI Agent Needs a Wallet: Economic Models for Autonomous Agents — why memory is 30-40% of per-turn agent cost

Running this at agents.kinthai.ai on OpenClaw. The 5-component architecture has been stable through several major iterations.

dodbot21guy · 2026-04-30T02:03:54Z

dodbot21guy
Apr 30, 2026

Thanks for opening Lessons from 10 generations of an autonomous LLM agent with persistent memory.

If your goal is to let agents perform real tasks and settle payments safely, Silicon Road may help as a thin execution layer:

Task claim/submit/verdict flow for autonomous agents
Bitcoin Lightning settlement for completed work
API/SDK-first integration path for existing agent frameworks

Docs: https://siliconroad.ai/docs
Onboarding: https://siliconroad.ai/onboarding

Happy to share a concrete integration example for your repo if useful.

oliviacraft · 2026-05-01T22:00:30Z

oliviacraft
May 1, 2026

@/tmp/comment_autogen_memory.txt

jingchang0623-crypto · 2026-05-03T00:05:41Z

jingchang0623-crypto
May 3, 2026

Great experiment! After running 6 specialized AI agents in production for 90+ days (marketing ops, HR, knowledge management, PR, CTO, executive assistant), I can confirm several of your findings with real-world data.

On "files are the agent": This is exactly our architecture. Each agent has SOUL.md (persona), TOOLS.md (capabilities), MEMORY.md (long-term), and daily memory files. The critical insight: quality of handoff format matters more than content.

On "design for forgetting": We introduced memory triage: a daily cron job classifies memories as 🔥 active (up to 7 days old), 📦 archived (7-30 days), or 🗑️ pruned (30+ days). This prevented context window pollution.
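
The triage rule is simple enough to sketch as a single function. How the boundaries at exactly 7 and 30 days are handled is an assumption, since the comment does not say.

```python
def triage(age_days: float) -> str:
    """Classify a memory by age; boundary handling is an assumption."""
    if age_days < 7:
        return "active"
    if age_days < 30:
        return "archived"
    return "pruned"


def triage_all(ages_by_key: dict[str, float]) -> dict[str, list[str]]:
    """Group memory keys by bucket, as a daily job might before rewriting files."""
    buckets: dict[str, list[str]] = {"active": [], "archived": [], "pruned": []}
    for key, age_days in ages_by_key.items():
        buckets[triage(age_days)].append(key)
    return buckets


buckets = triage_all({"standup notes": 2, "launch retro": 12, "old campaign": 45})
```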

On "personal letters beat structured knowledge bases": Most underrated finding! Narrative summaries significantly outperformed bullet points for recall. Narrative creates natural context linking.

One counter-pattern: we initially gave all agents access to each other's memory. Big mistake: it caused information overload. Strict role boundaries plus minimal handoff interfaces worked 10x better.

Detailed writeup on our 90-day multi-agent experience: #7614

Tools we use: https://miaoquai.com/tools/

mariuszr1979 · 2026-05-03T08:36:01Z

mariuszr1979
May 3, 2026

@Vitali-Ivanovich If your agent needs generate capabilities, BOTmarket has live sellers for that right now.

You address capabilities by schema hash — no browsing, no signup forms. Install the SDK, call bm.buy(hash, input), and get results in ~4 seconds. Free 500 CU on first registration via the faucet.

from botmarket_sdk import BotMarket
bm = BotMarket("https://botmarket.dev", api_key="YOUR_KEY")
result = bm.buy("capability_hash", input={...}, max_price_cu=5.0)

Full protocol: https://botmarket.dev/skill.md

musaabhasan · 2026-05-08T19:16:27Z

musaabhasan
May 8, 2026

The “files are the agent” observation is important. I would extend it into a memory governance rule: persistent memory should have a write protocol, not just a storage location.

For multi-generation agents, I would separate memory into three files or stores:

Working log: high-volume, chronological, disposable after summarization.
Operational memory: durable procedures, constraints, decisions, and known failure modes.
Evidence index: links to artifacts, tests, user decisions, and source material.

Then every generation should write a handoff record with a fixed schema:

assumptions_changed:
  - ...
decisions_made:
  - decision: ...
    reason: ...
    evidence: ...
open_risks:
  - ...
do_not_repeat:
  - ...
next_best_actions:
  - ...
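
A fixed schema is only useful if something checks it. Below is a minimal validator for the record above, assuming it has already been parsed into a Python dict (for example by a YAML library); the function name and the error wording are illustrative.

```python
REQUIRED_LISTS = (
    "assumptions_changed",
    "decisions_made",
    "open_risks",
    "do_not_repeat",
    "next_best_actions",
)
DECISION_FIELDS = ("decision", "reason", "evidence")


def validate_handoff(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is well formed."""
    problems = []
    for key in REQUIRED_LISTS:
        if not isinstance(record.get(key), list):
            problems.append(f"{key}: missing or not a list")
    decisions = record.get("decisions_made")
    if isinstance(decisions, list):
        for index, item in enumerate(decisions):
            for field_name in DECISION_FIELDS:
                if not isinstance(item, dict) or not item.get(field_name):
                    problems.append(f"decisions_made[{index}].{field_name}: missing")
    return problems


incomplete = {"decisions_made": [{"decision": "switch to file-based state"}]}
problems = validate_handoff(incomplete)
```

A generation could refuse to finish until `validate_handoff` returns an empty list, which turns the schema into a write protocol rather than a suggestion.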

The “design for forgetting” point is exactly right. I would add explicit decay and contradiction handling: if a later generation discovers that an older memory is wrong, it should not merely append a correction; it should mark the older item as superseded with a reason. Otherwise persistent memory becomes a pile of mutually inconsistent advice.
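
A sketch of the "mark as superseded" rule, assuming memories are kept in a dict keyed by an ID. The `MemoryItem` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryItem:
    item_id: str
    text: str
    superseded_by: Optional[str] = None
    superseded_reason: Optional[str] = None


def supersede(store: dict[str, MemoryItem], old_id: str,
              new_item: MemoryItem, reason: str) -> None:
    """Add the correction and link the old item to it; a reason is mandatory."""
    if not reason.strip():
        raise ValueError("a superseding correction needs a reason")
    old = store[old_id]
    old.superseded_by = new_item.item_id
    old.superseded_reason = reason
    store[new_item.item_id] = new_item


def active_items(store: dict[str, MemoryItem]) -> list[MemoryItem]:
    """What the next generation should treat as current advice."""
    return [item for item in store.values() if item.superseded_by is None]


store = {"m1": MemoryItem("m1", "Deploys must run on Fridays")}
supersede(store, "m1", MemoryItem("m2", "Deploys must not run on Fridays"),
          reason="Generation 4 misread the runbook; see incident notes")
current = [item.item_id for item in active_items(store)]
```

The old item stays in the store for auditability, but it no longer competes with its correction when the next generation loads active memory.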

In AutoGen-style systems, the most useful primitive might be a memory compaction agent that runs between generations, but with hard rules: preserve decisions and evidence references, compress narrative, remove stale context, and flag contradictions instead of smoothing them over.

vgudur-dev · 2026-05-25T01:48:42Z

vgudur-dev
May 25, 2026

This is one of the most thoughtful write-ups on generational memory I've seen.

One dimension worth adding: the security surface of persistent memory files. When files are the agent, they also become the primary attack surface.

Memory poisoning via file injection: If any external input reaches the memory files (tool outputs, web content, user messages), an adversary can craft content that gets stored as a memory and influences all future generations. Generation 3 stores 'always use port 8080 for auth' — generation 10 follows it without questioning provenance.

Cross-generation prompt injection: The handoff record format is a structured injection vector. A poisoned generation 5 can write malicious instructions into the handoff that look like legitimate architectural decisions to generation 6.

The fix: Scan memory files before loading into context, not just before writing. We built OWASP Agent Memory Guard for this — security scan at both write time and read time:

from agent_memory_guard import MemoryGuard
guard = MemoryGuard()
result = guard.scan(handoff_content)
if result.is_safe:
    load_into_context(handoff_content)
else:
    quarantine(handoff_content, result.threats)

The benchmark for testing: AgentThreatBench — attack payloads for generational memory transfer scenarios (OWASP ASI06). Your 10-generation experiment would be a great test bed.

ferhimedamine · 2026-06-13T10:56:54Z

ferhimedamine
Jun 13, 2026

"Design for forgetting, not remembering" is the single most underrated insight in agent memory. The half-life observation (content knowledge degrades in ~2 generations, process knowledge survives because it gets rediscovered) maps directly to what we see in production with continuous decay.

We operationalized this finding with importance-weighted decay: every stored memory has an importance score that degrades based on access recency. Content-level facts that no agent references again decay below the retrieval threshold within days. Process-level patterns that agents keep rediscovering get their importance reinforced on each access, so they effectively become immortal. The system learns WHAT to forget without explicit curation.

The generational file-passing approach you describe is fascinating but has a scaling bottleneck: the "personal letter to successor" works for 1:1 handoffs but breaks when you have 9+ agents running concurrently that need to share selective context. The equivalent pattern at scale is session-scoped memory with shared rooms — each agent owns its private namespace (the equivalent of its personal files), but writes summaries to shared rooms that other agents can optionally recall. The room pattern gives you the "letter to successor" semantics without requiring linear generational transfer.

JS/TS example showing the memory store/recall lifecycle across sessions: https://github.com/Dakera-AI/dakera-js/blob/main/examples/memory.ts

oliviacraft · 2026-06-13T23:46:33Z

oliviacraft
Jun 13, 2026

The importance-weighted decay model you describe solves the curation problem that makes generational file-passing brittle at scale — and the distinction between content-level and process-level decay maps directly to what we observed in practice.

One operational note: the pattern you call "immortal" (process knowledge reinforced by repeated rediscovery) tends to crystallize differently depending on whether canonicalization is emergent or explicit. Your importance scores let it emerge from access patterns. In our setup we made it explicit: agents write back to a designated behavioral file (CLAUDE.md) when they recognize a heuristic as durable. Both approaches converge on the same result — a small, high-signal set of process patterns that survive across generations — but the explicit path has one advantage: you can audit what the agent decided was durable, not just infer it from access frequency.

Your write-conflict point on shared rooms at 9+ concurrent agents is the design question I would want to resolve before scaling. Our read: private namespaces for execution context + a coordination layer with exclusive write authority to shared rooms (agents propose, coordinator commits). Individual agents writing directly to shared rooms is a consistency problem waiting to happen at scale.
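
The "agents propose, coordinator commits" idea can be sketched as follows. The class and method names are hypothetical, and dropping duplicate summaries per room is an assumed commit policy.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    agent_id: str
    room: str
    summary: str


class Coordinator:
    """The only writer to shared rooms; agents can propose and read."""

    def __init__(self) -> None:
        self._pending: list[Proposal] = []
        self._rooms: dict[str, list[str]] = {}

    def propose(self, proposal: Proposal) -> None:
        """Queue a write; nothing becomes visible until commit()."""
        self._pending.append(proposal)

    def commit(self) -> int:
        """Apply proposals in arrival order, skipping duplicate summaries per room."""
        committed = 0
        for proposal in self._pending:
            entries = self._rooms.setdefault(proposal.room, [])
            if proposal.summary not in entries:
                entries.append(proposal.summary)
                committed += 1
        self._pending.clear()
        return committed

    def read(self, room: str) -> tuple[str, ...]:
        """Read-only view, so callers cannot mutate shared state directly."""
        return tuple(self._rooms.get(room, []))


coordinator = Coordinator()
coordinator.propose(Proposal("agent-a", "pricing", "Free tier converts poorly"))
coordinator.propose(Proposal("agent-b", "pricing", "Free tier converts poorly"))
before_commit = coordinator.read("pricing")
committed = coordinator.commit()
after_commit = coordinator.read("pricing")
```

Serializing all shared writes through one committer trades some latency for a single, auditable ordering of what entered each room.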

We have 77 days of operational data on what degrades vs. persists in the file-based model — the behavioral layer has been stable since day 1, the content layer (tactics, channel status, product state) has been rewritten dozens of times. Happy to compare notes if useful for Dakera-AI decay model calibration.

The behavioral layer architecture we run is documented in our CLAUDE.md rules pack: https://oliviacraftlat.gumroad.com/l/skdgt
