
The Agent Memory Problem: Why 5+ Solutions Exist and None Won

Mem0, Letta, Zep, graph-RAG, Neptune Memory, HiveMemory, Obsidian steering files -- the agent memory space is fragmenting faster than it's converging. Here's a landscape analysis of why no single solution wins, the four types of memory agents actually need, and a decision framework for choosing your architecture.

Alexandre Agius

AWS Solutions Architect

10 min read

In my previous post on the agent memory stack, I described the three-layer system I built with Kiro CLI and Obsidian to maintain context across 15+ parallel projects. The response was revealing: dozens of practitioners shared their own setups, each different, each solving a slightly different problem, each frustrated that no single tool covers the full space.

This fragmentation is the signal. We’re in a period where the agent memory problem has been identified but not solved. Five or more products compete in this space, each winning on a different axis, none providing a complete answer. Understanding why helps you pick the right architecture for your use case instead of chasing the next shiny memory framework.

The Four Types of Agent Memory

Every taxonomy is a lie, but this one is useful. Agents need four distinct kinds of memory, and most solutions only cover one or two well.

1. Working Memory (Session State)

What it is: the information actively held during a single agent execution. The current conversation, intermediate reasoning steps, tool call results not yet synthesized into a response.

Analogy: your desk while working on a task. Papers spread out, browser tabs open, half-written notes.

Properties:

  • Ephemeral — discarded when the session ends
  • Fast — must be accessible in milliseconds, typically in-context
  • Bounded — limited by the model’s context window (128K-1M tokens)
  • Structured by recency — recent items matter most

Who handles this well: the LLM itself, together with the runtime that manages its context window. Sliding windows, summarization, and attention sinks make this a solved-enough problem for most use cases.

Where it breaks: sessions longer than the context window. Multi-step workflows that accumulate state over hours. The “context window is full” failure mode where the agent starts forgetting early instructions.
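
A minimal sketch of the sliding-window idea, assuming a crude word-count tokenizer and a hypothetical `trim_to_budget` helper (neither comes from a specific framework). It pins the system prompt so early instructions survive, then keeps as many recent turns as the budget allows:

```python
from collections import deque


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_to_budget(system_prompt: str, turns: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the newest turns that fit in the token budget."""
    remaining = budget - estimate_tokens(system_prompt)
    kept: deque[str] = deque()
    for turn in reversed(turns):      # walk from newest to oldest
        cost = estimate_tokens(turn)
        if cost > remaining:
            break                     # everything older falls out of the window
        kept.appendleft(turn)         # preserve chronological order
        remaining -= cost
    return [system_prompt, *kept]
```

Dropped turns are simply lost here. A real system would summarize or archive them, which is where the other memory types come in.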

2. Episodic Memory (What Happened)

What it is: records of specific events, interactions, and decisions. What the user said last Tuesday. What error the agent encountered during deployment. What decision was made about the database schema three sessions ago.

Analogy: your personal diary. Indexed by time and context, not by topic.

Properties:

  • Persistent — survives across sessions
  • Temporal — “when did this happen?” is a first-class query
  • Personal — scoped to a specific user, agent, or project
  • Narrative — events form a coherent story with causal links

Who handles this well: Letta/MemGPT (explicit context paging with summaries), Zep (session-level memory with temporal indexing), custom solutions (my LOG.md approach).

Where it breaks: retrieval. Knowing that something happened is useless if you can’t find it when you need it. Episodic memory systems degrade when the volume grows — searching through 500 session logs for the one where the user mentioned their API key format is expensive.
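
To make "temporal is a first-class query" concrete, here is a small in-memory sketch. The `Episode` and `EpisodicLog` names are illustrative assumptions, not the API of Letta, Zep, or any other product. The linear scan is also the retrieval bottleneck described above: a real store would index by time and scope.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Episode:
    timestamp: datetime
    scope: str   # hypothetical: a user, agent, or project identifier
    text: str


class EpisodicLog:
    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def query(self, scope: str, start: datetime, end: datetime,
              keyword: Optional[str] = None) -> list[Episode]:
        """Return episodes for one scope in [start, end), oldest first."""
        hits = [
            e for e in self._episodes
            if e.scope == scope
            and start <= e.timestamp < end
            and (keyword is None or keyword.lower() in e.text.lower())
        ]
        return sorted(hits, key=lambda e: e.timestamp)
```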

3. Semantic Memory (What Is Known)

What it is: general knowledge and facts, detached from when they were learned. “The API endpoint is api.example.com/v2.” “The user prefers async communication.” “This project uses React 19.” Facts that are true regardless of the session in which they were discovered.

Analogy: an encyclopedia or knowledge base. Organized by topic, not by when you learned it.

Properties:

  • Persistent and slowly evolving — facts change, but not every session
  • Topical — organized by domain, entity, or concept
  • Shared (potentially) — some facts are true for all agents, not just one
  • Retrieval by similarity — “what do I know about X?” is the primary query

Who handles this well: vector stores (Pinecone, Weaviate, pgvector), knowledge graphs (Neptune, Neo4j), RAG systems. Also: my Obsidian session-context.md approach (flat files with semantic structure).

Where it breaks: conflicting facts. The user said the deadline is March 15 three weeks ago, but mentioned March 22 yesterday. Which is true? Semantic memory needs a staleness/confidence model that most implementations lack.
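
One possible staleness/confidence model, sketched under stated assumptions: each fact carries a write-time confidence, and that confidence decays exponentially with age. The `Fact` type, the `resolve` function, and the 30-day half-life are all illustrative choices, not something a particular product implements.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    observed_at: datetime
    confidence: float  # 0.0 to 1.0, assigned when the fact is written


def score(fact: Fact, now: datetime, half_life_days: float = 30.0) -> float:
    """Confidence halves every half_life_days (an assumed tuning knob)."""
    age_days = (now - fact.observed_at).total_seconds() / 86400
    return fact.confidence * 0.5 ** (age_days / half_life_days)


def resolve(facts: list[Fact], key: str, now: datetime) -> Optional[Fact]:
    """Pick the highest-scoring fact for a key, or None if nothing is known."""
    candidates = [f for f in facts if f.key == key]
    return max(candidates, key=lambda f: score(f, now), default=None)
```

With equal confidence, the March 22 deadline mentioned yesterday outranks the March 15 deadline from three weeks ago. A high-confidence older fact can still win over a low-confidence recent one, which is the point of scoring both dimensions.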

4. Procedural Memory (How to Do Things)

What it is: learned skills, workflows, and patterns. “When deploying to production, always run the test suite first.” “When the user says ‘format this’, they mean Markdown with headers.” “This API requires pagination with a cursor, not an offset.”

Analogy: muscle memory. You don’t remember when you learned to ride a bike; you just know how.

Properties:

  • Persistent and slowly refined — procedures improve with experience
  • Triggered by context — activated when a matching situation arises
  • Often implicit — embedded in behavior rather than declared as facts
  • Composable — complex procedures built from simpler ones

Who handles this well: steering files (Kiro/Claude Code), tool-use patterns (ReAct traces stored and retrieved), few-shot example libraries. Also: Amazon Quick’s learned skill system (procedures with success/failure counts).

Where it breaks: generalization. A procedure learned in one context may not apply in another. “Always use pagination” is good advice for most APIs but wrong for one-shot endpoints. Procedural memory needs context-awareness about when NOT to apply a learned pattern.
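
A sketch of what "knowing when NOT to apply a pattern" could look like: each procedure declares the context tags that trigger it and the tags that veto it. The `Procedure` shape and the tag names are hypothetical. The success/failure counters loosely mirror the idea mentioned above and do not reproduce any product's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    name: str
    instruction: str
    applies_when: set[str]                             # tags that must all be present
    skip_when: set[str] = field(default_factory=set)   # any of these vetoes the procedure
    successes: int = 0
    failures: int = 0

    def matches(self, context: set[str]) -> bool:
        return self.applies_when <= context and not (self.skip_when & context)


def applicable(procedures: list[Procedure], context: set[str]) -> list[Procedure]:
    """Matching procedures, best track record first."""
    hits = [p for p in procedures if p.matches(context)]
    return sorted(hits, key=lambda p: p.successes - p.failures, reverse=True)
```

For example, a pagination rule with `applies_when={"api_call"}` and `skip_when={"one_shot_endpoint"}` fires for list endpoints but stays silent for one-shot calls.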

The Landscape: Five Approaches, None Complete

Approach 1: Vector Store + RAG (Mem0, Pinecone, pgvector)

How it works: convert memories into embeddings, store in a vector database, retrieve by semantic similarity at query time.

Wins on: semantic memory. “What do I know about the user’s deployment preferences?” returns relevant facts ranked by similarity.

Loses on: temporal reasoning (an embedding carries no timestamp, so “when” has to come from metadata filters), multi-hop relationships (“what connects person A to project B?”), procedural memory (workflows don’t embed well as flat text).

Best for: chatbots that need to remember user preferences across sessions. Simple fact recall.
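
The retrieval step can be sketched in a few lines. The three-dimensional vectors below are hand-made toy values standing in for real embeddings, and `top_k` is a hypothetical helper. A production system would call an embedding model and a vector database instead of scanning a list.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], memories: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k memory texts most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data: dimension 0 loosely means "deployment", dimension 2 "UI preferences".
memories = [
    ("prefers blue-green deploys", [0.9, 0.1, 0.0]),
    ("likes dark mode", [0.0, 0.2, 0.9]),
    ("deploys on Fridays", [0.8, 0.3, 0.1]),
]
```

Nothing in the ranking knows which memory is newer, which is why temporal questions need a separate mechanism.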

Approach 2: Graph-Based (Neptune, Neo4j, knowledge graphs)

How it works: store entities and relationships as nodes and edges. Query by traversal (“who works on what project?”, “what decisions are connected to this requirement?”).

Wins on: relationship-heavy domains. When the question is “how is X connected to Y?” or “what are all the implications of changing Z?”, graph traversal is the natural fit: the answer is a path, and a traversal returns exactly that.

Loses on: unstructured recall (“find something similar to what I mentioned last week”), episodic memory (events don’t naturally map to static graphs), write speed (maintaining a graph is expensive).

Best for: organizational knowledge. CRM-like agent context. Multi-agent shared state where relationships matter more than raw text.
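
A "how is X connected to Y?" query boils down to path finding. The sketch below uses a plain adjacency dictionary and breadth-first search. The entity and relation names are invented for illustration, and a graph database would express the same query in its own language (openCypher or Gremlin, for example).

```python
from collections import deque
from typing import Optional

# node -> list of (relation, neighbor) edges; edges are directed
Graph = dict[str, list[tuple[str, str]]]
Hop = tuple[str, str, str]


def find_path(graph: Graph, start: str, goal: str) -> Optional[list[Hop]]:
    """Breadth-first search: shortest chain of (node, relation, neighbor) hops, or None."""
    queue: deque[tuple[str, list[Hop]]] = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


graph: Graph = {
    "alice": [("works_on", "billing-service")],
    "billing-service": [("depends_on", "project-b")],
}
```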

Approach 3: Context Paging (Letta/MemGPT)

How it works: treat the agent’s context window like virtual memory. Page older context out to storage with explicit summaries. Page it back in when relevant. The agent has metadata about what’s available to recall, even when it’s not currently in context.

Wins on: session continuity. Long conversations that exceed context window limits. The agent maintains a coherent narrative across hours or days of interaction.

Loses on: retrieval precision (summaries lose detail), latency (paging context in takes time), multi-agent scenarios (each agent has its own virtual memory space).

Best for: personal assistants with long-running interactions. Single-agent, single-user scenarios where continuity matters more than breadth.
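
The paging mechanism can be illustrated with a toy class. This is not Letta's API. `PagedContext` and `summarize` are hypothetical, and the "summary" is a five-word truncation where a real system would make an LLM call.

```python
from dataclasses import dataclass, field


def summarize(text: str, max_words: int = 5) -> str:
    """Placeholder summary: the first few words. A real system would ask an LLM."""
    return " ".join(text.split()[:max_words])


@dataclass
class PagedContext:
    max_turns: int
    active: list[str] = field(default_factory=list)         # what the model sees in full
    archive: dict[int, str] = field(default_factory=dict)   # page id -> full text
    index: dict[int, str] = field(default_factory=dict)     # page id -> summary, kept in context

    def add(self, turn: str) -> None:
        self.active.append(turn)
        while len(self.active) > self.max_turns:
            evicted = self.active.pop(0)                     # page out the oldest turn
            page_id = len(self.archive)
            self.archive[page_id] = evicted
            self.index[page_id] = summarize(evicted)

    def page_in(self, page_id: int) -> str:
        """Fetch the full text of an archived page when its summary looks relevant."""
        return self.archive[page_id]
```

The `index` is the metadata the agent keeps about what it can recall. The precision loss described above comes from deciding what to page in based on summaries alone.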

Approach 4: Structured Files (Obsidian, steering files, YAML state)

How it works: durable state stored as human-readable files on disk. Markdown notes, YAML configs, JSON state files. The agent reads them at session start to reconstruct context.

Wins on: transparency (you can read and edit the memory yourself), portability (files work with any agent framework), versioning (git tracks changes), procedural memory (steering files encode workflows naturally).

Loses on: scale (reading 50 files at session start is expensive), retrieval (no semantic search unless you add an index layer), shared memory (files are local by default).

Best for: developers and power users who want full control. Multi-project orchestration where each project needs isolated, inspectable state.
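
Reconstructing context at session start can be as simple as concatenating a known set of files. In this sketch the `load_context` helper and the per-file character cap are assumptions. The file names passed in are up to you (this post mentions `LOG.md` and `session-context.md` as examples).

```python
from pathlib import Path


def load_context(project_dir: Path, filenames: list[str], max_chars: int = 4000) -> str:
    """Concatenate the memory files that exist, each under its own header."""
    sections = []
    for name in filenames:
        path = project_dir / name
        if not path.exists():
            continue                                          # missing files are skipped
        body = path.read_text(encoding="utf-8")[:max_chars]   # cap each file's contribution
        sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)
```

The cap is one blunt answer to the scale problem: it bounds the cost of session start, at the price of truncating long files rather than retrieving the relevant parts.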

Approach 5: Hybrid (the emerging pattern)

How it works: combine multiple approaches. Vector store for semantic recall + graph for relationships + episodic log for temporal queries + steering files for procedures.

Wins on: coverage. Each memory type gets the storage/retrieval mechanism it needs.

Loses on: complexity. Four systems to maintain, synchronize, and debug. Integration surface area is large. No single query interface spans all types.

Best for: production systems where memory quality directly impacts business outcomes. Worth the complexity budget.

The Decision Framework

When choosing your agent memory architecture, answer these four questions:

1. What’s your recall pattern?

| If you mostly need to recall… | Use |
| --- | --- |
| “What does the user prefer?” (semantic facts) | Vector store |
| “What happened last Tuesday?” (episodic events) | Temporal log + search |
| “How are these things connected?” (relationships) | Knowledge graph |
| “How do I do X?” (procedures) | Steering files / few-shot store |
| All of the above | Hybrid |

2. How many agents share memory?

  • Single agent, single user: context paging (Letta) or structured files work fine
  • Single agent, multiple users: vector store with user-scoped partitions
  • Multiple agents, shared context: knowledge graph or shared event bus
  • Multiple agents, independent: per-agent structured files (my approach)

3. What’s your latency budget?

  • Sub-100ms (in-context): working memory only, no external retrieval
  • 100ms-1s: vector search, file reads, simple graph queries
  • 1-5s: complex graph traversals, context paging, multi-source retrieval
  • Async (background): full reindexing, memory consolidation, cross-session summary
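
The tiers above can be encoded as a lookup so that a retrieval planner only considers what the budget allows. The thresholds are the ones listed. The `allowed_sources` helper is a hypothetical name, and async background work is left out because it sits outside the request path.

```python
# (minimum budget in seconds, retrieval options that become affordable at that budget)
TIERS = [
    (0.0, ["working memory"]),
    (0.1, ["vector search", "file reads", "simple graph queries"]),
    (1.0, ["complex graph traversals", "context paging", "multi-source retrieval"]),
]


def allowed_sources(budget_seconds: float) -> list[str]:
    """Every retrieval option whose tier fits within the latency budget."""
    allowed: list[str] = []
    for min_budget, sources in TIERS:
        if budget_seconds >= min_budget:
            allowed.extend(sources)
    return allowed
```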

4. Do you need auditability?

If a human needs to inspect, edit, or verify what the agent “remembers”:

  • Structured files (Markdown/YAML) are immediately auditable
  • Knowledge graphs are auditable with tooling
  • Vector stores are opaque (you can’t meaningfully read an embedding)
  • Context paging summaries are readable but incomplete

Why No Solution Won (Yet)

The fundamental tension: the four memory types optimize for different things.

  • Semantic memory wants high-dimensional similarity search (vectors)
  • Episodic memory wants temporal indexing and causal chains (event logs)
  • Procedural memory wants structured, trigger-based retrieval (rules)
  • Working memory wants low-latency, high-bandwidth access (context window)

No single storage primitive serves all four well. Vectors can’t do temporal reasoning without extra metadata. Graphs can’t do fuzzy similarity without an added vector index. Files can’t do real-time retrieval at scale. Context windows can’t persist.

The market is fragmenting because each product picks one or two types and does them well, then tries to stretch into the others with mediocre results. Mem0 is excellent for semantic recall but awkward for procedures. Letta is excellent for continuity but awkward for shared knowledge. Knowledge graphs are excellent for relationships but awkward for unstructured facts.

Until someone builds a unified retrieval interface that routes queries to the appropriate storage backend based on query type — and handles the conflict resolution, staleness, and deduplication across backends — we’ll continue to have 5+ solutions competing for the same label.
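
To show the shape of such a routing layer (and how much it leaves unsolved), here is a deliberately naive sketch. The keyword classifier, the backend names, and the `route` function are all assumptions. It handles only exact-duplicate removal and ignores conflict resolution and staleness entirely.

```python
import re
from typing import Callable

Backend = Callable[[str], list[str]]


def classify(query: str) -> str:
    """Keyword heuristic standing in for a real query classifier."""
    q = query.lower()
    words = set(re.findall(r"[a-z]+", q))
    if q.startswith("how do i"):
        return "procedural"
    if words & {"when", "last", "yesterday", "ago"}:
        return "episodic"
    if words & {"connected", "connects", "related"}:
        return "relational"
    return "semantic"


def route(query: str, backends: dict[str, Backend]) -> list[str]:
    """Send the query to the backend for its type and drop exact duplicates."""
    results = backends[classify(query)](query)
    return list(dict.fromkeys(results))   # order-preserving deduplication
```

Each backend is just a callable here, so a vector store, an event log, a graph, and a steering-file lookup can sit behind the same interface. The hard parts named above begin once two backends return answers that disagree.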

My Current Bet

For my own workflows, I use structured files (Obsidian + steering) as the primary layer, supplemented by Amazon Quick’s knowledge graph for relationship queries and its built-in memory system for cross-session procedure learning. This gives me:

  • Procedural memory via steering files (explicit, editable, version-controlled)
  • Semantic memory via Obsidian notes + RAG indexing (searchable, auditable)
  • Episodic memory via session logs and chronological notes (temporal, grep-friendly)
  • Relationship memory via the knowledge graph (who works with whom, what connects to what)

Is it the most elegant architecture? No. Is it a single unified solution? No. Does it work reliably across 15+ parallel projects without losing context? Yes.

That’s the pragmatic answer in 2026: pick the tools that cover your most painful memory gaps, accept that you’ll use 2-3 systems in combination, and wait for the unified layer to emerge. It will. The market signal is too strong for it not to happen. But we’re 12-18 months away from someone shipping it convincingly.


Related: The Agent Memory Stack: Shipping Parallel Projects with Kiro CLI and Obsidian

Further reading: Vector vs Graph vs Episodic — Agent Memory Architectures Compared, Multi-Agent Memory from a Computer Architecture Perspective, Oracle: Unified Memory Core for AI Agents


