Memory Hierarchy Mental Model

Understanding the critical distinction between State (ephemeral execution), Short-Term Memory (session context), and Long-Term Memory (persistent user data)

🌍 Long-Term Memory: cross-session persistence
💬 Short-Term Memory: session-level context
⚡ State: task execution
⏱️ Persistence Timeline
⚡ State: seconds to minutes (single task)
💬 Short-Term Memory: minutes to hours (one session)
🌍 Long-Term Memory: days to forever (across sessions)
⚡ State (Ephemeral Execution Context)
⏱️ Lifespan
Exists only during a single task execution and disappears when the task completes.
📦 Contains
  • Current user query
  • System instructions
  • Message buffer (this task only)
  • Tool calls (pending/completed)
  • Intermediate results
  • Current step counter
🎯 Purpose
Track progress and context during multi-step task execution, so the agent knows "where it is" in solving a problem.
Example: User asks "Book me a flight to NYC." State tracks: initial query → search tool called → results retrieved → booking tool prepared → confirmation generated. After the task completes, the state is discarded.
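A minimal sketch of what such ephemeral state might look like in Python; the TaskState fields, the tool names, and the fake results are illustrative, not taken from any particular framework:

from dataclasses import dataclass, field

@dataclass
class TaskState:
    """Ephemeral scaffolding for one task; never written to storage."""
    user_query: str
    system_instructions: str
    messages: list = field(default_factory=list)              # message buffer (this task only)
    pending_tool_calls: list = field(default_factory=list)
    intermediate_results: dict = field(default_factory=dict)
    step: int = 0                                             # current step counter

def run_task(query: str) -> str:
    state = TaskState(user_query=query, system_instructions="You are a travel-booking agent.")
    # Each step reads and mutates the state so the agent knows where it is.
    state.step += 1
    state.pending_tool_calls.append({"tool": "flight_search", "args": {"to": "NYC"}})
    state.intermediate_results["flights"] = ["UA 123", "DL 456"]   # pretend tool output
    state.step += 1
    confirmation = f"Booked {state.intermediate_results['flights'][0]}."
    return confirmation   # state goes out of scope here: nothing is persisted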
💬 Short-Term Memory (Session-Level Continuity)
⏱️ Lifespan
Persists for the duration of a conversation session (minutes to hours). Resets when the session ends.
📦 Contains
  • Full conversation history
  • Context from previous exchanges
  • References the user made earlier
  • Temporary preferences ("today only")
  • Session-specific data
🎯 Purpose
Enable coherent multi-turn conversations. The agent remembers what you discussed five minutes ago in the same chat.
Example: User: "Find Italian restaurants." The agent provides options. User: "What about the second one?" The agent knows "the second one" refers to a restaurant from two exchanges ago because that exchange is still in short-term memory.
🌍 Long-Term Memory (Persistent User Knowledge)
⏱️ Lifespan
Stored permanently across sessions (days, weeks, or indefinitely). Survives session closure and system restarts.
📦 Contains
  • User preferences & interests
  • Historical facts about the user
  • Past decisions & patterns
  • Relationship context
  • Episodic memories (key events)
🎯 Purpose
Personalize experiences across time, build an ongoing relationship, and remember important user context indefinitely.
Example: Over three months, a user has ten conversations about vegetarian recipes. Long-term memory stores "User is vegetarian, prefers Indian cuisine, allergic to peanuts." In the next session, weeks later, the agent automatically filters suggestions accordingly.
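A minimal sketch of long-term memory using SQLite as the persistent store; the schema and the remember/recall helpers are illustrative, and a production system would more likely use PostgreSQL or a vector database:

import sqlite3

conn = sqlite3.connect("user_memory.db")   # a file on disk: survives restarts and new sessions
conn.execute("CREATE TABLE IF NOT EXISTS facts (user_id TEXT, fact TEXT)")

def remember(user_id: str, fact: str) -> None:
    """Distill a durable fact out of the conversation and persist it."""
    conn.execute("INSERT INTO facts VALUES (?, ?)", (user_id, fact))
    conn.commit()

def recall(user_id: str) -> list:
    """Load stored facts so they can be injected into a brand-new session's prompt."""
    rows = conn.execute("SELECT fact FROM facts WHERE user_id = ?", (user_id,))
    return [fact for (fact,) in rows]

remember("alice", "vegetarian, prefers Indian cuisine, allergic to peanuts")
print(recall("alice"))   # weeks later, in a different session, this still returns the facts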
🧠 Short-Term Memory Management Strategies
📚 Full Conversation History
Include every message from the session in every prompt. Maximum context preservation.
✅ Pros: No information loss, complete context
❌ Cons: Expensive (tokens), slow, can hit context limits
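A one-function sketch of this strategy, assuming history is the session's message list:

def full_history_prompt(system: str, history: list) -> list:
    """Send the entire session history with every request: complete context, unbounded cost."""
    return [{"role": "system", "content": system}] + history   # no truncation at all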
🪟 Sliding Window
Keep only the last N messages (e.g., the most recent 10 exchanges). Discard older messages.
✅ Pros: Fixed token cost, fast, scalable
❌ Cons: Can lose important early context
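A minimal sketch of the sliding-window strategy; the window size of 10 is an arbitrary choice:

WINDOW = 10   # keep only the most recent N messages

def windowed_context(history: list) -> list:
    """Return only the last WINDOW messages; anything older is simply dropped."""
    return history[-WINDOW:]

history = [{"role": "user", "content": f"message {i}"} for i in range(25)]
print(len(windowed_context(history)))   # 10, no matter how long the session gets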
📝 Summarization
Periodically condense older messages into compact summaries. Keep recent messages verbatim.
✅ Pros: Balanced approach, preserves key info
❌ Cons: Summarization quality varies, extra LLM calls
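A sketch of the summarization strategy; the summarize stub stands in for the extra LLM call, and keep_recent is an arbitrary threshold:

def summarize(messages: list) -> str:
    """Stand-in: a real system would make an extra LLM call to condense these messages."""
    return f"{len(messages)} earlier messages about the user's ongoing request"

def compact_history(history: list, keep_recent: int = 6) -> list:
    """Condense everything but the most recent messages into one summary message."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {"role": "system", "content": "Summary of earlier conversation: " + summarize(older)}
    return [summary] + recent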
💡 Critical Mental Model Insights
State ≠ Memory (Most Important!)
State is execution scaffolding for one task. Memory is context accumulation across tasks or sessions. Don't confuse them: state is transient by design, while memory is intentionally persistent.
Memory is Simulated, Not Real
LLMs don't inherently "remember" anything. Memory is constructed by feeding past interactions into current prompts. It's an illusion created through context management.
Each Layer Depends on the One Below
State enables single-task execution. Short-term memory accumulates states across tasks. Long-term memory distills short-term memory into durable knowledge. Each builds on the previous layer.
Trade-offs Are Unavoidable
More memory = better context but higher cost and slower responses. Less memory = cheaper and faster but risks losing important information. Choose strategies based on use case requirements.
Context Window Limits Everything
Models have token limits. Eventually you must choose: full history (expensive), sliding window (limited), or summarization (lossy). Long-term memory in vector databases sidesteps this constraint by retrieving only the most relevant items into the prompt instead of replaying everything.
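A toy sketch of that retrieval idea, using hand-written 3-dimensional embeddings and plain cosine similarity in place of a real vector database:

import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# (embedding, memory) pairs; a real system would use model-generated embeddings.
memories = [
    ([0.9, 0.1, 0.0], "User is vegetarian"),
    ([0.1, 0.8, 0.2], "User prefers aisle seats and avoids red-eye flights"),
]

def retrieve(query_embedding: list, k: int = 1) -> list:
    """Pull only the k most relevant memories into the prompt, not the whole history."""
    ranked = sorted(memories, key=lambda m: cosine(query_embedding, m[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve([0.85, 0.2, 0.05]))   # -> ['User is vegetarian']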
Storage Location Reflects Lifespan
State lives in execution variables (RAM). Short-term memory in prompt buffers. Long-term memory in databases (PostgreSQL, vector stores). Architecture reflects persistence needs.