Memory Hierarchy Mental Model

Understanding the critical distinction between State (ephemeral execution), Short-Term Memory (session context), and Long-Term Memory (persistent user data)

🌍 Long-Term Memory: cross-session persistence
💬 Short-Term Memory: session-level context
⚡ State: task execution
⏱️ Persistence Timeline
⚡ State: seconds to minutes (single task)
💬 Short-Term Memory: minutes to hours (one session)
🌍 Long-Term Memory: days to forever (across sessions)
⚡ State (Ephemeral Execution Context)
⏱️ Lifespan
Exists only during a single task execution and disappears when the task completes.
📦 Contains
  • Current user query
  • System instructions
  • Message buffer (this task only)
  • Tool calls (pending/completed)
  • Intermediate results
  • Current step counter
🎯 Purpose
Track progress and context during multi-step task execution, so the agent knows "where it is" in solving a problem.
Example: User asks "Book me a flight to NYC." State tracks: initial query → search tool called → results retrieved → booking tool prepared → confirmation generated. After the task completes, the state is discarded.
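A minimal sketch of what such ephemeral state might look like in Python; the TaskState fields, the tool names, and the fake results are illustrative, not taken from any particular framework:

from dataclasses import dataclass, field

@dataclass
class TaskState:
    """Ephemeral scaffolding for one task; never written to storage."""
    user_query: str
    system_instructions: str
    messages: list = field(default_factory=list)              # message buffer (this task only)
    pending_tool_calls: list = field(default_factory=list)
    intermediate_results: dict = field(default_factory=dict)
    step: int = 0                                             # current step counter

def run_task(query: str) -> str:
    state = TaskState(user_query=query, system_instructions="You are a travel-booking agent.")
    # Each step reads and mutates the state so the agent knows where it is.
    state.step += 1
    state.pending_tool_calls.append({"tool": "flight_search", "args": {"to": "NYC"}})
    state.intermediate_results["flights"] = ["UA 123", "DL 456"]   # pretend tool output
    state.step += 1
    confirmation = f"Booked {state.intermediate_results['flights'][0]}."
    return confirmation   # state goes out of scope here: nothing is persisted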
💬 Short-Term Memory (Session-Level Continuity)
⏱️ Lifespan
Persists for the duration of a conversation session (minutes to hours). Resets when the session ends.
📦 Contains
  • Full conversation history
  • Context from previous exchanges
  • References the user made earlier
  • Temporary preferences ("today only")
  • Session-specific data
🎯 Purpose
Enable coherent multi-turn conversations. The agent remembers what you discussed five minutes ago in the same chat.
Example: User: "Find Italian restaurants." The agent provides options. User: "What about the second one?" The agent knows "the second one" refers to a restaurant from two exchanges ago because that exchange is still in short-term memory.
🌍 Long-Term Memory (Persistent User Knowledge)
⏱️ Lifespan
Stored permanently across sessions (days, weeks, or indefinitely). Survives session closure and system restarts.
📦 Contains
  • User preferences & interests
  • Historical facts about the user
  • Past decisions & patterns
  • Relationship context
  • Episodic memories (key events)
🎯 Purpose
Personalize experiences across time, build an ongoing relationship, and remember important user context indefinitely.
Example: Over three months, a user has ten conversations about vegetarian recipes. Long-term memory stores "User is vegetarian, prefers Indian cuisine, allergic to peanuts." In the next session, weeks later, the agent automatically filters suggestions accordingly.
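A minimal sketch of long-term memory using SQLite as the persistent store; the schema and the remember/recall helpers are illustrative, and a production system would more likely use PostgreSQL or a vector database:

import sqlite3

conn = sqlite3.connect("user_memory.db")   # a file on disk: survives restarts and new sessions
conn.execute("CREATE TABLE IF NOT EXISTS facts (user_id TEXT, fact TEXT)")

def remember(user_id: str, fact: str) -> None:
    """Distill a durable fact out of the conversation and persist it."""
    conn.execute("INSERT INTO facts VALUES (?, ?)", (user_id, fact))
    conn.commit()

def recall(user_id: str) -> list:
    """Load stored facts so they can be injected into a brand-new session's prompt."""
    rows = conn.execute("SELECT fact FROM facts WHERE user_id = ?", (user_id,))
    return [fact for (fact,) in rows]

remember("alice", "vegetarian, prefers Indian cuisine, allergic to peanuts")
print(recall("alice"))   # weeks later, in a different session, this still returns the facts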
🧠 Short-Term Memory Management Strategies
📚 Full Conversation History
Include every message from the session in every prompt. Maximum context preservation.
✅ Pros: No information loss, complete context
❌ Cons: Expensive (tokens), slow, can hit context limits
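A one-function sketch of this strategy, assuming history is the session's message list:

def full_history_prompt(system: str, history: list) -> list:
    """Send the entire session history with every request: complete context, unbounded cost."""
    return [{"role": "system", "content": system}] + history   # no truncation at all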
🪟 Sliding Window
Keep only the last N messages (e.g., the most recent 10 exchanges). Discard older messages.
✅ Pros: Fixed token cost, fast, scalable
❌ Cons: Can lose important early context
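A minimal sketch of the sliding-window strategy; the window size of 10 is an arbitrary choice:

WINDOW = 10   # keep only the most recent N messages

def windowed_context(history: list) -> list:
    """Return only the last WINDOW messages; anything older is simply dropped."""
    return history[-WINDOW:]

history = [{"role": "user", "content": f"message {i}"} for i in range(25)]
print(len(windowed_context(history)))   # 10, no matter how long the session gets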
📝 Summarization
Periodically condense older messages into compact summaries. Keep recent messages verbatim.
✅ Pros: Balanced approach, preserves key info
❌ Cons: Summarization quality varies, extra LLM calls
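A sketch of the summarization strategy; the summarize stub stands in for the extra LLM call, and keep_recent is an arbitrary threshold:

def summarize(messages: list) -> str:
    """Stand-in: a real system would make an extra LLM call to condense these messages."""
    return f"{len(messages)} earlier messages about the user's ongoing request"

def compact_history(history: list, keep_recent: int = 6) -> list:
    """Condense everything but the most recent messages into one summary message."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {"role": "system", "content": "Summary of earlier conversation: " + summarize(older)}
    return [summary] + recent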
💡 Critical Mental Model Insights
State ≠ Memory (Most Important!)
State is execution scaffolding for one task. Memory is context accumulation across tasks or sessions. Don't confuse them: state is transient by design, while memory is intentionally persistent.
Memory is Simulated, Not Real
LLMs don't inherently "remember" anything. Memory is constructed by feeding past interactions into current prompts. It's an illusion created through context management.
Each Layer Depends on the One Below
State enables single-task execution. Short-term memory accumulates states across tasks. Long-term memory distills short-term memory into durable knowledge. Each builds on the previous layer.
Trade-offs Are Unavoidable
More memory = better context but higher cost and slower responses. Less memory = cheaper and faster but risks losing important information. Choose strategies based on use case requirements.
Context Window Limits Everything
Models have token limits. Eventually you must choose: full history (expensive), sliding window (limited), or summarization (lossy). Long-term memory in vector databases sidesteps this constraint by retrieving only the most relevant items into the prompt instead of replaying everything.
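A toy sketch of that retrieval idea, using hand-written 3-dimensional embeddings and plain cosine similarity in place of a real vector database:

import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# (embedding, memory) pairs; a real system would use model-generated embeddings.
memories = [
    ([0.9, 0.1, 0.0], "User is vegetarian"),
    ([0.1, 0.8, 0.2], "User prefers aisle seats and avoids red-eye flights"),
]

def retrieve(query_embedding: list, k: int = 1) -> list:
    """Pull only the k most relevant memories into the prompt, not the whole history."""
    ranked = sorted(memories, key=lambda m: cosine(query_embedding, m[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve([0.85, 0.2, 0.05]))   # -> ['User is vegetarian']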
Storage Location Reflects Lifespan
State lives in execution variables (RAM). Short-term memory in prompt buffers. Long-term memory in databases (PostgreSQL, vector stores). Architecture reflects persistence needs.