LAYER 6
Measure and track agent performance across task completion, output quality, tool usage, and system metrics, so that each round of improvement is guided by data rather than guesswork.
Task Completion Metrics
Quality Assessment
Tool Interaction Analysis
System Performance (latency, tokens)
LLM-as-Judge Evaluation
Trajectory Tracing
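As a minimal sketch of how these metrics could be aggregated, the example below assumes each agent run is logged as one record. The `RunRecord` schema and its field names are hypothetical and do not come from any particular framework.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One logged agent run (hypothetical schema, for illustration only)."""
    completed: bool        # did the agent finish the task?
    quality_score: float   # 0.0 to 1.0, e.g. from a human or LLM judge
    tool_calls: int        # total tool invocations in the run
    tool_errors: int       # tool invocations that failed
    latency_s: float       # wall-clock seconds for the run
    tokens: int            # prompt plus completion tokens


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-run records into a small metrics report."""
    if not runs:
        raise ValueError("need at least one run to summarize")
    total_calls = sum(r.tool_calls for r in runs)
    total_errors = sum(r.tool_errors for r in runs)
    return {
        "completion_rate": sum(r.completed for r in runs) / len(runs),
        "mean_quality": mean(r.quality_score for r in runs),
        # Guard against division by zero when no tools were called.
        "tool_error_rate": total_errors / total_calls if total_calls else 0.0,
        "mean_latency_s": mean(r.latency_s for r in runs),
        "mean_tokens": mean(r.tokens for r in runs),
    }


runs = [
    RunRecord(True, 0.9, 4, 0, 2.0, 1200),
    RunRecord(False, 0.4, 6, 3, 5.0, 2400),
    RunRecord(True, 0.8, 2, 0, 2.0, 900),
]
report = summarize(runs)
```

Comparing such a report before and after a prompt or tool change is one simple way to check whether the change actually helped.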
Measured outputs inform improvements
LAYER 5
Connect agents to external systems for dynamic data access and action execution. Includes APIs, databases, search engines, and emerging standardized protocols.
Web Search (SerpAPI, Tavily, Bing)
SQL Databases (PostgreSQL, MySQL)
Vector Databases (Chroma, Pinecone)
REST APIs & Webhooks
OAuth & API Key Authentication
Model Context Protocol (MCP)
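To illustrate the database case, here is a minimal sketch of a read-only lookup that an agent could be given as a tool. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL or MySQL. The `orders` table, its columns, and both function names are assumptions made for this example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("ada", 30.0), ("ada", 12.5), ("grace", 99.0)],
    )
    conn.commit()
    return conn


def orders_for_customer(conn: sqlite3.Connection, customer: str) -> list[dict]:
    """Tool body: look up one customer's orders.

    The customer name is passed as a bound parameter, never interpolated
    into the SQL string, so model-generated input cannot alter the query.
    """
    cur = conn.execute(
        "SELECT id, total FROM orders WHERE customer = ? ORDER BY id",
        (customer,),
    )
    return [{"id": row[0], "total": row[1]} for row in cur.fetchall()]


conn = make_demo_db()
rows = orders_for_customer(conn, "ada")
```

Exposing narrow, parameterized functions like this, rather than letting the model write arbitrary SQL, is one cautious design choice. Other setups may reasonably allow more.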
External data feeds agent decisions
LAYER 4
Create the illusion of memory by managing context across interactions. Enables coherent multi-turn conversations and personalized experiences over time.
Short-Term Memory (session-level)
Long-Term Memory (cross-session)
Full Conversation History
Sliding Window Strategy
Summarization Techniques
Vector Storage for Retrieval
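The sliding window and summarization strategies can be combined, as in the sketch below: recent turns are kept verbatim, and a turn that falls out of the window is condensed into a running summary. The class name and `summarize_turn` are hypothetical. A real system would likely call an LLM to summarize, whereas this placeholder only truncates.

```python
from collections import deque


def summarize_turn(role: str, text: str, max_chars: int = 40) -> str:
    """Placeholder summarizer: truncates text instead of calling an LLM."""
    short = text if len(text) <= max_chars else text[: max_chars - 3] + "..."
    return f"{role}: {short}"


class SlidingWindowMemory:
    """Keep the last `window_size` turns verbatim, summarize older ones."""

    def __init__(self, window_size: int = 4):
        self.window = deque(maxlen=window_size)
        self.summary: list[str] = []

    def add(self, role: str, text: str) -> None:
        # When the window is full, the oldest turn is about to be evicted,
        # so condense it into the summary before appending the new turn.
        if len(self.window) == self.window.maxlen:
            old_role, old_text = self.window[0]
            self.summary.append(summarize_turn(old_role, old_text))
        self.window.append((role, text))

    def context(self) -> str:
        """Build the text that would be prepended to the next prompt."""
        lines = []
        if self.summary:
            lines.append("Summary of earlier turns:")
            lines.extend(f"- {s}" for s in self.summary)
        lines.extend(f"{role}: {text}" for role, text in self.window)
        return "\n".join(lines)


memory = SlidingWindowMemory(window_size=2)
memory.add("user", "My name is Ada.")
memory.add("assistant", "Nice to meet you, Ada.")
memory.add("user", "What is my name?")
prompt_context = memory.context()
```

Here the first turn has left the window but survives in the summary, so the name is still available to the model on the third turn.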
Memory informs current execution
LAYER 3
Manage ephemeral execution context that exists during a single task. State machines provide predictable, testable workflows with clear transition logic.
Original User Query
System Instructions
Message History Buffer
Tool Calls (pending/completed)
Intermediate Results
Conditional Transitions
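A small state machine makes the idea concrete. The sketch below is illustrative only: the `Step` names, the `TaskState` fields, and the planning rule (queries containing a digit need a calculator) are assumptions chosen to mirror the items listed above.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Step(Enum):
    PLAN = auto()
    CALL_TOOL = auto()
    RESPOND = auto()
    DONE = auto()


@dataclass
class TaskState:
    """Ephemeral state for one task; discarded when the task ends."""
    query: str
    system_instructions: str = "You are a helpful agent."
    messages: list[str] = field(default_factory=list)
    pending_tools: list[str] = field(default_factory=list)
    results: dict[str, str] = field(default_factory=dict)
    step: Step = Step.PLAN


def next_step(state: TaskState) -> Step:
    """Conditional transition: depends only on the current state."""
    if state.step in (Step.PLAN, Step.CALL_TOOL):
        return Step.CALL_TOOL if state.pending_tools else Step.RESPOND
    return Step.DONE


def run(state: TaskState) -> TaskState:
    while state.step is not Step.DONE:
        if state.step is Step.PLAN:
            # Hypothetical planning rule standing in for an LLM decision.
            if any(ch.isdigit() for ch in state.query):
                state.pending_tools.append("calculator")
        elif state.step is Step.CALL_TOOL:
            tool = state.pending_tools.pop(0)
            state.results[tool] = f"<result of {tool}>"  # stubbed tool output
        elif state.step is Step.RESPOND:
            state.messages.append(f"Answer using {sorted(state.results)}")
        state.step = next_step(state)
    return state


final = run(TaskState(query="What is 2 + 2?"))
plain = run(TaskState(query="Say hello"))
```

Because `next_step` is a pure function of the state, each transition can be unit-tested without running a model.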
State coordinates execution flow
LAYER 2
Extend agent capabilities through programmatic interfaces. Function calling lets the model recognize when a tool is needed and emit a structured request, which the surrounding application validates and executes.
Math & Computation Functions
Search & Retrieval Tools
Code Execution (Python)
Data Transformation Tools
Structured Output Schemas
Pydantic Validation
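The request, validate, execute loop can be sketched with the standard library alone. The JSON request shape (`tool`, `arguments`), the `TOOLS` registry, and the function names below are assumptions for illustration. In a real project a Pydantic model per tool could replace the manual argument checks.

```python
import json
import math

# Hypothetical tool registry: each tool has a callable and a parameter spec
# mapping argument names to the type they are coerced to.
TOOLS = {
    "sqrt": {"func": lambda x: math.sqrt(x), "params": {"x": float}},
    "add": {"func": lambda a, b: a + b, "params": {"a": float, "b": float}},
}


def validate_args(tool_name: str, args: dict) -> dict:
    """Check argument names and coerce values to the declared types."""
    spec = TOOLS[tool_name]["params"]
    if set(args) != set(spec):
        raise ValueError(
            f"{tool_name} expects arguments {sorted(spec)}, got {sorted(args)}"
        )
    return {name: caster(args[name]) for name, caster in spec.items()}


def dispatch(raw_request: str) -> dict:
    """Parse a model-emitted JSON tool request, validate it, and run the tool."""
    request = json.loads(raw_request)
    name = request.get("tool")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        args = validate_args(name, request.get("arguments", {}))
        return {"ok": True, "result": TOOLS[name]["func"](**args)}
    except (ValueError, TypeError) as exc:
        # Return the error as data so it can be fed back to the model.
        return {"ok": False, "error": str(exc)}


good = dispatch('{"tool": "sqrt", "arguments": {"x": 16}}')
bad = dispatch('{"tool": "sqrt", "arguments": {"y": 16}}')
```

Returning failures as structured data instead of raising lets the agent loop pass the error message back to the model for another attempt.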
Tools execute based on LLM decisions
LAYER 1
The foundational large language model that provides reasoning, language understanding, and generation capabilities. By default, it is stateless: it treats each prompt independently and retains no memory of earlier prompts.
Natural Language Understanding
Reasoning & Planning
Text Generation
Instruction Following
Pattern Recognition
Structured Output Generation
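Statelessness is easiest to see with a toy stand-in for the model. `fake_llm` below is not a real model or API. It is a hypothetical function that, like a stateless LLM call, can only use the text passed to it in that one call.

```python
PREFIX = "user: my name is "


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: it sees only the prompt it is given."""
    for line in prompt.splitlines():
        if line.lower().startswith(PREFIX):
            name = line[len(PREFIX):].rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


# Two independent calls: the second has no access to the first.
fake_llm("user: My name is Ada.")
stateless_answer = fake_llm("user: What is my name?")

# The caller supplies the memory by resending earlier turns in the prompt.
history = ["user: My name is Ada.", "user: What is my name?"]
with_history_answer = fake_llm("\n".join(history))
```

The second call fails and the third succeeds only because the caller resent the earlier turn, which is the job the memory and state layers take on.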