Validation & Error Handling Framework

Building robust, production-ready agentic systems through comprehensive validation, error recovery, and context management

⚠️ Why This Matters
Without proper validation and error handling, early mistakes compound through your workflow, leading to unreliable outputs and production failures. This framework prevents error propagation and ensures consistent quality.
Part 1

Four Validation Strategies

Choose the right validation approach based on your quality requirements and constraints

Validation Pipeline
Agent Output
↓
Programmatic
LLM-Based
Rule-Based
Confidence
↓
✓ Pass
✗ Fail → Retry
⚙️
1. Programmatic Validation
Use custom code to verify structural requirements, formats, and constraints.
  • Valid JSON/XML structure
  • Required fields present
  • Length requirements (min/max)
  • Data type checking
  • Numerical ranges
Best for: Format compliance, structural validation
🤖
2. LLM-Based Validation
Another LLM evaluates output quality against semantic criteria.
  • Accuracy of information
  • Relevance to query
  • Tone appropriateness
  • Completeness of answer
  • Logical consistency
Best for: Quality assessment, semantic validation
📏
3. Rule-Based Validation
Predefined business rules and domain-specific constraints.
  • Keyword presence/absence
  • Pattern matching (regex)
  • Business logic compliance
  • Domain-specific rules
  • Consistency checks
Best for: Business rules, compliance requirements
📊
4. Confidence Scoring
Use model confidence scores to identify uncertain outputs.
  • Token probability scores
  • Multiple generation comparison
  • Threshold-based flagging
  • Uncertainty detection
  • Automatic escalation
Best for: Uncertainty detection, risk assessment
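One hedged way to implement the multiple-generation comparison listed above is to sample the same prompt several times and treat agreement between the answers as a confidence proxy. The generate callable and the 0.7 threshold are assumptions for illustration, not part of any particular library; when your API exposes token log-probabilities, those can complement or replace the agreement score.

# Sketch: Confidence scoring via multiple-generation agreement

from collections import Counter

def confidence_by_agreement(generate, prompt, n_samples=3, threshold=0.7):
    samples = [generate(prompt) for _ in range(n_samples)]
    top_answer, top_count = Counter(samples).most_common(1)[0]
    confidence = top_count / n_samples           # Share of samples that agree
    return {
        "answer": top_answer,
        "confidence": confidence,
        "needs_review": confidence < threshold,  # Flag uncertain outputs for escalation
    }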
# Example: Multi-Layer Validation

import json

def validate_output(output, config):
    # Layer 1: Programmatic - Structure
    if not is_valid_json(output):
        return False, "Invalid JSON structure"
    
    data = json.loads(output)
    
    # Layer 2: Programmatic - Required fields
    required_fields = ["category", "confidence", "reasoning"]
    if not all(field in data for field in required_fields):
        return False, "Missing required fields"
    
    # Layer 3: Rule-Based - Business logic
    if data["category"] not in config.allowed_categories:
        return False, "Invalid category"
    
    # Layer 4: Confidence - Threshold check
    if data["confidence"] < config.min_confidence:
        return False, "Low confidence score"
    
    # Layer 5: LLM-Based - Quality check
    quality_score = llm_evaluate_quality(data, config.criteria)
    if quality_score < config.quality_threshold:
        return False, "Failed quality evaluation"
    
    return True, "Validation passed"
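The llm_evaluate_quality call in the final layer above is left undefined. Here is a minimal sketch, assuming a generic call_llm wrapper around whatever model client you use; the judge prompt, the 0-to-1 scale, and passing the client in explicitly are illustrative choices, not a prescribed API.

# Sketch: LLM-based quality evaluation (call_llm is a placeholder for your model client)

import json

JUDGE_PROMPT = """You are a strict reviewer. Rate the RESPONSE against the CRITERIA
on a scale from 0.0 to 1.0 and reply with only the number.

CRITERIA:
{criteria}

RESPONSE:
{response}
"""

def llm_evaluate_quality(data, criteria, call_llm):
    verdict = call_llm(JUDGE_PROMPT.format(criteria=criteria, response=json.dumps(data)))
    try:
        return max(0.0, min(1.0, float(verdict.strip())))  # Clamp the score to [0, 1]
    except ValueError:
        return 0.0  # Treat an unparseable verdict as failing the quality check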
💡 Validation Layer Strategy
Start with fast, cheap validations (programmatic) and progress to slower, expensive ones (LLM-based) only if needed. This "validation funnel" catches most issues quickly while reserving costly checks for outputs that pass initial screening.
Part 2

Five Error Handling Strategies

Recover gracefully when validation fails or agents produce errors

Error Handling Decision Flow

Error Recovery Process
Validation Failed
↓
Analyze Failure Type
↓
Retry
Re-prompt
Fallback
Critique
Log & Escalate
↓
Max Attempts?
No → Retry Strategy
Yes → Fallback
🔄
1. Simple Retry
Re-run the exact same prompt with the same parameters. Effective when randomness in LLM sampling might produce a better result on a second try.
Use when: Failure seems random, no clear pattern, low cost to retry
💬
2. Re-prompt with Feedback
Include validation error details in revised prompt. Agent learns from its mistake and corrects specific issues.
Use when: Clear validation failure reason, agent can learn from feedback
⚡
3. Fallback Mechanism
Switch to alternative agent, simpler task, or default response when all else fails. Ensures system doesn't break completely.
Use when: Max retries exhausted, critical path needs completion
🔍
4. Critique & Refinement
Agent critiques its own output, identifies issues, then regenerates an improved version: a self-correction loop (see the sketch after these five strategies).
Use when: High quality needed, agent capable of self-assessment
📊
5. Logging & Monitoring
Track all failures with context for analysis. Essential for debugging patterns and improving system over time.
Use always: Required for production systems, enables continuous improvement
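Strategy 4 is a loop rather than a single retry. A minimal sketch, assuming hypothetical generate and critique callables (typically two LLM calls) and an illustrative round limit:

# Sketch: Critique & refinement self-correction loop

def critique_and_refine(generate, critique, prompt, max_rounds=3):
    output = generate(prompt)
    for _ in range(max_rounds):              # Hard cap keeps the loop bounded
        feedback = critique(output)          # e.g. an LLM call that lists concrete issues
        if not feedback:                     # Empty feedback means no issues were found
            break
        prompt = (
            f"{prompt}\n\nPrevious draft:\n{output}\n\n"
            f"Reviewer feedback:\n{feedback}\n\nProduce an improved version."
        )
        output = generate(prompt)
    return output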
Error Handling Strategy Selection Matrix

Scenario                     | Recommended Strategy               | Why
Format error (invalid JSON)  | Re-prompt with feedback            | Clear, fixable error that agent can correct
Low confidence score         | Retry with temperature adjustment  | Sampling variation might improve confidence
Factual inaccuracy           | Critique & refinement              | Needs reasoning about correctness, not just format
Timeout or API error         | Retry with exponential backoff     | Transient infrastructure issue
Max retries exhausted        | Fallback + log                     | Must complete workflow, track for analysis
Ambiguous input              | Fallback to clarification request  | Need more information from user
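For the timeout/API-error row above, a minimal backoff sketch; the exception types and delays are illustrative and should match your client library. Adding random jitter to the delay helps avoid synchronized retries.

# Sketch: Retry with exponential backoff for transient infrastructure errors

import time

def retry_with_backoff(call, max_attempts=4, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except (TimeoutError, ConnectionError):      # Substitute your client's transient errors
            if attempt == max_attempts - 1:
                raise                                # Out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # Waits 1s, 2s, 4s, ...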
# Example: Comprehensive Error Handling

def execute_with_error_handling(agent, input_data, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            # Execute agent
            output = agent.run(input_data)
            
            # Validate output
            is_valid, error_msg = validate_output(output)
            
            if is_valid:
                log_success(agent, attempt)
                return output
            
            else:
                # Strategy: Re-prompt with feedback
                log_failure(agent, attempt, error_msg)
                
                if attempt < max_attempts - 1:
                    # Add error feedback to prompt
                    input_data = add_feedback(input_data, error_msg)
                    continue
        
        except Exception as e:
            log_exception(agent, attempt, e)
            
            if attempt < max_attempts - 1:
                continue
    
    # All attempts failed - use fallback
    log_fallback(agent, input_data)
    return get_fallback_response(input_data)
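The log_success, log_failure, log_exception, and log_fallback helpers above are placeholders. One hedged way to implement them is structured JSON logging, which makes later pattern analysis straightforward; the field names below are illustrative.

# Sketch: Structured failure logging

import json
import logging
import time

logger = logging.getLogger("agent_workflow")

def log_failure(agent, attempt, error_msg):
    logger.warning(json.dumps({
        "event": "validation_failure",
        "agent": getattr(agent, "name", str(agent)),  # Works whether or not agents expose a name
        "attempt": attempt,
        "error": error_msg,
        "timestamp": time.time(),
    }))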
⚠️ Set Maximum Retry Limits
ALWAYS define maximum retry attempts (typically 2-5) to prevent infinite loops and runaway costs. Balance quality improvement against time and expense. Track retry rates to identify systemic issues.
Part 3

Context Management Techniques

Optimize information flow to maintain quality while preventing context overload

The Context Challenge

In chained workflows, context accumulates. Too little context and agents lose critical information. Too much and performance degrades due to attention dilution and increased latency.

Context Flow Strategies
Full Context
(All Previous Steps)
vs
Selective Context
(Only Relevant Info)
↓
❌ Context Overload
Slow, expensive, diluted attention
✓ Optimized
Fast, focused, effective
🎯
Selective Passing
Only pass information relevant to the next step. Filter out intermediate data that doesn't contribute to downstream decisions.
  • Extract only necessary fields
  • Summarize long content
  • Remove temporary variables
  • Pass final results, not all steps
🔁
Contextual Reiteration
Repeat critical context in each prompt to ensure agents don't "forget" essential information as chain lengthens.
  • Restate key objectives
  • Include important constraints
  • Remind of output format
  • Maintain persona consistency
⚖️
Balance Optimization
Find the sweet spot between too much and too little context through testing and monitoring.
  • A/B test context levels
  • Monitor performance vs. context size
  • Track where agents fail
  • Adjust based on results
# Example: Selective Context Passing

def chain_execution(steps, input_data, constraints=None):
    # extract_relevant_context, build_prompt, and summarize_results are project-specific helpers
    context = {
        "original_request": input_data,    # Always preserve
        "constraints": constraints or {},  # Critical constraints reiterated at every step
        "step_results": {}
    }
    
    for i, step in enumerate(steps):
        # Selective: Only pass relevant prior context
        relevant_context = extract_relevant_context(
            context,
            step.dependencies
        )
        
        # Reiteration: Include critical constraints
        prompt = build_prompt(
            step.instruction,
            relevant_context,
            critical_constraints=context["constraints"]
        )
        
        # Execute step
        result = step.execute(prompt)
        
        # Store result but don't pass everything forward
        context["step_results"][i] = result
        
        # Balance: Summarize if context getting large
        if len(context["step_results"]) > 3:
            context["summary"] = summarize_results(
                context["step_results"]
            )
    
    return context["step_results"][len(steps) - 1]  # Return the final step's output
💡 Context Window Strategy
Start narrow, expand if needed: Begin with minimal context and add only when agents fail due to missing information. This prevents the common mistake of over-contexting from the start, which wastes tokens and degrades performance.
✓ Context Management Checklist
  • Identify dependencies: What does each step actually need?
  • Define critical context: What must persist across all steps?
  • Implement summarization: Condense information at key points
  • Test minimal context: Start lean, expand only when necessary
  • Monitor context size: Track token usage and performance
  • Document context flow: Clear specs for what passes between steps
Part 4

Validation Placement by Pattern

Where and how to implement validation for each of the 5 workflow patterns

Pattern-Specific Validation Strategies

Each workflow pattern has unique validation requirements and optimal placement points for quality gates.

🔗
Prompt Chaining
Critical Challenge: Error propagation through sequential steps
  • After every step: Validate before passing to next agent
  • At decision points: Check branching logic correctness
  • Before final output: Comprehensive quality check
  • Context size: Monitor and summarize if growing too large

Recommended Approach:

Programmatic validation after each step (fast), LLM-based validation only at critical junctions or final output (expensive but thorough).
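A minimal sketch of that placement; the step objects (with run and name attributes) and the two validator callables are assumptions, with the cheap check applied after every step and the expensive check only once at the end.

# Sketch: Per-step validation in a prompt chain

def run_chain(steps, payload, validate_step, validate_final):
    for step in steps:
        payload = step.run(payload)
        ok, reason = validate_step(payload)       # Fast programmatic check after every step
        if not ok:
            raise ValueError(f"Step '{step.name}' failed validation: {reason}")
    ok, reason = validate_final(payload)          # Expensive LLM-based check on the final output only
    if not ok:
        raise ValueError(f"Final output failed quality check: {reason}")
    return payload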

🔀
Routing
Critical Challenge: Misclassification sends task to wrong agent
  • Classification output: Verify category is valid option
  • Confidence threshold: Flag low-confidence routing decisions
  • Reasoning check: Ensure classification logic makes sense
  • Worker output: Validate specialist agent's response

Recommended Approach:

Rule-based validation on classification output (must be valid category), confidence scoring to flag uncertain cases, fallback to general agent if confidence too low.
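A minimal sketch of those guardrails; classify, workers, and general_worker are placeholder callables, and the 0.75 threshold is illustrative.

# Sketch: Routing guardrails with confidence-based fallback

def route_task(classify, workers, general_worker, task, allowed_categories, min_confidence=0.75):
    decision = classify(task)                     # Assumed shape: {"category": ..., "confidence": ...}
    category = decision.get("category")
    confidence = decision.get("confidence", 0.0)

    if category not in allowed_categories or confidence < min_confidence:
        return general_worker(task)               # Rule or confidence check failed: use the general agent
    return workers[category](task)                # Otherwise route to the matching specialist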

⚡
Parallelization
Critical Challenge: Inconsistent results from parallel agents
  • Individual outputs: Basic quality check on each agent
  • Consistency check: Flag major disagreements between agents
  • Synthesis validation: Ensure consolidation is coherent
  • Completeness: Verify all subtasks produced valid outputs

Recommended Approach:

Programmatic checks that all agents completed, LLM-based synthesis validation to ensure combined output is coherent. Consider voting/consensus mechanisms when agents disagree significantly.
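A minimal sketch of the completeness and consensus checks; it assumes classification-style outputs with a comparable label field, and synthesize stands in for the LLM-based consolidation step.

# Sketch: Completeness and majority-consensus checks for parallel agents

from collections import Counter

def consolidate(parallel_outputs, synthesize):
    if any(out is None for out in parallel_outputs):           # Completeness: every subtask produced output
        raise ValueError("At least one parallel agent produced no output")

    votes = Counter(out["label"] for out in parallel_outputs)  # Assumes a comparable 'label' field
    top_label, top_count = votes.most_common(1)[0]
    if top_count <= len(parallel_outputs) // 2:                # Flag major disagreement between agents
        raise ValueError(f"No majority agreement (best candidate: {top_label})")

    return synthesize(parallel_outputs)                        # LLM-based synthesis, validated downstream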

🔄
Evaluator-Optimizer
Critical Challenge: Evaluation loop runs too long or doesn't converge
  • Generated output: Validate each iteration's output
  • Evaluator consistency: Check evaluator criteria are applied properly
  • Improvement tracking: Verify output is actually getting better
  • Max iterations: Hard limit prevents infinite loops

Recommended Approach:

LLM-based evaluation with clear rubric, track improvement scores across iterations, stop if no improvement for 2 consecutive iterations OR max attempts (3-5) reached. Always set maximum iterations.
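A minimal sketch of that loop; generate and evaluate are placeholder callables, and the target score, patience, and iteration cap are illustrative values matching the guidance above.

# Sketch: Evaluator-optimizer loop with improvement tracking and a hard iteration cap

def evaluate_and_optimize(generate, evaluate, prompt, target=0.9, max_iterations=4, patience=2):
    best_output, best_score, stalled = None, -1.0, 0
    for _ in range(max_iterations):                       # Always set a maximum
        output = generate(prompt)
        score = evaluate(output)                          # Rubric-based score in [0, 1]
        if score > best_score:
            best_output, best_score, stalled = output, score, 0
        else:
            stalled += 1                                  # No improvement this iteration
        if best_score >= target or stalled >= patience:   # Good enough, or stopped improving
            break
        prompt = f"{prompt}\n\nPrevious attempt scored {score:.2f}. Address its weaknesses."
    return best_output, best_score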

🎯
Orchestrator-Workers
Critical Challenge: Dynamic planning can diverge or loop infinitely
  • Plan validity: Verify orchestrator's strategy makes sense
  • Worker selection: Check chosen workers are appropriate
  • Progress tracking: Ensure system is making forward progress
  • Total steps limit: Cap maximum workflow complexity

Recommended Approach:

LLM-based plan validation, programmatic checks on worker selection logic, track state changes to detect loops, set maximum total steps (e.g., 10-20) to prevent runaway orchestration.
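A minimal sketch of the step cap and loop detection; plan_next, the workers mapping, and the action shape are all assumptions.

# Sketch: Orchestrator-workers guardrails

def orchestrate(plan_next, workers, goal, max_steps=15):
    state = {"goal": goal, "history": []}
    seen_actions = set()

    for _ in range(max_steps):                        # Cap total workflow complexity
        action = plan_next(state)                     # e.g. {"worker": "search", "input": ..., "done": False}
        if action.get("done"):
            return state

        worker_name = action.get("worker")
        if worker_name not in workers:                # Programmatic check on worker selection
            raise ValueError(f"Unknown worker selected: {worker_name}")

        fingerprint = (worker_name, str(action.get("input")))
        if fingerprint in seen_actions:               # Same worker with same input: likely a loop
            raise RuntimeError("Orchestration loop detected")
        seen_actions.add(fingerprint)

        state["history"].append(workers[worker_name](action.get("input")))

    raise RuntimeError("Maximum orchestration steps reached without completion")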

✅ Universal Validation Principles
Regardless of pattern:
  • Validate at boundaries: Check inputs and outputs of each major component
  • Fail fast: Catch errors as early as possible in the workflow
  • Provide actionable feedback: Error messages should enable fixes
  • Log everything: Track all validation results for analysis
  • Set hard limits: Maximum iterations, timeouts, context sizes

🎯 Production Readiness Checklist

Essential Implementation Steps
  • Multi-layer validation: Combine programmatic, rule-based, and LLM validation
  • Error handling strategy: Define retry logic, fallbacks, and escalation paths
  • Context optimization: Implement selective passing and summarization
  • Maximum limits: Set retry caps, iteration limits, timeout values
  • Logging infrastructure: Track all inputs, outputs, validation results, errors
  • Pattern-specific checks: Implement validation appropriate to workflow pattern
  • Monitoring dashboards: Track error rates, retry rates, performance metrics
  • Continuous improvement: Analyze logs to refine prompts and validation criteria
Remember: Validation and error handling aren't optional; they're what separate prototype demos from production-ready systems. Invest the time upfront to prevent costly failures later.