Validation & Error Handling Framework

Building robust, production-ready agentic systems through comprehensive validation, error recovery, and context management

⚠️ Why This Matters
Without proper validation and error handling, early mistakes compound through your workflow, leading to unreliable outputs and production failures. This framework prevents error propagation and ensures consistent quality.
Part 1

Four Validation Strategies

Choose the right validation approach based on your quality requirements and constraints

Validation Pipeline
Agent Output
↓
Programmatic
LLM-Based
Rule-Based
Confidence
↓
✓ Pass
✗ Fail → Retry
⚙️
1. Programmatic Validation
Use custom code to verify structural requirements, formats, and constraints.
  • Valid JSON/XML structure
  • Required fields present
  • Length requirements (min/max)
  • Data type checking
  • Numerical ranges
Best for: Format compliance, structural validation
🤖
2. LLM-Based Validation
Another LLM evaluates output quality against semantic criteria.
  • Accuracy of information
  • Relevance to query
  • Tone appropriateness
  • Completeness of answer
  • Logical consistency
Best for: Quality assessment, semantic validation
📏
3. Rule-Based Validation
Predefined business rules and domain-specific constraints.
  • Keyword presence/absence
  • Pattern matching (regex)
  • Business logic compliance
  • Domain-specific rules
  • Consistency checks
Best for: Business rules, compliance requirements
📊
4. Confidence Scoring
Use model confidence scores to identify uncertain outputs.
  • Token probability scores
  • Multiple generation comparison
  • Threshold-based flagging
  • Uncertainty detection
  • Automatic escalation
Best for: Uncertainty detection, risk assessment
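One hedged way to implement the multiple-generation comparison listed above is to sample the same prompt several times and treat agreement between the answers as a confidence proxy. The generate callable and the 0.7 threshold are assumptions for illustration, not part of any particular library; when your API exposes token log-probabilities, those can complement or replace the agreement score.

# Sketch: Confidence scoring via multiple-generation agreement

from collections import Counter

def confidence_by_agreement(generate, prompt, n_samples=3, threshold=0.7):
    samples = [generate(prompt) for _ in range(n_samples)]
    top_answer, top_count = Counter(samples).most_common(1)[0]
    confidence = top_count / n_samples           # Share of samples that agree
    return {
        "answer": top_answer,
        "confidence": confidence,
        "needs_review": confidence < threshold,  # Flag uncertain outputs for escalation
    }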
# Example: Multi-Layer Validation

import json

def validate_output(output, config):
    # Layer 1: Programmatic - Structure
    if not is_valid_json(output):
        return False, "Invalid JSON structure"
    
    data = json.loads(output)
    
    # Layer 2: Programmatic - Required fields
    required_fields = ["category", "confidence", "reasoning"]
    if not all(field in data for field in required_fields):
        return False, "Missing required fields"
    
    # Layer 3: Rule-Based - Business logic
    if data["category"] not in config.allowed_categories:
        return False, "Invalid category"
    
    # Layer 4: Confidence - Threshold check
    if data["confidence"] < config.min_confidence:
        return False, "Low confidence score"
    
    # Layer 5: LLM-Based - Quality check
    quality_score = llm_evaluate_quality(data, config.criteria)
    if quality_score < config.quality_threshold:
        return False, "Failed quality evaluation"
    
    return True, "Validation passed"
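The llm_evaluate_quality call in the final layer above is left undefined. Here is a minimal sketch, assuming a generic call_llm wrapper around whatever model client you use; the judge prompt, the 0-to-1 scale, and passing the client in explicitly are illustrative choices, not a prescribed API.

# Sketch: LLM-based quality evaluation (call_llm is a placeholder for your model client)

import json

JUDGE_PROMPT = """You are a strict reviewer. Rate the RESPONSE against the CRITERIA
on a scale from 0.0 to 1.0 and reply with only the number.

CRITERIA:
{criteria}

RESPONSE:
{response}
"""

def llm_evaluate_quality(data, criteria, call_llm):
    verdict = call_llm(JUDGE_PROMPT.format(criteria=criteria, response=json.dumps(data)))
    try:
        return max(0.0, min(1.0, float(verdict.strip())))  # Clamp the score to [0, 1]
    except ValueError:
        return 0.0  # Treat an unparseable verdict as failing the quality check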
💡 Validation Layer Strategy
Start with fast, cheap validations (programmatic) and progress to slower, expensive ones (LLM-based) only if needed. This "validation funnel" catches most issues quickly while reserving costly checks for outputs that pass initial screening.
Part 2

Five Error Handling Strategies

Recover gracefully when validation fails or agents produce errors

Error Handling Decision Flow

Error Recovery Process
Validation Failed
↓
Analyze Failure Type
↓
Retry
Re-prompt
Fallback
Critique
Log & Escalate
↓
Max Attempts?
No → Retry Strategy
Yes → Fallback
🔄
1. Simple Retry
Re-run the exact same prompt with the same parameters. Effective when randomness in LLM sampling might produce a better result on a second try.
Use when: Failure seems random, no clear pattern, low cost to retry
💬
2. Re-prompt with Feedback
Include validation error details in revised prompt. Agent learns from its mistake and corrects specific issues.
Use when: Clear validation failure reason, agent can learn from feedback
⚡
3. Fallback Mechanism
Switch to alternative agent, simpler task, or default response when all else fails. Ensures system doesn't break completely.
Use when: Max retries exhausted, critical path needs completion
🔍
4. Critique & Refinement
Agent critiques its own output, identifies issues, then regenerates an improved version: a self-correction loop (see the sketch after these five strategies).
Use when: High quality needed, agent capable of self-assessment
📊
5. Logging & Monitoring
Track all failures with context for analysis. Essential for debugging patterns and improving system over time.
Use always: Required for production systems, enables continuous improvement
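Strategy 4 is a loop rather than a single retry. A minimal sketch, assuming hypothetical generate and critique callables (typically two LLM calls) and an illustrative round limit:

# Sketch: Critique & refinement self-correction loop

def critique_and_refine(generate, critique, prompt, max_rounds=3):
    output = generate(prompt)
    for _ in range(max_rounds):              # Hard cap keeps the loop bounded
        feedback = critique(output)          # e.g. an LLM call that lists concrete issues
        if not feedback:                     # Empty feedback means no issues were found
            break
        prompt = (
            f"{prompt}\n\nPrevious draft:\n{output}\n\n"
            f"Reviewer feedback:\n{feedback}\n\nProduce an improved version."
        )
        output = generate(prompt)
    return output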
Error Handling Strategy Selection Matrix

Scenario                     | Recommended Strategy               | Why
Format error (invalid JSON)  | Re-prompt with feedback            | Clear, fixable error that agent can correct
Low confidence score         | Retry with temperature adjustment  | Sampling variation might improve confidence
Factual inaccuracy           | Critique & refinement              | Needs reasoning about correctness, not just format
Timeout or API error         | Retry with exponential backoff     | Transient infrastructure issue
Max retries exhausted        | Fallback + log                     | Must complete workflow, track for analysis
Ambiguous input              | Fallback to clarification request  | Need more information from user
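For the timeout/API-error row above, a minimal backoff sketch; the exception types and delays are illustrative and should match your client library. Adding random jitter to the delay helps avoid synchronized retries.

# Sketch: Retry with exponential backoff for transient infrastructure errors

import time

def retry_with_backoff(call, max_attempts=4, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except (TimeoutError, ConnectionError):      # Substitute your client's transient errors
            if attempt == max_attempts - 1:
                raise                                # Out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # Waits 1s, 2s, 4s, ...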
# Example: Comprehensive Error Handling

def execute_with_error_handling(agent, input_data, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            # Execute agent
            output = agent.run(input_data)
            
            # Validate output
            is_valid, error_msg = validate_output(output)
            
            if is_valid:
                log_success(agent, attempt)
                return output
            
            else:
                # Strategy: Re-prompt with feedback
                log_failure(agent, attempt, error_msg)
                
                if attempt < max_attempts - 1:
                    # Add error feedback to prompt
                    input_data = add_feedback(input_data, error_msg)
                    continue
        
        except Exception as e:
            log_exception(agent, attempt, e)
            
            if attempt < max_attempts - 1:
                continue
    
    # All attempts failed - use fallback
    log_fallback(agent, input_data)
    return get_fallback_response(input_data)
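The log_success, log_failure, log_exception, and log_fallback helpers above are placeholders. One hedged way to implement them is structured JSON logging, which makes later pattern analysis straightforward; the field names below are illustrative.

# Sketch: Structured failure logging

import json
import logging
import time

logger = logging.getLogger("agent_workflow")

def log_failure(agent, attempt, error_msg):
    logger.warning(json.dumps({
        "event": "validation_failure",
        "agent": getattr(agent, "name", str(agent)),  # Works whether or not agents expose a name
        "attempt": attempt,
        "error": error_msg,
        "timestamp": time.time(),
    }))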
⚠️ Set Maximum Retry Limits
ALWAYS define maximum retry attempts (typically 2-5) to prevent infinite loops and runaway costs. Balance quality improvement against time and expense. Track retry rates to identify systemic issues.
Part 3

Context Management Techniques

Optimize information flow to maintain quality while preventing context overload

The Context Challenge

In chained workflows, context accumulates. Too little context and agents lose critical information. Too much and performance degrades due to attention dilution and increased latency.

Context Flow Strategies
Full Context
(All Previous Steps)
vs
Selective Context
(Only Relevant Info)
↓
❌ Context Overload
Slow, expensive, diluted attention
✓ Optimized
Fast, focused, effective
🎯
Selective Passing
Only pass information relevant to the next step. Filter out intermediate data that doesn't contribute to downstream decisions.
  • Extract only necessary fields
  • Summarize long content
  • Remove temporary variables
  • Pass final results, not all steps
🔁
Contextual Reiteration
Repeat critical context in each prompt to ensure agents don't "forget" essential information as chain lengthens.
  • Restate key objectives
  • Include important constraints
  • Remind of output format
  • Maintain persona consistency
⚖️
Balance Optimization
Find the sweet spot between too much and too little context through testing and monitoring.
  • A/B test context levels
  • Monitor performance vs. context size
  • Track where agents fail
  • Adjust based on results
# Example: Selective Context Passing

def chain_execution(steps, input_data, constraints=None):
    # extract_relevant_context, build_prompt, and summarize_results are project-specific helpers
    context = {
        "original_request": input_data,    # Always preserve
        "constraints": constraints or {},  # Critical constraints reiterated at every step
        "step_results": {}
    }
    
    for i, step in enumerate(steps):
        # Selective: Only pass relevant prior context
        relevant_context = extract_relevant_context(
            context,
            step.dependencies
        )
        
        # Reiteration: Include critical constraints
        prompt = build_prompt(
            step.instruction,
            relevant_context,
            critical_constraints=context["constraints"]
        )
        
        # Execute step
        result = step.execute(prompt)
        
        # Store result but don't pass everything forward
        context["step_results"][i] = result
        
        # Balance: Summarize if context getting large
        if len(context["step_results"]) > 3:
            context["summary"] = summarize_results(
                context["step_results"]
            )
    
    return context["step_results"][len(steps) - 1]  # Return the final step's output
💡 Context Window Strategy
Start narrow, expand if needed: Begin with minimal context and add only when agents fail due to missing information. This prevents the common mistake of over-contexting from the start, which wastes tokens and degrades performance.
✓ Context Management Checklist
  • Identify dependencies: What does each step actually need?
  • Define critical context: What must persist across all steps?
  • Implement summarization: Condense information at key points
  • Test minimal context: Start lean, expand only when necessary
  • Monitor context size: Track token usage and performance
  • Document context flow: Clear specs for what passes between steps
Part 4

Validation Placement by Pattern

Where and how to implement validation for each of the 5 workflow patterns

Pattern-Specific Validation Strategies

Each workflow pattern has unique validation requirements and optimal placement points for quality gates.

🔗
Prompt Chaining
Critical Challenge: Error propagation through sequential steps
  • After every step: Validate before passing to next agent
  • At decision points: Check branching logic correctness
  • Before final output: Comprehensive quality check
  • Context size: Monitor and summarize if growing too large

Recommended Approach:

Programmatic validation after each step (fast), LLM-based validation only at critical junctions or final output (expensive but thorough).
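A minimal sketch of that placement; the step objects (with run and name attributes) and the two validator callables are assumptions, with the cheap check applied after every step and the expensive check only once at the end.

# Sketch: Per-step validation in a prompt chain

def run_chain(steps, payload, validate_step, validate_final):
    for step in steps:
        payload = step.run(payload)
        ok, reason = validate_step(payload)       # Fast programmatic check after every step
        if not ok:
            raise ValueError(f"Step '{step.name}' failed validation: {reason}")
    ok, reason = validate_final(payload)          # Expensive LLM-based check on the final output only
    if not ok:
        raise ValueError(f"Final output failed quality check: {reason}")
    return payload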

🔀
Routing
Critical Challenge: Misclassification sends task to wrong agent
  • Classification output: Verify category is valid option
  • Confidence threshold: Flag low-confidence routing decisions
  • Reasoning check: Ensure classification logic makes sense
  • Worker output: Validate specialist agent's response

Recommended Approach:

Rule-based validation on classification output (must be valid category), confidence scoring to flag uncertain cases, fallback to general agent if confidence too low.
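A minimal sketch of those guardrails; classify, workers, and general_worker are placeholder callables, and the 0.75 threshold is illustrative.

# Sketch: Routing guardrails with confidence-based fallback

def route_task(classify, workers, general_worker, task, allowed_categories, min_confidence=0.75):
    decision = classify(task)                     # Assumed shape: {"category": ..., "confidence": ...}
    category = decision.get("category")
    confidence = decision.get("confidence", 0.0)

    if category not in allowed_categories or confidence < min_confidence:
        return general_worker(task)               # Rule or confidence check failed: use the general agent
    return workers[category](task)                # Otherwise route to the matching specialist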

⚡
Parallelization
Critical Challenge: Inconsistent results from parallel agents
  • Individual outputs: Basic quality check on each agent
  • Consistency check: Flag major disagreements between agents
  • Synthesis validation: Ensure consolidation is coherent
  • Completeness: Verify all subtasks produced valid outputs

Recommended Approach:

Programmatic checks that all agents completed, LLM-based synthesis validation to ensure combined output is coherent. Consider voting/consensus mechanisms when agents disagree significantly.
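A minimal sketch of the completeness and consensus checks; it assumes classification-style outputs with a comparable label field, and synthesize stands in for the LLM-based consolidation step.

# Sketch: Completeness and majority-consensus checks for parallel agents

from collections import Counter

def consolidate(parallel_outputs, synthesize):
    if any(out is None for out in parallel_outputs):           # Completeness: every subtask produced output
        raise ValueError("At least one parallel agent produced no output")

    votes = Counter(out["label"] for out in parallel_outputs)  # Assumes a comparable 'label' field
    top_label, top_count = votes.most_common(1)[0]
    if top_count <= len(parallel_outputs) // 2:                # Flag major disagreement between agents
        raise ValueError(f"No majority agreement (best candidate: {top_label})")

    return synthesize(parallel_outputs)                        # LLM-based synthesis, validated downstream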

🔄
Evaluator-Optimizer
Critical Challenge: Evaluation loop runs too long or doesn't converge
  • Generated output: Validate each iteration's output
  • Evaluator consistency: Check evaluator criteria are applied properly
  • Improvement tracking: Verify output is actually getting better
  • Max iterations: Hard limit prevents infinite loops

Recommended Approach:

LLM-based evaluation with clear rubric, track improvement scores across iterations, stop if no improvement for 2 consecutive iterations OR max attempts (3-5) reached. Always set maximum iterations.
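A minimal sketch of that loop; generate and evaluate are placeholder callables, and the target score, patience, and iteration cap are illustrative values matching the guidance above.

# Sketch: Evaluator-optimizer loop with improvement tracking and a hard iteration cap

def evaluate_and_optimize(generate, evaluate, prompt, target=0.9, max_iterations=4, patience=2):
    best_output, best_score, stalled = None, -1.0, 0
    for _ in range(max_iterations):                       # Always set a maximum
        output = generate(prompt)
        score = evaluate(output)                          # Rubric-based score in [0, 1]
        if score > best_score:
            best_output, best_score, stalled = output, score, 0
        else:
            stalled += 1                                  # No improvement this iteration
        if best_score >= target or stalled >= patience:   # Good enough, or stopped improving
            break
        prompt = f"{prompt}\n\nPrevious attempt scored {score:.2f}. Address its weaknesses."
    return best_output, best_score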

🎯
Orchestrator-Workers
Critical Challenge: Dynamic planning can diverge or loop infinitely
  • Plan validity: Verify orchestrator's strategy makes sense
  • Worker selection: Check chosen workers are appropriate
  • Progress tracking: Ensure system is making forward progress
  • Total steps limit: Cap maximum workflow complexity

Recommended Approach:

LLM-based plan validation, programmatic checks on worker selection logic, track state changes to detect loops, set maximum total steps (e.g., 10-20) to prevent runaway orchestration.
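A minimal sketch of the step cap and loop detection; plan_next, the workers mapping, and the action shape are all assumptions.

# Sketch: Orchestrator-workers guardrails

def orchestrate(plan_next, workers, goal, max_steps=15):
    state = {"goal": goal, "history": []}
    seen_actions = set()

    for _ in range(max_steps):                        # Cap total workflow complexity
        action = plan_next(state)                     # e.g. {"worker": "search", "input": ..., "done": False}
        if action.get("done"):
            return state

        worker_name = action.get("worker")
        if worker_name not in workers:                # Programmatic check on worker selection
            raise ValueError(f"Unknown worker selected: {worker_name}")

        fingerprint = (worker_name, str(action.get("input")))
        if fingerprint in seen_actions:               # Same worker with same input: likely a loop
            raise RuntimeError("Orchestration loop detected")
        seen_actions.add(fingerprint)

        state["history"].append(workers[worker_name](action.get("input")))

    raise RuntimeError("Maximum orchestration steps reached without completion")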

✅ Universal Validation Principles
Regardless of pattern:
  • Validate at boundaries: Check inputs and outputs of each major component
  • Fail fast: Catch errors as early as possible in the workflow
  • Provide actionable feedback: Error messages should enable fixes
  • Log everything: Track all validation results for analysis
  • Set hard limits: Maximum iterations, timeouts, context sizes

🎯 Production Readiness Checklist

Essential Implementation Steps
  • Multi-layer validation: Combine programmatic, rule-based, and LLM validation
  • Error handling strategy: Define retry logic, fallbacks, and escalation paths
  • Context optimization: Implement selective passing and summarization
  • Maximum limits: Set retry caps, iteration limits, timeout values
  • Logging infrastructure: Track all inputs, outputs, validation results, errors
  • Pattern-specific checks: Implement validation appropriate to workflow pattern
  • Monitoring dashboards: Track error rates, retry rates, performance metrics
  • Continuous improvement: Analyze logs to refine prompts and validation criteria
Remember: Validation and error handling aren't optional; they're what separate prototype demos from production-ready systems. Invest the time upfront to prevent costly failures later.