Building robust, production-ready agentic systems through comprehensive validation, error recovery, and context management
Choose the right validation approach based on your quality requirements and constraints
```python
# Example: Multi-Layer Validation
# (is_valid_json and llm_evaluate_quality are helpers assumed to exist elsewhere)
import json

def validate_output(output, config):
    # Layer 1: Programmatic - Structure
    if not is_valid_json(output):
        return False, "Invalid JSON structure"
    data = json.loads(output)

    # Layer 2: Programmatic - Required fields
    required_fields = ["category", "confidence", "reasoning"]
    if not all(field in data for field in required_fields):
        return False, "Missing required fields"

    # Layer 3: Rule-Based - Business logic
    if data["category"] not in config.allowed_categories:
        return False, "Invalid category"

    # Layer 4: Confidence - Threshold check
    if data["confidence"] < config.min_confidence:
        return False, "Low confidence score"

    # Layer 5: LLM-Based - Quality check (expensive, so run it last)
    quality_score = llm_evaluate_quality(data, config.criteria)
    if quality_score < config.quality_threshold:
        return False, "Failed quality evaluation"

    return True, "Validation passed"
```
Recover gracefully when validation fails or agents produce errors
```python
# Example: Comprehensive Error Handling
# (validate_output is the multi-layer validator defined above)
def execute_with_error_handling(agent, input_data, config, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            # Execute agent
            output = agent.run(input_data)

            # Validate output
            is_valid, error_msg = validate_output(output, config)
            if is_valid:
                log_success(agent, attempt)
                return output

            # Strategy: re-prompt with feedback
            log_failure(agent, attempt, error_msg)
            if attempt < max_attempts - 1:
                # Add error feedback to the next prompt
                input_data = add_feedback(input_data, error_msg)
        except Exception as e:
            log_exception(agent, attempt, e)

    # All attempts failed - use fallback
    log_fallback(agent, input_data)
    return get_fallback_response(input_data)
```
Optimize information flow to maintain quality while preventing context overload
In chained workflows, context accumulates at every step. Pass too little context and downstream agents lose critical information; pass too much and performance degrades through attention dilution and increased latency.
```python
# Example: Selective Context Passing
def chain_execution(steps, input_data, constraints):
    context = {
        "original_request": input_data,  # Always preserve
        "constraints": constraints,      # Critical constraints, reiterated every step
        "step_results": {}
    }
    result = None
    for i, step in enumerate(steps):
        # Selective: only pass relevant prior context
        relevant_context = extract_relevant_context(
            context,
            step.dependencies
        )
        # Reiteration: include critical constraints in every prompt
        prompt = build_prompt(
            step.instruction,
            relevant_context,
            critical_constraints=context["constraints"]
        )
        # Execute step
        result = step.execute(prompt)

        # Store the result, but don't pass everything forward
        context["step_results"][i] = result

        # Balance: summarize once accumulated context gets large
        if len(context["step_results"]) > 3:
            context["summary"] = summarize_results(
                context["step_results"]
            )
    return result  # Final step's output
```
Where and how to implement validation for each of the 5 workflow patterns
Each workflow pattern has unique validation requirements and optimal placement points for quality gates.
Recommended Approach for Prompt Chaining:
Programmatic validation after each step (fast); LLM-based validation only at critical junctions or on the final output (expensive but thorough).
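The placement above can be sketched as a minimal chaining loop: a cheap structural gate after every step, with the expensive LLM check reserved for the final output. `cheap_check` and `llm_quality_check` are hypothetical stand-ins for your own validators.

```python
def cheap_check(output):
    # Programmatic gate: fast structural check, run after every step.
    return isinstance(output, str) and len(output) > 0

def llm_quality_check(output):
    # Stand-in for an expensive LLM-based evaluation; stubbed out here.
    return True

def run_chain(steps, data):
    for i, step in enumerate(steps):
        data = step(data)
        if not cheap_check(data):        # cheap gate on every intermediate result
            raise ValueError(f"Step {i} produced invalid output")
    if not llm_quality_check(data):      # expensive gate on the final output only
        raise ValueError("Final output failed quality evaluation")
    return data
```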
Recommended Approach for Routing:
Rule-based validation on the classification output (it must be a valid category), confidence scoring to flag uncertain cases, and fallback to a general agent when confidence is too low.
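Both gates fit in a few lines. This is an illustrative sketch (the function and category names are assumptions, not a fixed API): unknown categories and low-confidence classifications both fall back to a general agent.

```python
def route_request(category, confidence, allowed_categories, min_confidence=0.7):
    # Rule-based: the classifier's output must be a known category.
    if category not in allowed_categories:
        return "general"
    # Confidence gate: uncertain classifications go to the general agent.
    if confidence < min_confidence:
        return "general"
    return category
```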
Recommended Approach for Parallelization:
Programmatic checks that all agents completed, plus LLM-based synthesis validation to ensure the combined output is coherent. Consider voting or consensus mechanisms when agents disagree significantly.
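A minimal sketch of the completion check and a simple majority-vote consensus (the `min_agreement` threshold is an illustrative parameter, not a prescribed value):

```python
from collections import Counter

def combine_parallel_results(results, min_agreement=0.5):
    # Programmatic check: every parallel agent must have completed.
    if any(r is None for r in results):
        raise ValueError("An agent failed to complete")
    # Voting: take the majority answer; flag weak consensus for review.
    answer, votes = Counter(results).most_common(1)[0]
    status = "consensus" if votes / len(results) > min_agreement else "needs_review"
    return answer, status
```

When agreement is weak, routing the case to a human or an LLM-based synthesis step is usually safer than silently picking a winner.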
Recommended Approach for Evaluator-Optimizer:
LLM-based evaluation against a clear rubric; track improvement scores across iterations; stop if there is no improvement for 2 consecutive iterations or the maximum attempt count (3-5) is reached. Always set a maximum number of iterations.
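The two stop conditions can be combined in one loop. In this sketch, `generate` and `evaluate` are hypothetical callables standing in for the optimizer and the rubric-based evaluator:

```python
def refine(generate, evaluate, max_iterations=5, patience=2):
    # Evaluator-optimizer loop with two stop conditions: a hard iteration
    # cap, and "no improvement for `patience` consecutive iterations".
    best_score, stale, output = float("-inf"), 0, None
    for _ in range(max_iterations):
        candidate = generate(output)     # optimizer revises the current best
        score = evaluate(candidate)      # rubric-based score from the evaluator
        if score > best_score:
            best_score, output, stale = score, candidate, 0
        else:
            stale += 1
            if stale >= patience:        # no improvement for `patience` rounds
                break
    return output, best_score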
Recommended Approach for Orchestrator-Workers:
LLM-based plan validation, programmatic checks on worker-selection logic, state-change tracking to detect loops, and a maximum total step count (e.g., 10-20) to prevent runaway orchestration.
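The two runaway guards (step budget and loop detection via state tracking) can be sketched as follows; `plan_next_worker`, `execute_worker`, and the string states are illustrative assumptions:

```python
def orchestrate(plan_next_worker, execute_worker, max_steps=15):
    # Orchestrator loop with two runaway guards: a hard step budget, and
    # repeated-state detection (revisiting a state implies a loop).
    seen_states = set()
    state, steps = "start", 0
    while state != "done":
        if steps >= max_steps:
            raise RuntimeError("Maximum total steps exceeded")
        if state in seen_states:
            raise RuntimeError(f"Loop detected: state {state!r} revisited")
        seen_states.add(state)
        worker = plan_next_worker(state)       # orchestrator chooses a worker
        state = execute_worker(worker, state)  # worker advances the state
        steps += 1
    return steps
```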