Agentic Workflow Design Playbook

A step-by-step guide to designing and implementing intelligent AI systems from concept to production

Phase 1

Start with a Deterministic Process

Begin with a well-documented, fixed workflow that you understand completely

Why Start Here?

Every agentic workflow begins as a deterministic process. Before adding intelligence and flexibility, you need a solid foundation of understanding what the workflow should accomplish and how it currently works.

Typical Deterministic Workflow
Input → Step 1 → Step 2 → Step 3 → Output
✓ Phase 1 Checklist
  • Document the current process: Write down every step in detail
  • Identify inputs and outputs: What data goes in? What comes out?
  • Map dependencies: Which steps must happen in order?
  • Note decision points: Where do choices get made?
  • Understand success criteria: How do you know it worked?
Example: Customer Support Ticket Routing
Current Deterministic Process:
  1. Ticket arrives via email
  2. Check subject line for keywords ("billing", "technical", "sales")
  3. Route based on keyword match
  4. If no keyword match, send to general queue
  5. Assigned agent receives notification
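The deterministic process above can be sketched in a few lines of Python (queue names are illustrative):

```python
# Deterministic keyword routing: subject line -> queue
KEYWORD_QUEUES = {
    "billing": "billing_queue",
    "technical": "technical_queue",
    "sales": "sales_queue",
}

def route_ticket(subject: str) -> str:
    """Return the destination queue based on subject-line keywords."""
    subject_lower = subject.lower()
    for keyword, queue in KEYWORD_QUEUES.items():
        if keyword in subject_lower:
            return queue
    # Step 4: no keyword match -> general queue
    return "general_queue"
```

Documenting the process at this level of precision makes the brittleness visible: any ticket that describes a billing problem without the word "billing" falls through to the general queue.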
Phase 2

Identify Opportunities for "Agenticness"

Find where intelligence, flexibility, and adaptation would add value

Where Can Agents Help?

Look for these patterns in your deterministic workflow; each one signals an opportunity for agentic enhancement:

🔍 Agenticness Indicators
  • Manual interpretation required: Steps that need human judgment
  • Context-dependent decisions: Choices that vary based on situation
  • Complex pattern matching: Beyond simple keyword detection
  • Need for reasoning: Tasks requiring understanding, not just rules
  • Variable inputs: Data that comes in unpredictable forms
  • Quality assessment: Subjective evaluation of outputs
Deterministic (Rule-Based) vs. Agentic (LLM-Powered)
  • Keyword matching in subject line → Semantic understanding of ticket content
  • Fixed routing rules → Intelligent classification based on context
  • One-size-fits-all responses → Personalized, context-aware replies
  • Breaks on edge cases → Adapts to unexpected situations
✓ Phase 2 Checklist
  • Identify brittle steps: Where does the process frequently break?
  • Find interpretation needs: Which steps require understanding meaning?
  • Spot adaptation opportunities: Where would flexibility help?
  • Note quality issues: Where do outputs need improvement?
  • Consider similar processes: Could this generalize to other workflows?
Example: Enhancing Ticket Routing
Agentic Opportunities Identified:
  • Better Classification: LLM can understand actual issue beyond keywords ("my payment didn't go through" → billing, even without "billing" keyword)
  • Urgency Detection: Agent can identify urgent tone and prioritize
  • Context Extraction: Pull relevant customer history and product details
  • Quality Routing: Match complex issues to experienced agents
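The "better classification" opportunity above can be sketched as an LLM classification step. This is a minimal sketch: `call_llm` is a placeholder for whatever model client you use, and the prompt wording is illustrative.

```python
import json

# Escaped braces ({{ }}) survive .format() as literal JSON braces
CLASSIFY_PROMPT = """Classify this customer support ticket.
Categories: billing, technical, sales.
Also rate urgency from 1 to 5 and explain your reasoning.
Respond as JSON: {{"category": ..., "urgency": ..., "reasoning": ...}}

Ticket: {ticket}"""

def classify_ticket(ticket: str, call_llm) -> dict:
    """Classify a ticket semantically; call_llm is any prompt -> str function."""
    response = call_llm(CLASSIFY_PROMPT.format(ticket=ticket))
    return json.loads(response)
```

With this in place, a ticket reading "my payment didn't go through" can land in billing even though it never contains the keyword.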
Phase 3

Select the Right Workflow Pattern

Choose the pattern that best fits your problem structure and requirements

Pattern Selection Decision Tree

Use these questions to determine which pattern(s) to apply:

What is your primary challenge?
🔗 Sequential Steps
Tasks must happen in order, each building on the previous → Chaining
🔀 Diverse Requests
Different types of inputs need specialized handling → Routing
⚡ Independent Tasks
Subtasks can run simultaneously without dependencies → Parallelization
🔄 Need Iteration
Output quality improves through refinement → Evaluator-Optimizer
🎯 Unknown Path
Solution requires dynamic planning and adaptation → Orchestrator-Workers
⚠️ Important: Patterns Can Combine
Real-world systems often use multiple patterns together. For example:
  • Routing + Chaining: Route to specialized agent, then chain steps
  • Parallelization + Evaluator: Generate multiple options in parallel, then evaluate and refine best one
  • Orchestrator + All Patterns: Orchestrator dynamically uses any pattern as needed
✓ Phase 3 Checklist
  • Analyze task structure: Sequential, parallel, or dynamic?
  • Consider input diversity: One type or many?
  • Evaluate quality needs: Is iteration valuable?
  • Assess predictability: Known path or exploratory?
  • Plan for combinations: Will you need multiple patterns?
Example: Ticket Routing Pattern Selection
Chosen Pattern: Routing (with optional Chaining)
  • Why Routing: Different ticket types (billing, technical, sales) need specialized agents
  • Why not Parallelization: Only need one response, not multiple perspectives
  • Why not Orchestrator: Path is predictable once classified
  • Enhancement: Could add Chaining after routing (classify → extract context → route → generate response)
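The routing-plus-chaining enhancement can be sketched as follows. Each agent here is any object with a `run(input)` method; the agent names are illustrative, not a prescribed API.

```python
def handle_ticket(ticket, classifier, context_extractor, responders):
    """Routing + Chaining: classify -> extract context -> route -> respond."""
    # Route: pick a specialist category based on classification
    category = classifier.run(ticket)
    # Chain step 1: enrich the ticket with customer/product context
    context = context_extractor.run(ticket)
    # Chain step 2: the matching specialist drafts a reply using that context
    responder = responders[category]
    return responder.run({"ticket": ticket, "context": context})
```

The routing decision selects *which* chain runs; the chain itself stays a fixed sequence, which keeps the combined system easy to reason about.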
Phase 4

Design Your Agents

Define persona, knowledge, prompting strategy, tools, and interaction for each agent

The 5 Agent Components

For each agent in your workflow, you need to specify:

Agent Design Template
  • 🎭 Persona: Identity & Role
  • 📚 Knowledge: Information Sources
  • 💬 Prompting: Strategy & Style
  • 🔧 Tools: Execution Abilities
  • 🔄 Interaction: Input/Output
💡 Design Guidelines
  • Clear, specific personas: "Expert billing specialist" not "helpful assistant"
  • Relevant knowledge: Only what the agent needs for its specific role
  • Appropriate prompting: Few-shot examples for consistency, CoT for reasoning
  • Minimal necessary tools: Don't overload agents with capabilities they won't use
  • Structured interaction: Define clear input/output formats (JSON, specific fields)
✓ Phase 4 Checklist (Per Agent)
  • Define persona: Write system prompt with role, expertise, boundaries
  • Specify knowledge: What data sources does this agent need?
  • Choose prompting approach: Zero-shot, few-shot, CoT, ReAct?
  • List required tools: APIs, databases, calculations, other agents?
  • Design I/O format: How will data flow in and out?
Example: Router Agent Design
Agent Specification:
  • Persona: "Expert customer support classifier with deep product knowledge"
  • Knowledge: Product documentation, common issues database, team specialties
  • Prompting: Few-shot classification with examples of each category
  • Tools: Customer history lookup, urgency scoring function
  • Interaction: Input: ticket text + metadata; Output: JSON with category, urgency, reasoning
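The specification above can be captured as a simple config before any prompt engineering begins. This is one possible shape, not a required schema; the field names and values are illustrative.

```python
# Router agent specification as data: the 5 components in one place
ROUTER_AGENT = {
    "persona": (
        "You are an expert customer support classifier "
        "with deep product knowledge."
    ),
    "knowledge": ["product_docs", "common_issues_db", "team_specialties"],
    "prompting": "few_shot",  # labeled examples of each category
    "tools": ["customer_history_lookup", "urgency_score"],
    "output_schema": {
        "category": "billing | technical | sales",
        "urgency": "integer 1-5",
        "reasoning": "non-empty string",
    },
}
```

Keeping the spec as data makes it easy to review each agent against the Phase 4 checklist and to diff changes as the design evolves.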
Phase 5

Implement Validation & Error Handling

Build quality gates and error recovery to ensure reliable outputs

Why Validation Matters

LLMs are powerful but unpredictable. Validation prevents errors from propagating through your workflow and ensures consistent quality.

Validation Points in Workflow
Agent Output → ✓ Validate
  • Pass → Next Step
  • Fail → Retry/Fix
🛡️ Validation Strategies

1. Programmatic Checks: Verify format, length, required fields, data types

2. LLM-based Validation: Another agent evaluates quality, accuracy, relevance

3. Rule-based Validation: Check against business rules and constraints

4. Confidence Scoring: Use model confidence to flag uncertain outputs

⚠️ Error Handling Strategies
When validation fails:
  • Retry: Run the same prompt again (works for randomness issues)
  • Re-prompt with feedback: Include validation errors in revised prompt
  • Fallback: Use default response or escalate to human
  • Critique & refine: Agent critiques its own output and improves
  • Log for analysis: Track failures to improve prompts over time
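The retry, re-prompt-with-feedback, and fallback strategies above combine naturally into one loop. A minimal sketch, assuming `agent` exposes `run(input)` and `validate` returns a list of error strings (empty means valid):

```python
def run_with_retries(agent, input_data, validate, max_attempts=3, fallback=None):
    """Retry an agent call, feeding validation errors back into the next attempt."""
    feedback = None
    for attempt in range(max_attempts):
        if feedback is None:
            payload = input_data                     # plain retry
        else:
            payload = {"input": input_data,
                       "fix_these_errors": feedback}  # re-prompt with feedback
        output = agent.run(payload)
        errors = validate(output)
        if not errors:
            return output
        feedback = errors  # in production, also log these for analysis
    return fallback        # all retries failed: default response or human escalation
```

Always bound the loop with `max_attempts`; an unbounded retry loop is one of the most common (and most expensive) agentic bugs.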
✓ Phase 5 Checklist
  • Identify critical outputs: Which results must be validated?
  • Define success criteria: What makes an output "good"?
  • Choose validation methods: Programmatic, LLM-based, or rules?
  • Set retry limits: Maximum attempts before fallback
  • Design fallback paths: What happens when all retries fail?
  • Implement logging: Track validation results for improvement
Example: Router Validation
Validation Implementation:
  • Programmatic: Check that category is one of ["billing", "technical", "sales"]
  • Programmatic: Verify urgency score is 1-5
  • Programmatic: Ensure reasoning field is not empty
  • LLM-based: Evaluator agent checks if reasoning aligns with category
  • Fallback: If validation fails 3 times, route to general queue with flag for human review
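The three programmatic checks listed above can be sketched as a single validator that returns every error at once (so a re-prompt can mention all of them):

```python
VALID_CATEGORIES = {"billing", "technical", "sales"}

def validate_routing(output: dict) -> list:
    """Return a list of validation errors; an empty list means the output passed."""
    errors = []
    if output.get("category") not in VALID_CATEGORIES:
        errors.append("category must be one of billing/technical/sales")
    urgency = output.get("urgency")
    if not (isinstance(urgency, int) and 1 <= urgency <= 5):
        errors.append("urgency must be an integer from 1 to 5")
    if not output.get("reasoning"):
        errors.append("reasoning must not be empty")
    return errors
```

Returning errors rather than a bare boolean makes the "re-prompt with feedback" strategy from the error-handling list straightforward to implement.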
Phase 6

Implement, Test, and Iterate

Build your system in code, test thoroughly, and refine based on results

Implementation Structure

A typical Python implementation follows this structure:

# 1. Define Agent Classes
class Agent:
    def __init__(self, persona, knowledge, tools):
        self.persona = persona
        self.knowledge = knowledge
        self.tools = tools

    def run(self, input_data):
        # Construct a prompt from persona + knowledge + input,
        # call the LLM, then parse the raw response
        prompt = self.build_prompt(input_data)
        response = self.call_llm(prompt)
        return self.process_response(response)

    def build_prompt(self, input_data):
        return f"{self.persona}\n\nContext: {self.knowledge}\n\nInput: {input_data}"

    def call_llm(self, prompt):
        # Stub: wire up your model provider here
        raise NotImplementedError

    def process_response(self, response):
        # Stub: parse/validate the raw model output as needed
        return response

# 2. Implement Workflow Pattern
class RoutingWorkflow:
    def __init__(self, router, workers):
        self.router = router    # classifier agent
        self.workers = workers  # dict mapping category -> specialist agent

    def execute(self, input_data):
        category = self.router.run(input_data)
        worker = self.workers[category]
        return worker.run(input_data)

# 3. Add Validation Layer
def validate_output(output, criteria):
    # criteria: list of predicate functions; all must pass
    return all(check(output) for check in criteria)

Testing Strategy

Comprehensive testing is critical for production readiness:

🧪 Testing Approach

Unit Tests: Test individual agents with known inputs

Integration Tests: Test complete workflow end-to-end

Edge Case Tests: Malformed input, ambiguous cases, boundary conditions

Performance Tests: Latency, throughput, resource usage

A/B Testing: Compare against baseline or alternative approaches
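Unit and integration tests can run without any live LLM call by substituting stand-in agents. A sketch using the standard `unittest` module; `FakeAgent` and this trimmed `RoutingWorkflow` are illustrative stand-ins, not a prescribed API:

```python
import unittest

class FakeAgent:
    """Deterministic stand-in for an LLM-backed agent."""
    def __init__(self, reply):
        self.reply = reply
    def run(self, input_data):
        return self.reply

class RoutingWorkflow:
    def __init__(self, router, workers):
        self.router = router
        self.workers = workers
    def execute(self, input_data):
        return self.workers[self.router.run(input_data)].run(input_data)

class TestRoutingWorkflow(unittest.TestCase):
    def test_routes_to_billing_worker(self):
        wf = RoutingWorkflow(
            router=FakeAgent("billing"),
            workers={"billing": FakeAgent("billing reply")},
        )
        self.assertEqual(wf.execute("invoice question"), "billing reply")

    def test_unknown_category_raises(self):
        # Edge case: router emits a category with no matching worker
        wf = RoutingWorkflow(router=FakeAgent("unknown"), workers={})
        with self.assertRaises(KeyError):
            wf.execute("odd ticket")
```

Run with `python -m unittest`. Fakes cover the orchestration logic cheaply; reserve real LLM calls for a smaller integration suite and the production pilot.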

⚠️ Common Implementation Pitfalls
  • Context overload: Passing too much information to LLM, degrading performance
  • Infinite loops: Forgetting to set maximum iterations in evaluator-optimizer patterns
  • Poor error messages: Not logging enough detail to debug failures
  • Hardcoded values: Magic numbers and strings that should be configuration
  • No monitoring: Running in production without observability
✓ Phase 6 Checklist
  • Implement agent classes: Code the 5 components for each agent
  • Build workflow orchestration: Connect agents according to pattern
  • Add validation logic: Implement checks and error handling
  • Create test cases: Cover happy path and edge cases
  • Add logging/monitoring: Track performance and failures
  • Run production pilot: Test with real data at small scale
  • Iterate based on results: Refine prompts, add validation, adjust patterns
Example: Iterative Improvement
Iteration Cycle:
  1. V1: Basic routing with keyword matching (85% accuracy)
  2. V2: Add LLM classification (92% accuracy, but slow)
  3. V3: Add few-shot examples to prompts (95% accuracy)
  4. V4: Implement evaluation agent for edge cases (97% accuracy)
  5. V5: Add customer context to improve personalization (98% accuracy + better satisfaction)

🎯 Design Principles Summary

  • 📋 Start Simple: Begin with deterministic processes you understand
  • 🎯 Be Intentional: Add agenticness where it provides clear value
  • 🔧 Right Pattern: Match the workflow pattern to the problem structure
  • 🛡️ Validate Early: Build quality gates from the start
  • 🔄 Iterate Rapidly: Test, learn, refine continuously
  • 📊 Monitor Always: Track performance and failure modes