Skip to main content

Command Palette

Search for a command to run...

Agentic Design Patterns

A Hands-On Guide Summary

Updated
32 min read
Agentic Design Patterns
D

I am developer/code-reviewer/debugger/bug-fixer/architect/teacher/builder from dubai, uae

I've been working through Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems and here are my cliff notes on the core patterns, frameworks, and techniques. A good book that covers the current state of agentic usage. This isn't exhaustive, it's a practical reference of what actually matters when you're building agent systems. Whether you're just starting out or scaling multi-agent systems, these patterns are the building blocks.

1. Prompt Chaining (Pipeline Pattern)

  • Reduces drift and hallucinations over single prompt
  • Break down complex query into specialized, specific parts
  • Subsequent query depends on previous query output
  • Output has to be well-structured (crucial)
  • Each query:
    • Runs as a unit of work
    • Can run sub-queries in parallel
    • Can enrich with external sources
    • Can have conditional queries
    • Can refine queries and ask follow-up questions
  • Use cases: Information Processing workflows, Complex Query Answering, ETL, Content Generation, Conversational agents w/State, Code generation and refinement, Multimode and Multi-step Reasoning

Context/Prompt Engineering is designing, constructing, and delivering a complete informational environment to the agent before token generation. It differs from traditional prompt engineering which tries to craft the perfect prompt. The context is the main component to ensure the agent knows the user intent, history, and current environment.

Use for complex tasks that can be broken down into distinct, linear steps with processing stats that can interact with tools and refine results for the next step.

2. Routing

  • Introduces conditional logic to select from a set of subsequent actions based on environment, user input, or proceeding results
  • Routing happens via:
    1. LLM-Based (prompt)
    2. RAG/ML-Classifier based (semantic)
    3. Rule Based (extracted data and deterministic)
  • Routing provides the capacity for logical arbitration moving from static executor of predefined sequences to a dynamic system that can make decisions about the most effective way to complete a task under changing conditions
  • Use when an agent must decide between multiple distinct workflows, tools, or subagents based on classifying a variety of user inputs

3. Parallelization

Allows running independent tasks in parallel or concurrently more efficiently:

  1. Information Gathering & research (e.g., Researching a company)
  2. Data processing & Analysis (e.g., analyze customer feedback)
  3. Multi-API or tool interaction (e.g., A travel planning agent)
  4. Content Generation w/Multiple components (e.g., creating a marketing email)
  5. Validation & verification (e.g., independent checks or validations concurrently)
  6. Multi-Modal Processing (e.g., analyzing social media post w/text & images)
  7. A/B Testing or Multiple Options Generation (e.g., generating different creative text options)

4. Reflection

An agent evaluates its own work, output, and internal state to improve its performance or refine its response. It adjusts its approach based on feedback, internal critique, or comparison against desired criteria. This can be wrapped in a separate agent to look at the output of an agent.

  • It allows for self-correction
  • The quality, accuracy, and detail of the output is more important than speed/cost
  • It follows the Generator-Critic or Producer-Reviewer model using separate roles for different concerns to robustly prevent cognitive bias
  • A feedback loop is introduced:
    1. Execution: generate an initial output
    2. Evaluation/Critique: Rule/LLM analysis for factual coherence, style, completeness, adherence to instructions, or other relevant criteria
    3. Reflection/Refinement: Based on the previous step, refine the output; adjust parameters of the next steps; modify the overall plan
    4. Iteration: Optional re-execution of the refining of the output or adjusted approach repeating until a satisfactory result/stopping condition

Use Cases:

  1. Creating writing & content generation (e.g., write a blog post)
  2. Code generation and debugging
  3. Complex problem solving (e.g., solving a logic puzzle)
  4. Summarization and information synthesis (e.g., summarize a long document)
  5. Planning & Strategy (e.g., plan a series of actions to achieve a goal)
  6. Conversational Agents (e.g., customer support chatbot)

5. Tool Use (Function Calling)

  • Closes the gap of an LLM's reasoning capabilities with external functionalities by providing access to up-to-date information, ability to calculate or trigger real-world actions
  • Agents work as orchestrators across a diverse ecosystem of digital resources and other intelligent entities
  • The process involves:
    1. Tool definition: description, capabilities, typed parameters
    2. LLM decision: based on the user's request and available tools
    3. Function call generation: JSON output of tool + arguments to make the call
    4. Tool execution: the agent orchestration layer intercepts the structured output, identifies the tool, and executes the function with the provided args
    5. Observation/result: returns to the agent
    6. LLM Processing (optional common): uses result output as context to finalize the response or deciding on the next step (reflection)

6. Planning Pattern

  • Use only with loose constraints to allow for dynamic solutions to complex problems that can't be handled by a single action or tool
  • Don't use for solved, well-understood problems that are repeatable and procedural
  • It provides a logical framework beyond reactive actions to goal-oriented behavior using a coherent sequence of interdependent operations
  • Agents use a structured approach and create a coherent plan, decomposing a high-level objective into a sequence of smaller actionable steps handled in a logical order

7. Multi-Agent Collaboration

  • Predicated on the principle of task decomposition that's handled by a cooperative ensemble of specialized agents
  • The critical component relies on the mechanisms of inter-agent communication
  • Multi-agent collaborations allow agents to achieve goals aligned with the overall objective through:
    1. Sequential handoffs
    2. Parallel processing
    3. Debate and consensus
    4. Hierarchical Structures
    5. Expert teams
    6. Critic-Reviews

Use cases: when a complex problem can be decomposed into manageable subproblems that require specialized solvers

  • Complex research and analysis (e.g., sequential agents working on a research project: searcher; summarizer; trend identifier; synthesizer)
  • Software Development (e.g., collaborative agent gathering requirements, passing it to a code generator and tester critic-reviewer group that outputs to a document generator)
  • Creative content generation (e.g., a collaborative group of a market research agent, a copywriter, a graphic designer, and a social media scheduling agent to create a marketing campaign)
  • Financial Analysis (e.g., a collaborative group of a stock fetcher, news sentiment analyzer, technical analyzer to generate investment recommendations)
  • Customer support escalation (e.g., frontline ingestion agent that delegates to an appropriate specialist leveraging sequential handoff to solve issues)
  • Supply Chain Optimization (e.g., agents represent nodes of the chain—suppliers, manufacturers, distributors—collaborating to improve output efficiency)
  • Network analysis and remediation (e.g., aid with failure attribution by collaborating to triage and remediate issues suggesting optimal actions)

Agents relate and communicate in a variety of ways:

  1. Solo agent
  2. Network agents: decentralized interaction fosters resilience with challenges of decision-making coherence and communication overhead
  3. Supervisor: coordinator that allocates tasks to resolve conflicts of a group of subordinates. Introduces a single point of failure that could get overwhelmed
  4. Supervisor as a tool: less about control, more of a facilitator
  5. Hierarchical: multiple levels of supervisors over a collection of agents at the lowest tier allowing for distributed decision making within well-defined boundaries
  6. Custom: an amalgamation of hybrid communication approaches or novel ones to solve specific use cases

8. Memory Management

An agent's ability to retain and utilize past interactions, observations, and learning experiences so they are stateful. Allows maintenance of conversation context and improvement over time.

  • Short-term memory is the context window and is ephemeral
  • Long-term memory is persistent memory repository like knowledge graphs, vector stores, or databases that's queryable

Use cases:

  1. Chatbot and conversation AI: short-term is per chat, long-term allows recall of preferences, prior discussions to allow for personalized, relevant, and continuous interactions
  2. Task-Oriented Agents: short-term memory to track previous steps, progress, and goals. Long-term is used to recall user-related data
  3. Personalized Experiences: long-term to retrieve user's preferences, past behavior, and personal information
  4. Learning and improvement: use reinforcement learning to store learned strategies or knowledge gleaned from successes and mistakes in long-term memory
  5. RAG: access long-term vector stores of documents or data to inform responses
  6. Autonomous systems (e.g., self-driving car uses short-term memory for immediate surroundings and long-term memory for general environmental knowledge)

ADKs use a Session for an individual chat thread. Session.state is data relevant to current active chat thread. Memory is a data repository sourced from various past chats or external sources. Session/Memory services manage data storage, retrieval, and lifecycle. Developers interact through these.

Long-term memory can be:

  1. Semantic: remember facts that ground responses leading to more personal and relevant interactions
  2. Episode: remember experiences using past events or actions recording successes, implemented as a few-shot prompt of exemplars for accomplishing a task
  3. Procedural: remembering rules of how to perform tasks, part of system prompt allowing for change via reflection using recent interactions and instructed to refine its existing instructions

9. Learning and Adaptation

Allows for independent agents that optimize performance without constant manual intervention by changing their thinking, actions, or knowledge based on new experience or data. Adaptive agents exhibit enhanced performance in variable environments through iterative updates driven by experiential data.

Learning can be done by: reinforcement; supervised; unsupervised; few/zero-shot with LLM-based agents; online; memory-based.

Update strategies of LLM models are done via:

  • Proximity Policy Optimization (PPO) is a training strategy to reliably and stably improve an agent's decision-making strategy in reinforcement learning. It avoids drastic changes by making careful updates which is measured against a surrogate goal of expected value. It uses a clipping mechanism using a safe zone that isn't too different from the current strategy which won't necessarily maximize the expected value but will minimize the risk of undoing its learnings. The process works by:
    • Train a separate reward model based on feedback on the observed model which is used to predict the score of future feedback responses
    • The observed model is fine-tuned with PPO to maximize possible scores from the judgment reward model in the training game
  • Direct Preference Optimization skips the reward model and mathematically and directly links preference data to an optimal policy of getting preferred responses for a more robust and efficient alignment process. It avoids complexity and potential instability of reward model loop

Use Cases:

  • Personal agents: that refine interactions by analyzing identified behaviors historically
  • Trading bots: high-resolution view of real-time data that's analyzed to adjust parameters of the decision model to maximize profit and mitigate risk
  • Application agents: optimize UX and functionality to increase engagement and system intuitiveness on observed behavior
  • Robotic and autonomous vehicle agents: improve navigation and response across diverse environments by factoring integrated sensor data and historic action analysis
  • Fraud detection agents: improve anomaly detection by refining predictive models improving security and minimize financial loss by detecting newly identified fraudulent patterns
  • Recommendation agents: improve selection precision of individual but context-relevant content via preference-learning agents
  • Game AI agents: dynamically adapt strategies to enhance player engagement by increasing gameplay challenge
  • Knowledge-based learning agents: reference RAG solution sets of problems during decision making to adapt to new but similar situations using historical successes

10. Model Context Protocol

A protocol that facilitates consistent and predictable integration of LLMs with external sources. Comprised of resources (data), interactive templates (prompts), and tools (actionable functions) which are used by MCP clients.

  • Clients are the host application or even agents which use an MCP contract that's essentially an agentic interface
  • Effective connectors should wrap deterministic APIs completely to service non-deterministic agent work. It's better if the content is in an understandable textual form or can be parsed into it for better compatibility
  • Tool calling is a direct 1-to-1 request to a specific function and is usually proprietary by LLM provider
  • MCP is an interface for LLM to JIT discover, communicate with, and use external capabilities to foster an interactive ecosystem of LLMs and tools
  • Wrapping existing assets with this protocol increases information interoperability where the federated model is composed of legacy content orchestrated by LLMs
  • Local communication uses JSON RPC over STDIO for efficient, and remote interactions leverage web-friendly SSE or streaming HTTP for persistent but efficient client-server communication
  • The client-server is the information flow. MCP clients wrap around an LLM that's an intermediary translating formal requests to that MCP standard where it's responsible for discovery and connection to MCP servers
  • MCP domain-specific servers expose themselves to authorized clients
  • Optional third-party services are connections managed by the server to access content to service other client requests

Interaction flow:

  1. Discovery - MCP client asks server what capabilities: response - manifest: tools; resources; prompts
  2. Request formulation: pick a tool with right parameters
  3. Client communication: LLM-formulated request sent as standardized call to the appropriate server
  4. Service execution: authenticates client, validates request, executes action by function call
  5. Server response: send standardized response of status + relevant output
  6. Client context update: pass result back to LLM that updates its context and proceeds to the next task

Use cases: DB integrations; Gen Media Orchestration (complex workflow orchestration); External API Interaction; Reasoning-based information extraction (relevant excerpts of whole documents based on query); Custom tool development w/standardized LLM-to-App communication (FastMCP bolt-on); IoT device control; Financial services automation

11. Goal Setting and Monitoring

Set (SMART) goals with tracking capabilities of the agent. Generate (sub/autonomously) intermediate steps (sequential/complex) with feedback loops for an effective plan based on training data and understanding of tasks.

Use Cases:

  • Customer support automation. Goal: resolve billing inquiry. Success: confirm billing change and positive customer feedback. Methods: monitor conversation, database access, billing tools
  • Personalized learning systems: Goal: improve student understanding of subject. Success: positive trend of metrics; Methods: material progress tracking, tools to alter teaching materials, performance tracking
  • Project management systems. Goal: project completion date; Methods: monitors task status, team communication, resource availability, flagging delays and suggesting corrective actions
  • Automated trading bots: Goal: Maximize portfolio gains within risk tolerance; Methods: monitor market data, current portfolio value and risk indicators, trade execution
  • Robotics/Autonomous Vehicles: Goal: transport passengers safely from A to B. Methods: environment monitoring, self-state (speed, location), progress monitoring, driving controls to adapt to goal
  • Content Moderation. Goal: identify and remove harmful content from platform. Method: monitor incoming content, apply classification, track metrics (false positives/negatives), adjust filtering criteria, notify human

12. Exception Handling and Recovery

Proactive preparation and reactive strategies to detect failure, adapt, or ensure controlled failure to maintain uninterrupted functionality in unpredictable settings to boost stability, effectiveness, trustworthiness, and confidence. Integrating with monitoring and diagnostic tools to identify problems, also adding reflection to analyze failures to refine re-attempts to try and self-resolve errors.

Exception Handling and Recovery pattern: detect, handle, recover.

  • Customer service chatbots: customer database is down, detect the API failure, instruct user to try again later or escalate to a human
  • Financial trading bot: failure to execute because of lack of funds or market close, don't retry, log and notify the user
  • Home automation: device offline or broken, retry, if unsuccessful log and notify user for manual intervention
  • Data processing agents: corrupted batch file, skip it, continue with rest, log failure and report to the user skip one partially succeeding
  • Web scraping agents: encounters CAPTCHA or 503. Handle with grace, pause/retry or report failing URL
  • Robotics: failure to pick up, detect failure, adjust and retry. If not successful alert human

13. Human in the Loop

Deliberate interweaving of superior human judgment with efficiency of AI for critical tasks which have significant safety, ethical, or financial consequences or have a certain level of ambiguity. Use also for training AI with labeled data or refining generative outputs.

  • AI augments human capabilities for collaborative decisions leveraging individual strengths
  • Human Roles: overseer, corrector, feedback (for training), decider (using AI data summaries), collaborator, escalation responder
  • Caveats: Humans reduce scalability; effectiveness depends on human expertise; Anonymizing data for operator overview

Use cases (ambiguities are escalated): content moderation; autonomous driving; fraud detection; legal document review; customer support; data labeling; generative AI refinement; autonomous networks

Human-On-the-Loop variation specifies broad parameters that constrain AI to execute high volume of task below thresholds which trigger human intervention like financial or call center bots.

14. Knowledge Retrieval (RAG)

Closes the knowledge gap between trained and current/external/context-specific queryable data allowing them to ground their actions for more accurate and up-to-date responses. The intent of a query is extracted and relevant chunks of documents are collated based on that which are added to the context and sent to the LLM.

The use of relevant documents provides verifiability via citations and reduces the risk of hallucinations enhancing trustworthiness.

RAG Concepts:

  • Embeddings (text → tokens → numerical vectors)
  • Text Similarity (distance between two vectors)
  • Semantic Similarity and Distance (similarity = 1/vector distance)
  • Chunking (split documents into blocks) allow for faster/relevant retrieval
  • Retrieval (BM25 keyword on contextual documents retrieved)
  • Vector Databases (semantic matching vectorized query with content using efficient Hierarchal Navigable Small World closest search)

RAG's Challenges:

  • Total content is spread across chunks
  • Inaccurate retrieval and introduce noise causing hallucinations
  • Preprocessing of content and ready for RAG and keeping it up to date is cumbersome

Graph RAG uses knowledge graphs that link data nodes related by labeled edges to allow coalescing a contextual answer from several data fragments; The cost and quality depend on maintaining a high-quality graph which is slower than traditional RAG.

Agentic RAG introduces a reasoning/refiner agent that reconciles retrieved information to ensure a more accurate and trustworthy final response. It could filter/augment content based on relevancy, accuracy, authoritative sources, reasoning, and close knowledge gaps at the expense of cost, latency, and complexity.

Use cases: Enterprise search / Q&A; helpdesk; recommenders; summarizers

15. Inter-Agent Communication (A2A)

Framework-agnostic and diverse agent collaboration via standard protocols.

Concepts:

  1. Core Actors: Client (requestor); Server (provides a service)
  2. Capabilities
  3. Discovery: (Defined) well-known URLs (standard); curated registry; direct configuration (embedded/private)
  4. Communications: stateful, async tasks via messages that generate resultant artifacts in parts for retrieval and can be grouped by contextId over HTTPS w/JSON-RPC
  5. Interaction Mechanisms: Polling (via setTask), SSE (streaming via sendTaskSubscribe), Sync request, Webhooks (callbacks)
  6. Security: TLS, audit logs, tokens, API keys

A2A vs MCP:

  • Enhance coordination/communication between agents vs tools
  • Foster innovation and interoperability in complex multi-agent systems

Use cases: multi-framework collaboration; automated workflow orchestration; dynamic information retrieval

16. Resource-Aware Optimization

Action sequencing within a budget leveraging cost vs. speed/accuracy with a fallback strategy for graceful degradation.

Use cases: cost-optimized LLM usage; latency-sensitive operations; energy efficiency; fallback for service reliability; data usage management; adaptive task allocation

Agent resource optimization:

  • Dynamic model switching
  • Adaptive tool use and selection
  • Contextual pruning and summarization
  • Proactive resource prediction
  • Cost-sensitive exploration
  • Energy-efficient deployment
  • Parallelization and distributed computing awareness
  • Learned resource allocation policies
  • Graceful degradation and fallback

17. Reasoning Techniques

Explicit internal reasoning beyond sequential operations via allocation of increased computing resources during inference with a bigger time budget for iterative refinement and exploration to enhance accuracy, coherence, and robustness for complex problems that require deeper analysis and deliberation.

Use Cases:

  • Complex question answering: multi-hop queries, multi-data source integration, logical deductions from multiple reasoning paths
  • Mathematical problem solving: decompose to solvable issues in step-by-step fashion and using extra time to generate intricate code for precise results and validate it
  • Code debugging and generation: through debug cycles analyzing agent's analysis and rationale for code generation sequentially finding issues and iteratively refining code based on results
  • Strategic planning: develop comprehensive plan from various signals and adjusting plans based on real-time feedback (ReAct)
  • Medical diagnosis: assessing symptoms, test results with patient's history to reach a thorough, differential diagnosis potentially utilizing external data retrieval tools
  • Legal Advice: analysis of legal documents and precedents to formula arguments or provide guidance ensuring logical consistency through self-correction

Reasoning Techniques:

  1. Chain-of-Thought (CoT): instruct the agent to decompose the problem with optional few-shot examples guiding internal processing to a more deliberate logical progression
  2. Tree-of-Thought (ToT): non-linear CoT to allow for branching, evaluating, self-correction, and backtracking to explore multiple solutions before finalizing an answer
  3. Self-correction: a.k.a. self-refinement where agent evaluates intermediate thought processes and generate content for gaps, ambiguities, inconsistencies, or inaccuracies in its understanding of the solution. The review/refine cycle allows adjustment to a more accurate, thorough, high-quality, and reliable response
  4. Program-Aided Language Models (PALMs): allows for deterministic code creation and execution
  5. Reinforcement Learning with Verifiable Rewards (RLVR): CoT with thinking to generate reasoning trajectory learned from labeled examples without supervision
  6. Reasoning and Acting (ReAct): CoT with tools planning incorporating available tools and anticipate an outcome working in interleaved/iterative manner
  7. Chain of Debates (CoD): AI agent council meeting and peer review to leverage collective intelligence
  8. Graph of Debates (GoD): non-linear consensus of most well-supported cluster of arguments
  9. MASS (Multi-Agent System Search): is an optimization of Multi-Agent Systems to optimize: prompt; workflow paths; and overall system prompt tuning of optimized paths
  10. Deep research: with time budgets to create reports with: Initial Exploration; Reasoning and refinement; Follow-ups; Final synthesis

The inference scaling law posits that a smaller model with a bigger thinking budget during inference can occasionally surpass the performance of larger models using a smaller thinking budget constraining the computationally intensive generation process. It balances model size, latency, and operational costs. Bigger isn't always better.

18. Guardrails/Safety Patterns

Safety patterns designed to keep responses useful and ethical without mainly being restrictive to maintain user trust. Can be used for inputs and outputs.

Use cases:

  • CS chatbots - guard against offensive language or off-topic responses and instruct on toxic user responses
  • Content Generation Systems - adhere to prescribed ethical guidelines and legal standards to even post redact content
  • Educational assistants - prevent wrong answers or inappropriate/non-curriculum responses
  • Legal RA - guide to consultation over providing substitutive, definitive legal advice
  • Recruitment - ensure fairness and filter discriminatory language or criteria
  • Social Media Content Moderation - flag hate speech, misinformation, or graphic content
  • Scientific RA - guard against fabricated data or unsupported conclusions with emphasis on empirical validation and peer review

19. Evaluation and Monitoring

Measures agent's: effectiveness; efficiency; compliance w/reqs via metrics, feedback loops, and reporting systems.

Use Cases:

  • Performance tracing: accuracy, latency, and resource consumption
  • A/B testing for agent improvements: parallel performance comparison of versioned agents
  • Compliance and safety: generate audit reports of agents compliance with ethical, regulations, and safety protocols validated by HITL or another agent
  • Enterprise systems: required generated AI contracts that codify objectives, rules, and controls for AI-delegated tasks
  • Drift detection: monitor relevance or accuracy of generated content detect degradations due to concept drift (input data) or environmental shifts
  • Anomaly detection: of unusual or unexpected behavior indicating error, a malicious attack, or emergent undesired behavior
  • Learning progress assessment: tracking learning curve and improvement in skills
  • Agent trajectories: qualitatively measures the agent's non-deterministic response by looking at the steps to make a decision. Test (JSON) files of interactions (turns) with expected tool use, intermediate responses, and final response. Eval set files use a dataset (same name) to evaluate longer interactions simulating complex situations or scenarios measuring the same tool use, responses, and final responses
  • Multi-agent validation: measures cooperations at each stage looking at inputs and outputs as well as a whole
  • Advanced Contractors: formal relationship between AI/User around:
    1. Formalized contract single source of truth of the detailed task detailing deliverables according to specs specifying data sources, scope of work, cost/time constraints being able to objectively verify outcomes
    2. Dynamic negotiations to flag ambiguities or risks and resolve misunderstandings and dependencies to increase the probability of an accurate result
    3. Quality and correctness focused (not time) iterative executions with self-validation and correction until the spec is satisfied
    4. Hierarchical decomposition via subcontracts for delegation by the primary agent

20. Prioritization

Assess tasks based on the criteria definition around subtask significance, urgency, inter-dependency, resources, cost, user preferences to be effective by picking the optimal task and upon completion dynamically prune and reprioritize tasks.

Use Cases:

  • Automated customer support: requests based on high-priority users or major outages
  • Cloud computing: resources allocation to critical applications
  • Autonomous driving: safety over efficiency
  • Financial trading: trades based on set preferences
  • Project Management: tasks based on deadlines, availability, and strategic importance
  • Cybersecurity: alerts based on threat severity, impact, and asset criticality
  • Personal Assistant: events based on user importance, deadlines, and current context

21. Exploration and Discovery

Venture into unfamiliar spaces, experimenting with new approaches and generating new knowledge and understanding. Crucial for open-ended, complex, or rapidly evolving domains that render static/preprogrammed knowledge inefficient.

Use Cases:

  • Scientific research automation: design/run new experiments formulating new hypotheses and novel discoveries
  • Game play and strategy generation: explore game states, emergent strategies, or identify vulnerabilities in environments
  • Market research and trend spotting: scan unstructured social media, news to identify trends, consumer behaviors, or opportunities
  • Security Vulnerabilities: probe code bases for flaws or attack vectors
  • Creative content generation: explore combinations of styles, themes to generate art, music, or literature
  • Personalized Education and training: prioritize learning based on individual's dynamic progress, learning style, or areas of improvement
  • Google Co-Scientist:
    • Supervisor Agent with specialized sub-agents following the iterative: generate, debate, and evolve cycle using test-time scaling (higher resources to reason and enhance output)
    • Sub-agents:
      • Generator: initiator producing hypothesis via data exploration and simulated debate
      • Reflector: peer reviewer on pillars of correctness, novelty, and quality of hypotheses
      • Ranker: Elo-based tournament to rank hypotheses through simulated debate
      • Evolution: refiner of top hypotheses simplifying concepts, generating ideas, and exploring unconventional reasoning
      • Proximity: clusters similar ideas to assist in exploring the hypothesis landscape
      • Meta-review agent: insights from all reviews/debating finding commonalities and providing feedback for continual improvement

22. Advanced Prompting Techniques

Elicit high-quality outputs from the models by understanding capabilities and limitations of them.

Core Prompting Principles:

  1. Clarity and specificity: unambiguous and precise instructions. Define tasks, output, limitations, and requirements w/o vague assumptions
  2. Conciseness: direct, simple with active verbs and authoritative instructions without intricate language or superfluous information
  3. Using Action Verbs: for expected operation. Examples: act, analyze, categorize, classify, contrast, compare, create, describe, define, evaluate, extract, find, generate, identify, list, measure, organize, parse, pick, predict, provide, rank, recommend, return, retrieve, rewrite, select, show, sort, summarize, translate, write
  4. Instructions over constraints: positive instructions over negative constraints specifying desired action over what not to do. Constraints for safety or formatting
  5. Experimentation and iteration: iterative prompt refinement looking at results vs. desired output to tweak the prompt. The model variations and configurations like temperature. Documenting attempts and experimentation is vital

Basic Prompting Techniques:

  • Zero-shot prompting: relying solely on the model's pre-training is the most basic, quickest, with no examples of I/O pairs required. Includes only the task description and initial text. Good for tasks encountered in pre-training like summarization or even translation
  • One-shot prompting: has a single example of Input-Output pair as a template for instructions for expected results. Good for specific outputs
  • Few-Shot prompting: Using several (3-5) examples of I/O pairs to improve results. Good for specific format, styles, or nuanced answer variations when one/few don't work. Relies on the quality and diversity of the examples that nuanced and cover edge cases too. With classification, mix up order of examples across prompts to avoid sequential bias
  • Many-shot: can be used as task-based pre-training with potentially hundreds of examples is becoming the norm as context windows grow

Structuring prompts:

  • System Prompting
    • Background instructions/rules/information
    • Influence behavior, tone, style, and solver approach
    • LLM iterative prompt refinement via optimizers
  • Role Prompting
    • Assigns a persona or identity
    • Instructions on tone, focused expertise, style
  • Using Delimiters: to clearly distinguish and visually/programmatically separate instructions, context, and examples via triple backticks/hyphens or XML tags. Reduces misinterpretation by the model to ensure clarity and distinction of each part of the prompt

Contextual Engineering:

  • Dynamic context from previous dialogs, relevant documents, or specific operation parameters leading to grounded response
  • Essential for long-term accuracy, high capability, and situationally aware systems where context is king over model architecture
  • Layers:
    1. System prompts: foundational instructions
    2. External data: Retrieved documents and tool outputs
    3. Implicit data: user, history, state
  • Engineering is needed for runtime ETL of data with feedback loops

Structured Output: Specify output format JSON, MD, XML required for reliable agentic systems to allow for pipelining and interoperability.

Reasoning and Thought Process Techniques:

  1. Chain-of-Thought (CoT)
    • Mimic human step-wise thinking with low effort for increased interpretability, robustness across model upgrades, and good for debugging at the expense of more output tokens
    • Zero-shot CoT: append "let's think of it step by step"
    • Few-shot CoT: with a few examples
    • Best practices: final answer after steps, 0 temperature if you expect a single response
  2. Self-Consistency
    • Leverage temperature to generate diverse reasoning paths and answers with the same prompt and choose the most common answer
    • Avoid single-attempt errors for problem with multiple valid reasoning paths similar to wisdom of the group but very expensive
  3. Step-Back Prompting
    • Emphasis on general principles of the problem by prompting for it first so it's in the context then ask with user context
    • Activates the relevant background and mitigates superficial elements of the question and user context alone
  4. Tree of Thoughts (ToT)
    • Extends CoT to parallel-ly explore multiple reasoning paths using nodes of "thoughts" branching out multiple reasoning routes allowing for backtracking or evaluating multiple paths
    • Good for exploratory problem solving allowing consideration of diverse perspectives, avoiding initial errors by investigating alternative branches within the thought tree

Action and Interaction Techniques:

  • Tool Use/Function Calling
    • Perform actions beyond its capabilities: search, send email, calculations, APIs
    • Agentic systems execute the tool on behalf of the model which specifies the expected parameters described by the tool and returns the result to the model context
  • ReAct (Reason and Act)
    • Automated CoT loop with interactive tool calling to answer questions
    • if not final_answer:
      • Thought: explain current understanding and plan
      • Action: decide on action specifying the tool and input
      • Observation: system executes tool and provides result as context
  • Automatic Prompt Engineering (APE)
    • Models write (several), evaluate, and refine prompts
    • Metrics: BLEU | ROUGE | Human
    • DSPy prompt optimization framework requires:
      • A golden set (I/O pairs as ground truth)
      • Scoring Metric on quality, accuracy, and correctness of response
      • Few-shot: model samples few examples that guide the model towards generating the desired output
      • Instructional Prompt Optimization: LLM used to mutate core prompt instructions, tone, structure to get the best scores
  • Iterative Prompting/Refinement
    • Human-driven prompt refinement
  • Negative Examples
    • Usually instructions over constraints
    • Used sparingly; it's useful to clarify boundaries or prevent specific responses
  • Analogies
    • Use for creative or complex task mapping prompts segments to I/O "raw ingredients" = "data points"
  • Factored Cognition/Decomposition
    • Break down overall goal into a series of prompts that are more manageable sub-tasks like overview of paper and working on section at a time
  • RAG
    • Grounding prompts with external relevant knowledge as context to avoid hallucination and data access to proprietary information
  • Personal Pattern
    • Different from role prompting, this qualifies the target audience for the model's output

Prompting Best Practices:

  • Provide Examples: few-shot some
  • Simple: concise and clear prompts
  • Specify output: format, length, style
  • Instructions over constraints: wants over don't wants
  • Control max tokens: via config or prompt
  • Use variables in programmatic prompts
  • Experiment with input formats, styles, phrasing, tone
  • Mix categories with Few-shot examples to avoid overfitting
  • Adapt to model updates
  • Collaborate on prompts
  • W/CoT answer after reasoning, zero temperature for single correct answer
  • Document failing prompts and why
  • Save in codebases
  • Automated tests and evals for prod systems to monitor performance

23. AI Agentic Interactions

24. Agentic Frameworks

  • LangChain: Use LangChain Expression Language (LCEL) for Directed Acyclic Graphs linked by pipes. Good for linear workflows
  • LangGraph: Build workflows for where nodes are LCEL and edges connecting nodes is the conditional logic allowing for cycles. State is managed by the framework. Used for multi-agent, plan-and-execute, human-in-the-loop flows. Good for cyclic, branching workflows that require tool usage
  • Google ADK: Opinionated and abstracts away low-level graph constructions. Provides predefined patterns for multi-agent interaction. Uses a concept of a team of agents to delegate tasks to a fleet of sub-agents managing state and sessions implicitly with less granularity than LangGraph using a factory pattern
  • Crew.AI: Crew+Agents+Tasks executed in The Process. Developer designs a team charter where the framework concentrates on the logic of agent collaboration and for simulating a team of specialists
  • Microsoft Autogen: orchestrations of agents via conversation
  • llamaIndex: data framework for ETL with limited capabilities of orchestration
  • Haystack: good for search w/LLMs. Nodes for retrieval, question answering, and summarization with emphasis on performance and scalability for large-scale information retrieval optimized for static pipelines
  • MetaGPT: SOP (standard operations procedures) agents that have specific roles to generate highly structured and coherent outputs needed for specialized domains (code gen). Less capable for general tasks
  • SuperAGI: OSS framework for complete autonomous agent lifecycle management. Provides observability and implements common fallback management for production-readiness. All batteries included and complex
  • Semantic Kernel: MS SDK integrates with LLM as a reasoning engine via plugins and planners framework
  • Strands Agents: Model-driven approach to run agents built for simplicity and scalability for conversational to complex multi-agents with native MCP support to external tools. Barebones with add-ons needed for lifecycle, monitoring

25. Building an Agent with AgentSpace

26. AI Agents on the CLI

27. Under the Hood: An Inside Look at the Agents' Reasoning Engines

Gemini:

  1. Deconstruct the prompt: Core task, Key Concepts, Constraints and Instructions, Intent
  2. Information Retrieval and Synthesis: activate relevant knowledge (form key words), pattern recognition using the transforms between words and ideas, structure the answer
  3. Generate the response: language constructions, apply given rules, review and refine

ChatGPT:

  1. Read and understand input: break sentence into parts (subject, actions, conditions), the ask and type of problem
  2. Activate relevant knowledge
  3. Choose a reasoning model: deduction, induction, causal, analogical
  4. Simulate thinking using CoT
  5. Formulate an Answer
  6. Adjust for clarity and tone

Grok:

  1. Understand the query: extract intent, context, and specifics by tokenizing and getting semantic meaning to know what you are asking for
  2. Contextual Analysis and Knowledge Retrieval: leverage attention mechanisms prioritize relevant concepts
  3. Formulate a mental model: align the question with the sequence of steps understanding, retrieval, evaluation, and synthesis prioritize clarity and coherence
  4. Evaluate options and hypotheses: consider multiple responses and use probabilistic reasoning based on preference and appropriateness to pick the best one, iteratively
  5. Construct the Response: prioritize coherence and structure aligned with instructions into digestible steps
  6. Review and refinement: final pass if response meets quality and completion standards

Kimi:

  1. Pre-processing: tokenization and parse syntax to get a semantic map
  2. Problem identification and strategy selection: classify task and choose solution strategy based on understanding map
  3. Knowledge retrieval
  4. Execution and verification
  5. Response Generation: plan structure
  6. Metacognitive reflection: score and check for edge cases
  7. Create response

Claude:

  1. Pattern Recognition and context building: Identify problem and build context considering your level and required level of detail
  2. Information retrieval and Assembly: pull relevant knowledge from memory
  3. Structural Planning: consider response form
  4. Iterative Refinement: iteratively evaluate response on relevance and coherence backtracking to rephrase if it doesn't make sense
  5. Constraint Checking: check if responses are accurate, helpful, safe, clear
  6. Analogical thinking: compare new situations to similar past patterns or examples to enhance the understanding of the concept

DeepSeek:

  1. Receive and parse import: keywords, grammar, intent
  2. Contextualize and Retrieve relevant information
  3. Identify core concepts and structure
  4. Build and sequence CoT: core plan and think out loud
  5. Calculate Probabilities and Generate Output: based on next word using context and training data
  6. Iterate and Refine: on each word
  7. Apply internal training frameworks: gleaned from RLHF and filtering mechanisms are used while generating output

28. Coding Agents

Solves the blank page problem to get a POC. Agents as team members:

  1. Human-led orchestration: dev is team lead and architecture who makes final decisions. It directs the agent and provides context
  2. The primacy of context: quality output depends on completeness of context with: the codebase; external knowledge; a human brief
  3. Direct model access: directly use frontier models

Different agents: implementer; testers; documenter; optimizer; supervisor

Conclusion

Key Principles:

  1. Core Execution (workflows) and task decomposition
  2. Interaction (tools) with external environments
  3. State (memory), learning (reflection), and self-improvement
  4. (Multi-agent) Collaboration and communication

Patterns for complex systems:

  1. Initial Planning: research
  2. Information gathering with tools
  3. Collaboration analysis and writing
  4. Iterative reflection and refinement
  5. State management

Future:

  • Autonomy with reasoning
  • Agentic ecosystems and standardization
  • Aligned on safety and robustness

Happy Hackin'!


References:

This article is part of the system design series where I am summarizing chapters from The System Design Interview: Volume 1 / Volume 2 amongst other related content