Agentic Design Patterns

I've been working through Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems and here are my cliff notes on the core patterns, frameworks, and techniques. A good book that covers the current state of agentic usage. This isn't exhaustive, it's a practical reference of what actually matters when you're building agent systems. Whether you're just starting out or scaling multi-agent systems, these patterns are the building blocks.

1. Prompt Chaining (Pipeline Pattern)

Reduces drift and hallucinations over single prompt
Break down complex query into specialized, specific parts
Subsequent query depends on previous query output
Output has to be well-structured (crucial)
Each query:
- Runs as a unit of work
- Can run sub-queries in parallel
- Can enrich with external sources
- Can have conditional queries
- Can refine queries and ask follow-up questions
Use cases: Information Processing workflows, Complex Query Answering, ETL, Content Generation, Conversational agents w/State, Code generation and refinement, Multimode and Multi-step Reasoning

Context/Prompt Engineering is designing, constructing, and delivering a complete informational environment to the agent before token generation. It differs from traditional prompt engineering which tries to craft the perfect prompt. The context is the main component to ensure the agent knows the user intent, history, and current environment.

Use for complex tasks that can be broken down into distinct, linear steps with processing stats that can interact with tools and refine results for the next step.

2. Routing

Introduces conditional logic to select from a set of subsequent actions based on environment, user input, or proceeding results
Routing happens via:
1. LLM-Based (prompt)
2. RAG/ML-Classifier based (semantic)
3. Rule Based (extracted data and deterministic)
Routing provides the capacity for logical arbitration moving from static executor of predefined sequences to a dynamic system that can make decisions about the most effective way to complete a task under changing conditions
Use when an agent must decide between multiple distinct workflows, tools, or subagents based on classifying a variety of user inputs

3. Parallelization

Allows running independent tasks in parallel or concurrently more efficiently:

Information Gathering & research (e.g., Researching a company)
Data processing & Analysis (e.g., analyze customer feedback)
Multi-API or tool interaction (e.g., A travel planning agent)
Content Generation w/Multiple components (e.g., creating a marketing email)
Validation & verification (e.g., independent checks or validations concurrently)
Multi-Modal Processing (e.g., analyzing social media post w/text & images)
A/B Testing or Multiple Options Generation (e.g., generating different creative text options)

4. Reflection

An agent evaluates its own work, output, and internal state to improve its performance or refine its response. It adjusts its approach based on feedback, internal critique, or comparison against desired criteria. This can be wrapped in a separate agent to look at the output of an agent.

It allows for self-correction
The quality, accuracy, and detail of the output is more important than speed/cost
It follows the Generator-Critic or Producer-Reviewer model using separate roles for different concerns to robustly prevent cognitive bias
A feedback loop is introduced:
1. Execution: generate an initial output
2. Evaluation/Critique: Rule/LLM analysis for factual coherence, style, completeness, adherence to instructions, or other relevant criteria
3. Reflection/Refinement: Based on the previous step, refine the output; adjust parameters of the next steps; modify the overall plan
4. Iteration: Optional re-execution of the refining of the output or adjusted approach repeating until a satisfactory result/stopping condition

Use Cases:

Creating writing & content generation (e.g., write a blog post)
Code generation and debugging
Complex problem solving (e.g., solving a logic puzzle)
Summarization and information synthesis (e.g., summarize a long document)
Planning & Strategy (e.g., plan a series of actions to achieve a goal)
Conversational Agents (e.g., customer support chatbot)

5. Tool Use (Function Calling)

Closes the gap of an LLM's reasoning capabilities with external functionalities by providing access to up-to-date information, ability to calculate or trigger real-world actions
Agents work as orchestrators across a diverse ecosystem of digital resources and other intelligent entities
The process involves:
1. Tool definition: description, capabilities, typed parameters
2. LLM decision: based on the user's request and available tools
3. Function call generation: JSON output of tool + arguments to make the call
4. Tool execution: the agent orchestration layer intercepts the structured output, identifies the tool, and executes the function with the provided args
5. Observation/result: returns to the agent
6. LLM Processing (optional common): uses result output as context to finalize the response or deciding on the next step (reflection)

6. Planning Pattern

Use only with loose constraints to allow for dynamic solutions to complex problems that can't be handled by a single action or tool
Don't use for solved, well-understood problems that are repeatable and procedural
It provides a logical framework beyond reactive actions to goal-oriented behavior using a coherent sequence of interdependent operations
Agents use a structured approach and create a coherent plan, decomposing a high-level objective into a sequence of smaller actionable steps handled in a logical order

7. Multi-Agent Collaboration

Predicated on the principle of task decomposition that's handled by a cooperative ensemble of specialized agents
The critical component relies on the mechanisms of inter-agent communication
Multi-agent collaborations allow agents to achieve goals aligned with the overall objective through:
1. Sequential handoffs
2. Parallel processing
3. Debate and consensus
4. Hierarchical Structures
5. Expert teams
6. Critic-Reviews

Use cases: when a complex problem can be decomposed into manageable subproblems that require specialized solvers

Complex research and analysis (e.g., sequential agents working on a research project: searcher; summarizer; trend identifier; synthesizer)
Software Development (e.g., collaborative agent gathering requirements, passing it to a code generator and tester critic-reviewer group that outputs to a document generator)
Creative content generation (e.g., a collaborative group of a market research agent, a copywriter, a graphic designer, and a social media scheduling agent to create a marketing campaign)
Financial Analysis (e.g., a collaborative group of a stock fetcher, news sentiment analyzer, technical analyzer to generate investment recommendations)
Customer support escalation (e.g., frontline ingestion agent that delegates to an appropriate specialist leveraging sequential handoff to solve issues)
Supply Chain Optimization (e.g., agents represent nodes of the chain—suppliers, manufacturers, distributors—collaborating to improve output efficiency)
Network analysis and remediation (e.g., aid with failure attribution by collaborating to triage and remediate issues suggesting optimal actions)

Agents relate and communicate in a variety of ways:

Solo agent
Network agents: decentralized interaction fosters resilience with challenges of decision-making coherence and communication overhead
Supervisor: coordinator that allocates tasks to resolve conflicts of a group of subordinates. Introduces a single point of failure that could get overwhelmed
Supervisor as a tool: less about control, more of a facilitator
Hierarchical: multiple levels of supervisors over a collection of agents at the lowest tier allowing for distributed decision making within well-defined boundaries
Custom: an amalgamation of hybrid communication approaches or novel ones to solve specific use cases

8. Memory Management

An agent's ability to retain and utilize past interactions, observations, and learning experiences so they are stateful. Allows maintenance of conversation context and improvement over time.

Short-term memory is the context window and is ephemeral
Long-term memory is persistent memory repository like knowledge graphs, vector stores, or databases that's queryable

Use cases:

Chatbot and conversation AI: short-term is per chat, long-term allows recall of preferences, prior discussions to allow for personalized, relevant, and continuous interactions
Task-Oriented Agents: short-term memory to track previous steps, progress, and goals. Long-term is used to recall user-related data
Personalized Experiences: long-term to retrieve user's preferences, past behavior, and personal information
Learning and improvement: use reinforcement learning to store learned strategies or knowledge gleaned from successes and mistakes in long-term memory
RAG: access long-term vector stores of documents or data to inform responses
Autonomous systems (e.g., self-driving car uses short-term memory for immediate surroundings and long-term memory for general environmental knowledge)

ADKs use a Session for an individual chat thread. Session.state is data relevant to current active chat thread. Memory is a data repository sourced from various past chats or external sources. Session/Memory services manage data storage, retrieval, and lifecycle. Developers interact through these.

Long-term memory can be:

Semantic: remember facts that ground responses leading to more personal and relevant interactions
Episode: remember experiences using past events or actions recording successes, implemented as a few-shot prompt of exemplars for accomplishing a task
Procedural: remembering rules of how to perform tasks, part of system prompt allowing for change via reflection using recent interactions and instructed to refine its existing instructions

9. Learning and Adaptation

Allows for independent agents that optimize performance without constant manual intervention by changing their thinking, actions, or knowledge based on new experience or data. Adaptive agents exhibit enhanced performance in variable environments through iterative updates driven by experiential data.

Learning can be done by: reinforcement; supervised; unsupervised; few/zero-shot with LLM-based agents; online; memory-based.

Update strategies of LLM models are done via:

Proximity Policy Optimization (PPO) is a training strategy to reliably and stably improve an agent's decision-making strategy in reinforcement learning. It avoids drastic changes by making careful updates which is measured against a surrogate goal of expected value. It uses a clipping mechanism using a safe zone that isn't too different from the current strategy which won't necessarily maximize the expected value but will minimize the risk of undoing its learnings. The process works by:
- Train a separate reward model based on feedback on the observed model which is used to predict the score of future feedback responses
- The observed model is fine-tuned with PPO to maximize possible scores from the judgment reward model in the training game
Direct Preference Optimization skips the reward model and mathematically and directly links preference data to an optimal policy of getting preferred responses for a more robust and efficient alignment process. It avoids complexity and potential instability of reward model loop

Use Cases:

Personal agents: that refine interactions by analyzing identified behaviors historically
Trading bots: high-resolution view of real-time data that's analyzed to adjust parameters of the decision model to maximize profit and mitigate risk
Application agents: optimize UX and functionality to increase engagement and system intuitiveness on observed behavior
Robotic and autonomous vehicle agents: improve navigation and response across diverse environments by factoring integrated sensor data and historic action analysis
Fraud detection agents: improve anomaly detection by refining predictive models improving security and minimize financial loss by detecting newly identified fraudulent patterns
Recommendation agents: improve selection precision of individual but context-relevant content via preference-learning agents
Game AI agents: dynamically adapt strategies to enhance player engagement by increasing gameplay challenge
Knowledge-based learning agents: reference RAG solution sets of problems during decision making to adapt to new but similar situations using historical successes

10. Model Context Protocol

A protocol that facilitates consistent and predictable integration of LLMs with external sources. Comprised of resources (data), interactive templates (prompts), and tools (actionable functions) which are used by MCP clients.

Clients are the host application or even agents which use an MCP contract that's essentially an agentic interface
Effective connectors should wrap deterministic APIs completely to service non-deterministic agent work. It's better if the content is in an understandable textual form or can be parsed into it for better compatibility
Tool calling is a direct 1-to-1 request to a specific function and is usually proprietary by LLM provider
MCP is an interface for LLM to JIT discover, communicate with, and use external capabilities to foster an interactive ecosystem of LLMs and tools
Wrapping existing assets with this protocol increases information interoperability where the federated model is composed of legacy content orchestrated by LLMs
Local communication uses JSON RPC over STDIO for efficient, and remote interactions leverage web-friendly SSE or streaming HTTP for persistent but efficient client-server communication
The client-server is the information flow. MCP clients wrap around an LLM that's an intermediary translating formal requests to that MCP standard where it's responsible for discovery and connection to MCP servers
MCP domain-specific servers expose themselves to authorized clients
Optional third-party services are connections managed by the server to access content to service other client requests

Interaction flow:

Discovery - MCP client asks server what capabilities: response - manifest: tools; resources; prompts
Request formulation: pick a tool with right parameters
Client communication: LLM-formulated request sent as standardized call to the appropriate server
Service execution: authenticates client, validates request, executes action by function call
Server response: send standardized response of status + relevant output
Client context update: pass result back to LLM that updates its context and proceeds to the next task

Use cases: DB integrations; Gen Media Orchestration (complex workflow orchestration); External API Interaction; Reasoning-based information extraction (relevant excerpts of whole documents based on query); Custom tool development w/standardized LLM-to-App communication (FastMCP bolt-on); IoT device control; Financial services automation

11. Goal Setting and Monitoring

Set (SMART) goals with tracking capabilities of the agent. Generate (sub/autonomously) intermediate steps (sequential/complex) with feedback loops for an effective plan based on training data and understanding of tasks.

Use Cases:

Customer support automation. Goal: resolve billing inquiry. Success: confirm billing change and positive customer feedback. Methods: monitor conversation, database access, billing tools
Personalized learning systems: Goal: improve student understanding of subject. Success: positive trend of metrics; Methods: material progress tracking, tools to alter teaching materials, performance tracking
Project management systems. Goal: project completion date; Methods: monitors task status, team communication, resource availability, flagging delays and suggesting corrective actions
Automated trading bots: Goal: Maximize portfolio gains within risk tolerance; Methods: monitor market data, current portfolio value and risk indicators, trade execution
Robotics/Autonomous Vehicles: Goal: transport passengers safely from A to B. Methods: environment monitoring, self-state (speed, location), progress monitoring, driving controls to adapt to goal
Content Moderation. Goal: identify and remove harmful content from platform. Method: monitor incoming content, apply classification, track metrics (false positives/negatives), adjust filtering criteria, notify human

12. Exception Handling and Recovery

Proactive preparation and reactive strategies to detect failure, adapt, or ensure controlled failure to maintain uninterrupted functionality in unpredictable settings to boost stability, effectiveness, trustworthiness, and confidence. Integrating with monitoring and diagnostic tools to identify problems, also adding reflection to analyze failures to refine re-attempts to try and self-resolve errors.

Exception Handling and Recovery pattern: detect, handle, recover.

Customer service chatbots: customer database is down, detect the API failure, instruct user to try again later or escalate to a human
Financial trading bot: failure to execute because of lack of funds or market close, don't retry, log and notify the user
Home automation: device offline or broken, retry, if unsuccessful log and notify user for manual intervention
Data processing agents: corrupted batch file, skip it, continue with rest, log failure and report to the user skip one partially succeeding
Web scraping agents: encounters CAPTCHA or 503. Handle with grace, pause/retry or report failing URL
Robotics: failure to pick up, detect failure, adjust and retry. If not successful alert human

13. Human in the Loop

Deliberate interweaving of superior human judgment with efficiency of AI for critical tasks which have significant safety, ethical, or financial consequences or have a certain level of ambiguity. Use also for training AI with labeled data or refining generative outputs.

AI augments human capabilities for collaborative decisions leveraging individual strengths
Human Roles: overseer, corrector, feedback (for training), decider (using AI data summaries), collaborator, escalation responder
Caveats: Humans reduce scalability; effectiveness depends on human expertise; Anonymizing data for operator overview

Use cases (ambiguities are escalated): content moderation; autonomous driving; fraud detection; legal document review; customer support; data labeling; generative AI refinement; autonomous networks

Human-On-the-Loop variation specifies broad parameters that constrain AI to execute high volume of task below thresholds which trigger human intervention like financial or call center bots.

14. Knowledge Retrieval (RAG)

Closes the knowledge gap between trained and current/external/context-specific queryable data allowing them to ground their actions for more accurate and up-to-date responses. The intent of a query is extracted and relevant chunks of documents are collated based on that which are added to the context and sent to the LLM.

The use of relevant documents provides verifiability via citations and reduces the risk of hallucinations enhancing trustworthiness.

RAG Concepts:

Embeddings (text → tokens → numerical vectors)
Text Similarity (distance between two vectors)
Semantic Similarity and Distance (similarity = 1/vector distance)
Chunking (split documents into blocks) allow for faster/relevant retrieval
Retrieval (BM25 keyword on contextual documents retrieved)
Vector Databases (semantic matching vectorized query with content using efficient Hierarchal Navigable Small World closest search)

RAG's Challenges:

Total content is spread across chunks
Inaccurate retrieval and introduce noise causing hallucinations
Preprocessing of content and ready for RAG and keeping it up to date is cumbersome

Graph RAG uses knowledge graphs that link data nodes related by labeled edges to allow coalescing a contextual answer from several data fragments; The cost and quality depend on maintaining a high-quality graph which is slower than traditional RAG.

Agentic RAG introduces a reasoning/refiner agent that reconciles retrieved information to ensure a more accurate and trustworthy final response. It could filter/augment content based on relevancy, accuracy, authoritative sources, reasoning, and close knowledge gaps at the expense of cost, latency, and complexity.

Use cases: Enterprise search / Q&A; helpdesk; recommenders; summarizers

15. Inter-Agent Communication (A2A)

Framework-agnostic and diverse agent collaboration via standard protocols.

Concepts:

Core Actors: Client (requestor); Server (provides a service)
Capabilities
Discovery: (Defined) well-known URLs (standard); curated registry; direct configuration (embedded/private)
Communications: stateful, async tasks via messages that generate resultant artifacts in parts for retrieval and can be grouped by contextId over HTTPS w/JSON-RPC
Interaction Mechanisms: Polling (via setTask), SSE (streaming via sendTaskSubscribe), Sync request, Webhooks (callbacks)
Security: TLS, audit logs, tokens, API keys

A2A vs MCP:

Enhance coordination/communication between agents vs tools
Foster innovation and interoperability in complex multi-agent systems

Use cases: multi-framework collaboration; automated workflow orchestration; dynamic information retrieval

16. Resource-Aware Optimization

Action sequencing within a budget leveraging cost vs. speed/accuracy with a fallback strategy for graceful degradation.

Use cases: cost-optimized LLM usage; latency-sensitive operations; energy efficiency; fallback for service reliability; data usage management; adaptive task allocation

Agent resource optimization:

Dynamic model switching
Adaptive tool use and selection
Contextual pruning and summarization
Proactive resource prediction
Cost-sensitive exploration
Energy-efficient deployment
Parallelization and distributed computing awareness
Learned resource allocation policies
Graceful degradation and fallback

17. Reasoning Techniques

Explicit internal reasoning beyond sequential operations via allocation of increased computing resources during inference with a bigger time budget for iterative refinement and exploration to enhance accuracy, coherence, and robustness for complex problems that require deeper analysis and deliberation.

Use Cases:

Complex question answering: multi-hop queries, multi-data source integration, logical deductions from multiple reasoning paths
Mathematical problem solving: decompose to solvable issues in step-by-step fashion and using extra time to generate intricate code for precise results and validate it
Code debugging and generation: through debug cycles analyzing agent's analysis and rationale for code generation sequentially finding issues and iteratively refining code based on results
Strategic planning: develop comprehensive plan from various signals and adjusting plans based on real-time feedback (ReAct)
Medical diagnosis: assessing symptoms, test results with patient's history to reach a thorough, differential diagnosis potentially utilizing external data retrieval tools
Legal Advice: analysis of legal documents and precedents to formula arguments or provide guidance ensuring logical consistency through self-correction

Reasoning Techniques:

Chain-of-Thought (CoT): instruct the agent to decompose the problem with optional few-shot examples guiding internal processing to a more deliberate logical progression
Tree-of-Thought (ToT): non-linear CoT to allow for branching, evaluating, self-correction, and backtracking to explore multiple solutions before finalizing an answer
Self-correction: a.k.a. self-refinement where agent evaluates intermediate thought processes and generate content for gaps, ambiguities, inconsistencies, or inaccuracies in its understanding of the solution. The review/refine cycle allows adjustment to a more accurate, thorough, high-quality, and reliable response
Program-Aided Language Models (PALMs): allows for deterministic code creation and execution
Reinforcement Learning with Verifiable Rewards (RLVR): CoT with thinking to generate reasoning trajectory learned from labeled examples without supervision
Reasoning and Acting (ReAct): CoT with tools planning incorporating available tools and anticipate an outcome working in interleaved/iterative manner
Chain of Debates (CoD): AI agent council meeting and peer review to leverage collective intelligence
Graph of Debates (GoD): non-linear consensus of most well-supported cluster of arguments
MASS (Multi-Agent System Search): is an optimization of Multi-Agent Systems to optimize: prompt; workflow paths; and overall system prompt tuning of optimized paths
Deep research: with time budgets to create reports with: Initial Exploration; Reasoning and refinement; Follow-ups; Final synthesis

The inference scaling law posits that a smaller model with a bigger thinking budget during inference can occasionally surpass the performance of larger models using a smaller thinking budget constraining the computationally intensive generation process. It balances model size, latency, and operational costs. Bigger isn't always better.

18. Guardrails/Safety Patterns

Safety patterns designed to keep responses useful and ethical without mainly being restrictive to maintain user trust. Can be used for inputs and outputs.

Use cases:

CS chatbots - guard against offensive language or off-topic responses and instruct on toxic user responses
Content Generation Systems - adhere to prescribed ethical guidelines and legal standards to even post redact content
Educational assistants - prevent wrong answers or inappropriate/non-curriculum responses
Legal RA - guide to consultation over providing substitutive, definitive legal advice
Recruitment - ensure fairness and filter discriminatory language or criteria
Social Media Content Moderation - flag hate speech, misinformation, or graphic content
Scientific RA - guard against fabricated data or unsupported conclusions with emphasis on empirical validation and peer review

19. Evaluation and Monitoring

Measures agent's: effectiveness; efficiency; compliance w/reqs via metrics, feedback loops, and reporting systems.

Use Cases:

Performance tracing: accuracy, latency, and resource consumption
A/B testing for agent improvements: parallel performance comparison of versioned agents
Compliance and safety: generate audit reports of agents compliance with ethical, regulations, and safety protocols validated by HITL or another agent
Enterprise systems: required generated AI contracts that codify objectives, rules, and controls for AI-delegated tasks
Drift detection: monitor relevance or accuracy of generated content detect degradations due to concept drift (input data) or environmental shifts
Anomaly detection: of unusual or unexpected behavior indicating error, a malicious attack, or emergent undesired behavior
Learning progress assessment: tracking learning curve and improvement in skills
Agent trajectories: qualitatively measures the agent's non-deterministic response by looking at the steps to make a decision. Test (JSON) files of interactions (turns) with expected tool use, intermediate responses, and final response. Eval set files use a dataset (same name) to evaluate longer interactions simulating complex situations or scenarios measuring the same tool use, responses, and final responses
Multi-agent validation: measures cooperations at each stage looking at inputs and outputs as well as a whole
Advanced Contractors: formal relationship between AI/User around:
1. Formalized contract single source of truth of the detailed task detailing deliverables according to specs specifying data sources, scope of work, cost/time constraints being able to objectively verify outcomes
2. Dynamic negotiations to flag ambiguities or risks and resolve misunderstandings and dependencies to increase the probability of an accurate result
3. Quality and correctness focused (not time) iterative executions with self-validation and correction until the spec is satisfied
4. Hierarchical decomposition via subcontracts for delegation by the primary agent

20. Prioritization

Assess tasks based on the criteria definition around subtask significance, urgency, inter-dependency, resources, cost, user preferences to be effective by picking the optimal task and upon completion dynamically prune and reprioritize tasks.

Use Cases:

Automated customer support: requests based on high-priority users or major outages
Cloud computing: resources allocation to critical applications
Autonomous driving: safety over efficiency
Financial trading: trades based on set preferences
Project Management: tasks based on deadlines, availability, and strategic importance
Cybersecurity: alerts based on threat severity, impact, and asset criticality
Personal Assistant: events based on user importance, deadlines, and current context

21. Exploration and Discovery

Venture into unfamiliar spaces, experimenting with new approaches and generating new knowledge and understanding. Crucial for open-ended, complex, or rapidly evolving domains that render static/preprogrammed knowledge inefficient.

Use Cases:

Scientific research automation: design/run new experiments formulating new hypotheses and novel discoveries
Game play and strategy generation: explore game states, emergent strategies, or identify vulnerabilities in environments
Market research and trend spotting: scan unstructured social media, news to identify trends, consumer behaviors, or opportunities
Security Vulnerabilities: probe code bases for flaws or attack vectors
Creative content generation: explore combinations of styles, themes to generate art, music, or literature
Personalized Education and training: prioritize learning based on individual's dynamic progress, learning style, or areas of improvement
Google Co-Scientist:
- Supervisor Agent with specialized sub-agents following the iterative: generate, debate, and evolve cycle using test-time scaling (higher resources to reason and enhance output)
- Sub-agents:
  - Generator: initiator producing hypothesis via data exploration and simulated debate
  - Reflector: peer reviewer on pillars of correctness, novelty, and quality of hypotheses
  - Ranker: Elo-based tournament to rank hypotheses through simulated debate
  - Evolution: refiner of top hypotheses simplifying concepts, generating ideas, and exploring unconventional reasoning
  - Proximity: clusters similar ideas to assist in exploring the hypothesis landscape
  - Meta-review agent: insights from all reviews/debating finding commonalities and providing feedback for continual improvement

22. Advanced Prompting Techniques

Elicit high-quality outputs from the models by understanding capabilities and limitations of them.

Core Prompting Principles:

Clarity and specificity: unambiguous and precise instructions. Define tasks, output, limitations, and requirements w/o vague assumptions
Conciseness: direct, simple with active verbs and authoritative instructions without intricate language or superfluous information
Using Action Verbs: for expected operation. Examples: act, analyze, categorize, classify, contrast, compare, create, describe, define, evaluate, extract, find, generate, identify, list, measure, organize, parse, pick, predict, provide, rank, recommend, return, retrieve, rewrite, select, show, sort, summarize, translate, write
Instructions over constraints: positive instructions over negative constraints specifying desired action over what not to do. Constraints for safety or formatting
Experimentation and iteration: iterative prompt refinement looking at results vs. desired output to tweak the prompt. The model variations and configurations like temperature. Documenting attempts and experimentation is vital

Basic Prompting Techniques:

Zero-shot prompting: relying solely on the model's pre-training is the most basic, quickest, with no examples of I/O pairs required. Includes only the task description and initial text. Good for tasks encountered in pre-training like summarization or even translation
One-shot prompting: has a single example of Input-Output pair as a template for instructions for expected results. Good for specific outputs
Few-Shot prompting: Using several (3-5) examples of I/O pairs to improve results. Good for specific format, styles, or nuanced answer variations when one/few don't work. Relies on the quality and diversity of the examples that nuanced and cover edge cases too. With classification, mix up order of examples across prompts to avoid sequential bias
Many-shot: can be used as task-based pre-training with potentially hundreds of examples is becoming the norm as context windows grow

Structuring prompts:

System Prompting
- Background instructions/rules/information
- Influence behavior, tone, style, and solver approach
- LLM iterative prompt refinement via optimizers
Role Prompting
- Assigns a persona or identity
- Instructions on tone, focused expertise, style
Using Delimiters: to clearly distinguish and visually/programmatically separate instructions, context, and examples via triple backticks/hyphens or XML tags. Reduces misinterpretation by the model to ensure clarity and distinction of each part of the prompt

Contextual Engineering:

Dynamic context from previous dialogs, relevant documents, or specific operation parameters leading to grounded response
Essential for long-term accuracy, high capability, and situationally aware systems where context is king over model architecture
Layers:
1. System prompts: foundational instructions
2. External data: Retrieved documents and tool outputs
3. Implicit data: user, history, state
Engineering is needed for runtime ETL of data with feedback loops

Structured Output: Specify output format JSON, MD, XML required for reliable agentic systems to allow for pipelining and interoperability.

Reasoning and Thought Process Techniques:

Chain-of-Thought (CoT)
- Mimic human step-wise thinking with low effort for increased interpretability, robustness across model upgrades, and good for debugging at the expense of more output tokens
- Zero-shot CoT: append "let's think of it step by step"
- Few-shot CoT: with a few examples
- Best practices: final answer after steps, 0 temperature if you expect a single response
Self-Consistency
- Leverage temperature to generate diverse reasoning paths and answers with the same prompt and choose the most common answer
- Avoid single-attempt errors for problem with multiple valid reasoning paths similar to wisdom of the group but very expensive
Step-Back Prompting
- Emphasis on general principles of the problem by prompting for it first so it's in the context then ask with user context
- Activates the relevant background and mitigates superficial elements of the question and user context alone
Tree of Thoughts (ToT)
- Extends CoT to parallel-ly explore multiple reasoning paths using nodes of "thoughts" branching out multiple reasoning routes allowing for backtracking or evaluating multiple paths
- Good for exploratory problem solving allowing consideration of diverse perspectives, avoiding initial errors by investigating alternative branches within the thought tree

Action and Interaction Techniques:

Tool Use/Function Calling
- Perform actions beyond its capabilities: search, send email, calculations, APIs
- Agentic systems execute the tool on behalf of the model which specifies the expected parameters described by the tool and returns the result to the model context
ReAct (Reason and Act)
- Automated CoT loop with interactive tool calling to answer questions
- if not final_answer:
  - Thought: explain current understanding and plan
  - Action: decide on action specifying the tool and input
  - Observation: system executes tool and provides result as context
Automatic Prompt Engineering (APE)
- Models write (several), evaluate, and refine prompts
- Metrics: BLEU | ROUGE | Human
- DSPy prompt optimization framework requires:
  - A golden set (I/O pairs as ground truth)
  - Scoring Metric on quality, accuracy, and correctness of response
  - Few-shot: model samples few examples that guide the model towards generating the desired output
  - Instructional Prompt Optimization: LLM used to mutate core prompt instructions, tone, structure to get the best scores
Iterative Prompting/Refinement
- Human-driven prompt refinement
Negative Examples
- Usually instructions over constraints
- Used sparingly; it's useful to clarify boundaries or prevent specific responses
Analogies
- Use for creative or complex task mapping prompts segments to I/O "raw ingredients" = "data points"
Factored Cognition/Decomposition
- Break down overall goal into a series of prompts that are more manageable sub-tasks like overview of paper and working on section at a time
RAG
- Grounding prompts with external relevant knowledge as context to avoid hallucination and data access to proprietary information
Personal Pattern
- Different from role prompting, this qualifies the target audience for the model's output

Prompting Best Practices:

Provide Examples: few-shot some
Simple: concise and clear prompts
Specify output: format, length, style
Instructions over constraints: wants over don't wants
Control max tokens: via config or prompt
Use variables in programmatic prompts
Experiment with input formats, styles, phrasing, tone
Mix categories with Few-shot examples to avoid overfitting
Adapt to model updates
Collaborate on prompts
W/CoT answer after reasoning, zero temperature for single correct answer
Document failing prompts and why
Save in codebases
Automated tests and evals for prod systems to monitor performance

23. AI Agentic Interactions

24. Agentic Frameworks

LangChain: Use LangChain Expression Language (LCEL) for Directed Acyclic Graphs linked by pipes. Good for linear workflows
LangGraph: Build workflows for where nodes are LCEL and edges connecting nodes is the conditional logic allowing for cycles. State is managed by the framework. Used for multi-agent, plan-and-execute, human-in-the-loop flows. Good for cyclic, branching workflows that require tool usage
Google ADK: Opinionated and abstracts away low-level graph constructions. Provides predefined patterns for multi-agent interaction. Uses a concept of a team of agents to delegate tasks to a fleet of sub-agents managing state and sessions implicitly with less granularity than LangGraph using a factory pattern
Crew.AI: Crew+Agents+Tasks executed in The Process. Developer designs a team charter where the framework concentrates on the logic of agent collaboration and for simulating a team of specialists
Microsoft Autogen: orchestrations of agents via conversation
llamaIndex: data framework for ETL with limited capabilities of orchestration
Haystack: good for search w/LLMs. Nodes for retrieval, question answering, and summarization with emphasis on performance and scalability for large-scale information retrieval optimized for static pipelines
MetaGPT: SOP (standard operations procedures) agents that have specific roles to generate highly structured and coherent outputs needed for specialized domains (code gen). Less capable for general tasks
SuperAGI: OSS framework for complete autonomous agent lifecycle management. Provides observability and implements common fallback management for production-readiness. All batteries included and complex
Semantic Kernel: MS SDK integrates with LLM as a reasoning engine via plugins and planners framework
Strands Agents: Model-driven approach to run agents built for simplicity and scalability for conversational to complex multi-agents with native MCP support to external tools. Barebones with add-ons needed for lifecycle, monitoring

25. Building an Agent with AgentSpace

26. AI Agents on the CLI

27. Under the Hood: An Inside Look at the Agents' Reasoning Engines

Gemini:

Deconstruct the prompt: Core task, Key Concepts, Constraints and Instructions, Intent
Information Retrieval and Synthesis: activate relevant knowledge (form key words), pattern recognition using the transforms between words and ideas, structure the answer
Generate the response: language constructions, apply given rules, review and refine

ChatGPT:

Read and understand input: break sentence into parts (subject, actions, conditions), the ask and type of problem
Activate relevant knowledge
Choose a reasoning model: deduction, induction, causal, analogical
Simulate thinking using CoT
Formulate an Answer
Adjust for clarity and tone

Grok:

Understand the query: extract intent, context, and specifics by tokenizing and getting semantic meaning to know what you are asking for
Contextual Analysis and Knowledge Retrieval: leverage attention mechanisms prioritize relevant concepts
Formulate a mental model: align the question with the sequence of steps understanding, retrieval, evaluation, and synthesis prioritize clarity and coherence
Evaluate options and hypotheses: consider multiple responses and use probabilistic reasoning based on preference and appropriateness to pick the best one, iteratively
Construct the Response: prioritize coherence and structure aligned with instructions into digestible steps
Review and refinement: final pass if response meets quality and completion standards

Kimi:

Pre-processing: tokenization and parse syntax to get a semantic map
Problem identification and strategy selection: classify task and choose solution strategy based on understanding map
Knowledge retrieval
Execution and verification
Response Generation: plan structure
Metacognitive reflection: score and check for edge cases
Create response

Claude:

Pattern Recognition and context building: Identify problem and build context considering your level and required level of detail
Information retrieval and Assembly: pull relevant knowledge from memory
Structural Planning: consider response form
Iterative Refinement: iteratively evaluate response on relevance and coherence backtracking to rephrase if it doesn't make sense
Constraint Checking: check if responses are accurate, helpful, safe, clear
Analogical thinking: compare new situations to similar past patterns or examples to enhance the understanding of the concept

DeepSeek:

Receive and parse import: keywords, grammar, intent
Contextualize and Retrieve relevant information
Identify core concepts and structure
Build and sequence CoT: core plan and think out loud
Calculate Probabilities and Generate Output: based on next word using context and training data
Iterate and Refine: on each word
Apply internal training frameworks: gleaned from RLHF and filtering mechanisms are used while generating output

28. Coding Agents

Solves the blank page problem to get a POC. Agents as team members:

Human-led orchestration: dev is team lead and architecture who makes final decisions. It directs the agent and provides context
The primacy of context: quality output depends on completeness of context with: the codebase; external knowledge; a human brief
Direct model access: directly use frontier models

Different agents: implementer; testers; documenter; optimizer; supervisor

Conclusion

Key Principles:

Core Execution (workflows) and task decomposition
Interaction (tools) with external environments
State (memory), learning (reflection), and self-improvement
(Multi-agent) Collaboration and communication

Patterns for complex systems:

Initial Planning: research
Information gathering with tools
Collaboration analysis and writing
Iterative reflection and refinement
State management

Future:

Autonomy with reasoning
Agentic ecosystems and standardization
Aligned on safety and robustness

Happy Hackin'!

References:

This article is part of the system design series where I am summarizing chapters from The System Design Interview: Volume 1 / Volume 2 amongst other related content

Agentic Design Patterns

1. Prompt Chaining (Pipeline Pattern)

2. Routing

3. Parallelization

4. Reflection

5. Tool Use (Function Calling)

6. Planning Pattern

7. Multi-Agent Collaboration

8. Memory Management

9. Learning and Adaptation

10. Model Context Protocol

11. Goal Setting and Monitoring

12. Exception Handling and Recovery

13. Human in the Loop

14. Knowledge Retrieval (RAG)

15. Inter-Agent Communication (A2A)

16. Resource-Aware Optimization

17. Reasoning Techniques

18. Guardrails/Safety Patterns

19. Evaluation and Monitoring

20. Prioritization

21. Exploration and Discovery

22. Advanced Prompting Techniques

23. AI Agentic Interactions

24. Agentic Frameworks

25. Building an Agent with AgentSpace

26. AI Agents on the CLI

27. Under the Hood: An Inside Look at the Agents' Reasoning Engines

28. Coding Agents

Conclusion

Comments

System Design

Design a distributed unique id generator

More from this blog

8 Simple reminders

Design a url shortner

No regrets

Design a distributed unique id generator

luck surface area

Command Palette

1. Prompt Chaining (Pipeline Pattern)

2. Routing

3. Parallelization

4. Reflection

5. Tool Use (Function Calling)

6. Planning Pattern

7. Multi-Agent Collaboration

8. Memory Management

9. Learning and Adaptation

10. Model Context Protocol

11. Goal Setting and Monitoring

12. Exception Handling and Recovery

13. Human in the Loop

14. Knowledge Retrieval (RAG)

15. Inter-Agent Communication (A2A)

16. Resource-Aware Optimization

17. Reasoning Techniques

18. Guardrails/Safety Patterns

19. Evaluation and Monitoring

20. Prioritization

21. Exploration and Discovery

22. Advanced Prompting Techniques

23. AI Agentic Interactions

24. Agentic Frameworks

25. Building an Agent with AgentSpace

26. AI Agents on the CLI

27. Under the Hood: An Inside Look at the Agents' Reasoning Engines

28. Coding Agents

Conclusion

Comments

System Design

Design a distributed unique id generator

More from this blog