This article presents practical patterns for building effective agentic systems with large language models (LLMs), drawn from production experience across many teams. It distinguishes between workflows (predefined code paths orchestrating LLM calls) and agents (systems where LLMs dynamically direct their own processes). The core building block is the augmented LLM (with retrieval, tools, and memory). Several reusable workflow patterns are described — prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer — along with guidance on when each is appropriate. The article also discusses when to use (and not use) agents and frameworks.
The basic building block of every agentic system is the augmented LLM: a model that can generate its own search queries for retrieval, select appropriate tools, and decide what information to retain in memory. Implementations should tailor these capabilities to the use case and expose an easy-to-use, well-documented interface to the model. The Model Context Protocol (MCP) is one way to integrate a growing ecosystem of third-party tools through a simple client implementation. All subsequent workflow patterns assume each LLM call has access to these augmentations.
Prompt chaining breaks a task into a fixed sequence of steps; each step's output becomes the next step's input. Programmatic checks ("gates") can validate intermediate results (e.g., an outline must meet criteria before generating the full document).
Use when: the task can be cleanly decomposed and higher accuracy justifies the latency of multiple calls.
Examples: generating marketing copy then translating it; writing a document outline, verifying it, then writing the full text.
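A minimal sketch of the outline-then-document chain, with a programmatic gate between the two calls. It assumes `llm` is any callable that maps a prompt string to a completion string; `outline_passes_gate`, `write_document`, and the three-section threshold are hypothetical names and values chosen for illustration.

```python
from typing import Callable

LLM = Callable[[str], str]  # assumption: prompt in, completion out


def outline_passes_gate(outline: str, min_sections: int = 3) -> bool:
    """Programmatic check: require at least `min_sections` non-empty lines."""
    sections = [line for line in outline.splitlines() if line.strip()]
    return len(sections) >= min_sections


def write_document(topic: str, llm: LLM) -> str:
    """Chain two LLM calls; the second runs only if the outline passes the gate."""
    outline = llm(f"Write a one-line-per-section outline for: {topic}")
    if not outline_passes_gate(outline):
        raise ValueError("outline failed the gate; stopping before the full draft")
    return llm(f"Write the full document following this outline:\n{outline}")
```

Failing fast at the gate keeps a weak intermediate result from being amplified by the later, more expensive step.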
In routing, an LLM (or a traditional classifier) categorizes an input and directs it to a specialized handler (a different prompt, toolset, or model). This separation of concerns prevents a single prompt from being optimized for one input type at the expense of others.
Use when: there are distinct categories that benefit from separate handling and classification is accurate.
Examples: customer service queries split by type (general, refund, support); using a cheaper model for common questions and a capable model for complex ones.
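The customer-service example can be sketched as follows. The `classify` and `respond` callables stand in for LLM calls (they could be different models), and the handler prompts and the fallback to `general` are assumptions made for illustration.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]  # assumption: prompt in, completion out

# Hypothetical specialized prompts, one per category.
HANDLER_PROMPTS: Dict[str, str] = {
    "general": "Answer this general question:",
    "refund": "Follow the refund policy to handle:",
    "support": "Troubleshoot this technical issue:",
}


def route(query: str, classify: LLM, respond: LLM) -> str:
    """Classify the query, then answer it with the category's specialized prompt."""
    label = classify(
        "Classify as one of: general, refund, support. "
        "Reply with the label only.\n" + query
    ).strip().lower()
    if label not in HANDLER_PROMPTS:
        label = "general"  # fall back when the classifier returns an unexpected label
    return respond(f"{HANDLER_PROMPTS[label]}\n{query}")
```

Validating the label before dispatch matters because the whole pattern depends on classification being accurate.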
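Parallelization runs several LLM calls at the same time and aggregates their outputs programmatically. Two variations: sectioning, which splits a task into independent subtasks that run in parallel, and voting, which runs the same task multiple times to obtain diverse outputs.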
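Both variations share the same mechanics: fan out LLM calls concurrently, then aggregate in code. The sketch below assumes `llm` is a thread-safe callable from prompt to completion; `sectioning` gives each call a different aspect of the task, while `voting` repeats one prompt and takes the most common answer. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # assumption: thread-safe, prompt in, completion out


def run_parallel(prompts: List[str], llm: LLM) -> List[str]:
    """Run the prompts concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        return list(pool.map(llm, prompts))


def sectioning(document: str, aspects: List[str], llm: LLM) -> Dict[str, str]:
    """One independent call per aspect, e.g. security, style, correctness."""
    prompts = [f"Review this text for {aspect}:\n{document}" for aspect in aspects]
    return dict(zip(aspects, run_parallel(prompts, llm)))


def voting(prompt: str, llm: LLM, n: int = 3) -> str:
    """Ask the same question n times and return the most common normalized answer."""
    answers = [a.strip().lower() for a in run_parallel([prompt] * n, llm)]
    return Counter(answers).most_common(1)[0][0]
```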
In the orchestrator-workers pattern, a central orchestrator LLM decides how to break a task into subtasks, delegates them to worker LLMs, and synthesizes their outputs. Unlike parallelization, the subtasks are not predefined; the orchestrator determines them for each input.
Use when: you cannot predict the subtasks in advance (e.g., coding tasks where the number and nature of file changes depend on the request).
Examples: coding assistants that modify multiple files in one session; complex search that gathers and cross-references information from many sources.
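One way to sketch this pattern is shown below. It assumes the orchestrator is prompted to return its plan as a JSON list of strings, which is a convention chosen for this example rather than anything the pattern requires; `orchestrate` and both callables are hypothetical.

```python
import json
from typing import Callable, List

LLM = Callable[[str], str]  # assumption: prompt in, completion out


def orchestrate(task: str, orchestrator: LLM, worker: LLM) -> str:
    """Plan subtasks at run time, delegate each to a worker, then synthesize."""
    plan = orchestrator(
        "Break this task into subtasks. Reply with a JSON list of strings only.\n" + task
    )
    subtasks: List[str] = json.loads(plan)  # raises ValueError on a malformed plan
    results = [worker(f"Task: {task}\nSubtask: {s}") for s in subtasks]
    report = "\n".join(f"- {s}: {r}" for s, r in zip(subtasks, results))
    return orchestrator(f"Synthesize a final answer for '{task}' from:\n{report}")
```

The number of worker calls is decided by the model's plan, not by the code, which is the difference from the parallelization sketch's fixed fan-out.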
In the evaluator-optimizer pattern, one LLM generates a response while a second LLM evaluates it and provides feedback, and the two repeat in a loop. The evaluator may also decide whether further iterations are needed.
Use when: clear evaluation criteria exist and iterative refinement demonstrably improves output (like human editing).
Examples: literary translation where the evaluator catches nuance; complex search tasks needing multiple rounds of querying and analysis.
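A minimal sketch of the loop, assuming the evaluator is prompted to reply `PASS` when the draft meets the criteria and to give feedback otherwise. That reply convention, the `refine` name, and the `max_rounds` cap are assumptions for illustration.

```python
from typing import Callable

LLM = Callable[[str], str]  # assumption: prompt in, completion out


def refine(task: str, generator: LLM, evaluator: LLM, max_rounds: int = 3) -> str:
    """Generate a draft, then revise it until the evaluator passes it or rounds run out."""
    draft = generator(task)
    for _ in range(max_rounds):
        feedback = evaluator(
            f"Task: {task}\nDraft: {draft}\nReply PASS or give feedback."
        )
        if feedback.strip().upper().startswith("PASS"):
            break  # evaluator is satisfied; stop iterating
        draft = generator(f"{task}\nPrevious draft: {draft}\nFeedback: {feedback}")
    return draft
```

The round cap bounds cost when the evaluator never passes a draft.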
An agent is an LLM that, after an initial command or discussion with a user, plans and operates autonomously. It iteratively uses tools, observes results (ground truth from the environment), and decides next steps. Human feedback can be requested at checkpoints. Stopping conditions (e.g., max iterations) maintain control.
Implementation is typically simple — an LLM using tools in a loop. Success depends on carefully designing the toolset and tool documentation.
When to use: tasks that require flexibility, multi-step reasoning, and tool use where the exact path cannot be predetermined.
Examples: open-ended research assistants, multi-file coding agents, complex data analysis pipelines.
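The "LLM using tools in a loop" idea can be sketched as below. It assumes a hypothetical reply protocol in which the model returns JSON, either `{"tool": ..., "input": ...}` to act or `{"answer": ...}` to finish; real model APIs expose tool calls through their own structures, so treat this as an illustration of the control flow only.

```python
import json
from typing import Callable, Dict, List

LLM = Callable[[str], str]   # assumption: prompt in, JSON string out
Tool = Callable[[str], str]  # assumption: string argument in, observation out


def run_agent(goal: str, llm: LLM, tools: Dict[str, Tool], max_iterations: int = 5) -> str:
    """Let the model pick tools until it answers or the iteration limit is hit."""
    transcript: List[str] = [f"Goal: {goal}"]
    for _ in range(max_iterations):
        reply = json.loads(llm("\n".join(transcript)))
        if "answer" in reply:
            return reply["answer"]
        name, arg = reply["tool"], reply["input"]
        if name in tools:
            observation = tools[name](arg)  # ground truth from the environment
        else:
            observation = f"error: unknown tool {name!r}"
        transcript.append(f"Called {name}({arg!r}) -> {observation}")
    return "stopped: iteration limit reached"  # stopping condition keeps control
```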
Start simple: often a single LLM call with retrieval and in-context examples is sufficient. Workflows add predictability for well-defined tasks; agents add flexibility for dynamic tasks. If the task does not benefit from model-driven decision-making, avoid agents. Frameworks can simplify low-level details but risk obscuring the LLM's prompts and responses — always understand the underlying code.
These building blocks aren’t prescriptive. They are common patterns that developers can shape and combine to fit different use cases. Success depends on measuring performance and iterating on implementations. Add complexity only when it demonstrably improves outcomes.
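When implementing agents, follow these three principles: maintain simplicity in the agent's design; prioritize transparency by explicitly showing the agent's planning steps; and carefully craft the agent-computer interface (ACI) through thorough tool documentation and testing.
Frameworks can help you start quickly, but as you move to production, consider reducing abstraction layers and building with basic components. Following the principles above helps produce agents that are powerful, reliable, maintainable, and trusted by their users.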
Two particularly promising applications demonstrate the practical value of the patterns above. Both require conversation and action, have clear success criteria, enable feedback loops, and integrate meaningful human oversight.
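Customer support combines familiar chatbot interfaces with tool integration. It is a natural fit for more open-ended agents because support interactions follow a conversational flow while requiring access to external information and actions; tools can pull customer data, order history, and knowledge-base articles; actions such as issuing refunds or updating tickets can be handled programmatically; and success can be measured clearly through user-defined resolutions.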
Several companies have validated this approach through usage‑based pricing models that charge only for successful resolutions.
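Code agents have evolved from code completion to autonomous problem-solving. They are effective because code solutions can be verified through automated tests, agents can iterate on solutions using test results as feedback, the problem space is well defined and structured, and output quality can be measured objectively.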
In Anthropic’s own implementation, agents solve real GitHub issues in the SWE‑bench Verified benchmark based solely on the pull request description. Automated testing helps verify functionality, but human review remains crucial for ensuring solutions align with broader system requirements.

Tools enable Claude to interact with external services and APIs. Tool definitions and specifications deserve as much prompt engineering attention as your overall prompts.
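As an illustration of the level of detail a tool definition can carry, here is a sketch of one written as a Python dictionary. The `name`, `description`, and `input_schema` fields follow the tool format of the Anthropic Messages API; the tool itself (`get_order_status`), its wording, and the example order ID are hypothetical.

```python
# Hypothetical tool definition; only the field names come from the API format.
get_order_status_tool = {
    "name": "get_order_status",
    "description": (
        "Look up the current status of a customer order. "
        "Use this when the customer asks where an order is or whether it has shipped. "
        "Returns one of: processing, shipped, delivered, cancelled. "
        "Do not use it for refund requests."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": (
                    "Order identifier exactly as shown in the confirmation "
                    "email, for example 'A-10293'."
                ),
            }
        },
        "required": ["order_id"],
    },
}
```

The description states when to use the tool, what it returns, and where its boundary lies, much as a prompt would.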
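There are often several ways to specify the same action (e.g., writing a diff vs. rewriting the entire file; returning code in markdown vs. JSON), and some formats are much harder for an LLM to produce than others. Suggestions: give the model enough tokens to think before it writes itself into a corner; keep the format close to what the model has seen occurring naturally in text on the internet; and avoid formatting overhead, such as keeping an accurate count of thousands of lines of code or string-escaping any code it writes.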
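Invest as much effort in the agent-computer interface (ACI) as you would in a human-computer interface (HCI). Recommendations: put yourself in the model's position and ask whether the tool's use is obvious from its description and parameters; rename parameters and rewrite descriptions to make them clearer, as you would a docstring for a junior developer; run many example inputs to see which mistakes the model makes, then iterate; and poka-yoke your tools by changing their arguments so that mistakes are harder to make.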
For example, while building the SWE‑bench agent, Anthropic spent more time optimizing tools than the overall prompt. The model made mistakes with relative filepaths after moving out of the root directory. Changing the tool to always require absolute filepaths eliminated the errors.
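The absolute-path fix can be sketched as a tool that rejects relative paths with an instructive error instead of guessing the working directory. This is an assumed illustration, not the SWE-bench agent's actual code; `read_file_tool` and the example path in the message are hypothetical.

```python
from pathlib import Path


def read_file_tool(path: str) -> str:
    """Hypothetical file-reading tool that accepts absolute paths only."""
    p = Path(path)
    if not p.is_absolute():
        # Tell the model how to fix the call rather than resolving the path silently.
        return (
            f"error: '{path}' is relative; "
            "pass an absolute path such as /repo/src/main.py"
        )
    if not p.is_file():
        return f"error: no file at '{path}'"
    return p.read_text(encoding="utf-8")
```

Returning the error as a tool result, rather than raising, lets the model read the message and correct its next call.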
See also: 存算分离架构, Poke (AI assistant), Poke notification triage, HTML-first workflows with Claude Code, Software Engineering Beyond Coding, Claude Design, Dai Yusen's AI Investment and Ecosystem Analysis, yopedia, RLM Agents, Open Knowledge Format, The Log Is the Agent, Agent Harness