Agent Fundamentals¶
An AI agent is an autonomous system that uses an LLM as its reasoning engine to perceive its environment, reason about situations, take actions via tools, and adapt based on results. Unlike a chatbot (single-turn responses), an agent connects to databases, APIs, and tools to autonomously complete multi-step tasks.
Key Facts¶
- Agent = LLM brain + tools + memory + planning
- More capable models produce more capable agents (GPT-4 >> GPT-3.5 for complex agents)
- Function calling capability is essential - the model must reliably output structured tool calls
- Each reasoning step costs tokens - a single request may require 5-20 LLM calls
- Use workflows (fixed step sequences) when the process is known; use agents only when dynamic decision-making is needed
Agent Components¶
1. LLM Brain (Reasoning Engine)¶
Core decision-making. Processes context, reasons about next steps, generates tool calls.
2. Tools¶
External capabilities: search, code execution, file operations, APIs, communication. Any function with a description for the LLM.
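A minimal sketch of the "function with a description" idea, assuming a hypothetical `Tool` wrapper and a stubbed `search_web` function (a real implementation would call an actual search API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """A tool is any function plus a description the LLM can read."""
    name: str
    description: str
    func: Callable[..., str]

def search_web(query: str) -> str:
    # Stub: a real implementation would call a search API here.
    return f"Results for: {query}"

tools = [Tool("search_web", "Search the web for a query.", search_web)]

# The agent prompt typically lists each tool's name and description:
tool_specs = "\n".join(f"- {t.name}: {t.description}" for t in tools)
```

The description is what the LLM actually "sees"; the function body only runs after the model emits a matching tool call.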
3. Memory¶
- Short-term: current conversation context
- Long-term: persistent knowledge across sessions (vector stores, databases)
- Working memory (scratchpad): accumulated thoughts, actions, observations during execution
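A sketch of working memory with simple truncation, assuming a hypothetical `Scratchpad` class (entry limit and truncation strategy are illustrative choices, not a prescribed design):

```python
class Scratchpad:
    """Working memory: accumulated thought/action/observation entries."""

    def __init__(self, max_entries: int = 20):
        self.entries: list[str] = []
        self.max_entries = max_entries  # bound context growth

    def add(self, kind: str, text: str) -> None:
        self.entries.append(f"{kind}: {text}")
        # Keep only the most recent entries (simple truncation strategy).
        self.entries = self.entries[-self.max_entries:]

    def render(self) -> str:
        # Rendered text is prepended to each LLM call's context.
        return "\n".join(self.entries)
```

Summarizing old entries instead of dropping them is a common alternative when earlier steps still matter.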
4. Planning¶
- No planning: direct tool call from user request
- Sequential: step-by-step execution plan
- Hierarchical: subtasks handled by sub-agents
- Iterative refinement: plan -> execute -> evaluate -> revise
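The iterative-refinement style can be sketched as a small control loop; the four callbacks below are hypothetical stand-ins for LLM-backed steps:

```python
# plan -> execute -> evaluate -> revise, until the evaluator accepts.
def iterative_refinement(plan_fn, execute_fn, evaluate_fn, revise_fn, max_rounds=3):
    plan = plan_fn()
    result = None
    for _ in range(max_rounds):
        result = execute_fn(plan)             # execute the current plan
        ok, critique = evaluate_fn(result)    # self-evaluate the output
        if ok:
            return result
        plan = revise_fn(plan, critique)      # revise the plan and retry
    return result  # best effort after max_rounds
```

Note the bounded round count: without it, a plan the evaluator never accepts would loop forever.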
The ReAct Loop¶
The foundational agent execution pattern (Reasoning + Acting):
1. THOUGHT: Analyze situation, decide next action
2. ACTION: Call a tool with specific inputs
3. OBSERVATION: Receive tool output
4. Repeat until task complete
5. FINAL ANSWER: Synthesize and respond
Example:

```text
User: What's the weather in Paris and should I bring an umbrella?

Thought: I need to check the weather in Paris
Action: weather_api(city="Paris")
Observation: Temperature: 15C, Rain probability: 80%
Thought: High rain probability means an umbrella is needed
Final Answer: Paris is 15C with an 80% chance of rain. Bring an umbrella.
```
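The loop above can be sketched in a few lines; `call_llm` is a hypothetical stand-in for a model client that returns one parsed step per call, and `tools` maps tool names to plain Python functions:

```python
def react_loop(call_llm, tools, task, max_steps=10):
    scratchpad = f"Task: {task}"
    for _ in range(max_steps):
        step = call_llm(scratchpad)                    # THOUGHT (+ action or answer)
        scratchpad += f"\nThought: {step['thought']}"
        if "final_answer" in step:
            return step["final_answer"]                # FINAL ANSWER
        observation = tools[step["action"]](**step["args"])   # ACTION
        scratchpad += (f"\nAction: {step['action']}"
                       f"\nObservation: {observation}")       # OBSERVATION
    return "Stopped: max_steps reached without a final answer"
```

Each iteration re-sends the growing scratchpad to the model, which is why a single request can cost many LLM calls.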
Agent Types¶
| Type | Description | Use Case |
|---|---|---|
| Tool-Use | LLM decides which tool to call. No complex planning. | Simple API integrations |
| Conversational | Maintains dialogue, asks clarifying questions | Customer support |
| Plan-and-Execute | Creates full plan first, then executes step by step | Complex multi-step tasks |
| Self-Correcting (Reflexion) | Evaluates own output, critiques, retries | Code generation, analysis |
Agent vs Workflow¶
| Factor | Agent (autonomous) | Workflow (predefined) |
|---|---|---|
| Flexibility | High - adapts to novel situations | Low - follows fixed steps |
| Predictability | Low - may take unexpected actions | High - deterministic path |
| Debugging | Hard - trace through reasoning | Easy - check each step |
| Cost | Higher - more LLM calls | Lower - minimal LLM calls |
| Best for | Open-ended research, dynamic tasks | Known processes, pipelines |
Agent Architectures¶
Single Agent¶
One LLM handles everything. Simple but limited for complex tasks.
Router Pattern¶
An LLM classifier inspects each request and routes it to the appropriate specialized agent.
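A minimal sketch of this routing; `classify` stands in for an LLM classification call, and the agent names are illustrative:

```python
def route(request, classify, agents, default="general"):
    label = classify(request)                  # LLM returns a category label
    agent = agents.get(label, agents[default]) # fall back to a default agent
    return agent(request)

# Illustrative stubs: real agents would be LLM-backed.
agents = {
    "billing": lambda r: f"[billing agent] {r}",
    "general": lambda r: f"[general agent] {r}",
}
classify = lambda r: "billing" if "refund" in r else "general"
```

The fallback agent matters: classifiers occasionally emit labels outside the expected set, and a missing-key crash is worse than a generic answer.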
Supervisor Pattern¶
Boss agent delegates to specialized workers:
```text
User Request -> Supervisor
                  -> Worker 1 (Research)
                  -> Worker 2 (Analysis)
                  -> Worker 3 (Writing)
                  -> Supervisor synthesizes
```
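A sketch of the delegate-then-synthesize flow; the worker functions and the synthesis step are hypothetical placeholders for LLM-backed agents:

```python
def supervise(request, workers, synthesize):
    # Fan the request out to each specialized worker...
    results = {name: worker(request) for name, worker in workers.items()}
    # ...then the supervisor synthesizes the combined results.
    return synthesize(results)

workers = {
    "research": lambda r: f"findings on {r}",
    "analysis": lambda r: f"analysis of {r}",
    "writing":  lambda r: f"draft about {r}",
}
synthesize = lambda results: " | ".join(
    results[k] for k in ("research", "analysis", "writing"))
```

In a real system the supervisor would also decide *which* workers to invoke and in what order, rather than always calling all of them.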
Error Handling¶
Agents can fail at multiple points:
- Malformed tool calls from the LLM
- Tool execution failures (API error, timeout)
- Infinite loops
- Misunderstanding the task and taking the wrong action
Mitigation: max iteration limits (10-20 is typical), output validation, fallback to a human, and structured error recovery.
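A sketch combining two of these mitigations, bounded retries and output validation; the `validate` callback is a hypothetical checker for the tool's output:

```python
def safe_tool_call(tool, args, validate, retries=2):
    """Run a tool with output validation and bounded retries."""
    last_error = None
    for _ in range(retries + 1):
        try:
            result = tool(**args)
        except Exception as e:          # API error, timeout, etc.
            last_error = str(e)
            continue
        if validate(result):
            return result
        last_error = "output failed validation"
    return f"FALLBACK: {last_error}"    # escalate to a human from here
```

Returning a structured fallback string (rather than raising) lets the agent loop observe the failure and decide what to do next.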
Agent Benchmarks¶
| Benchmark | What It Tests |
|---|---|
| SWE-bench | Real GitHub issues (code understanding + fixing) |
| WebArena | Web browsing agent evaluation |
| GAIA | General AI assistants |
| ToolBench | Tool-use across diverse APIs |
Gotchas¶
- Start with workflows, add agency gradually - don't make everything autonomous
- Local/small models produce errors with complex agent workflows - use capable models
- Agent cost estimate: $0.01-$1.00 per request depending on complexity
- The scratchpad (accumulated history) grows with each step - must be managed (truncation, summarization)
- Logging everything (thoughts, actions, observations) is essential for debugging agents
See Also¶
- [[agent-design-patterns]] - ReAct, plan-and-execute, reflexion patterns in detail
- [[function-calling]] - How agents invoke tools
- [[multi-agent-systems]] - Multi-agent architectures
- [[agent-memory]] - Memory management for agents
- [[agent-security]] - Securing agent systems