Your agent worked yesterday. Today it's doing something weird — looping, giving wrong answers, or going completely silent. Here's a systematic way to figure out what broke and fix it fast.
The 4 Categories of Agent Failures
Before you start poking around, it helps to know what kind of failure you're dealing with:
- Input failures — The agent got bad data (wrong tool output, garbled context, missing variables)
- Reasoning failures — The agent's prompt or instructions led it to the wrong conclusion
- Tool failures — An external tool (API, function, search) returned an error or unexpected result
- Memory failures — The agent lost context it needed, or hallucinated past context it didn't have
Most failures fall into one of these. Your debugging approach changes depending on which one it is.
Step 1: Read the Actual Output (Not Just "It Failed")
This sounds obvious, but most people look at the effect (wrong action, no response) without reading the agent's actual output carefully.
Check:
- What did the agent say it was going to do?
- What did it actually do?
- If it used tools, what did the tool return?
If you're using a framework like n8n, Make, LangChain, or OpenClaw — find the raw execution log. The error is almost always there.
Step 2: Reproduce with Minimal Input
Strip the scenario down to the simplest version that still fails:
- Remove extra context
- Use a simple, known-good test prompt
- Run it with a single tool call instead of a full workflow
If the simple version works, the problem is in the complexity you removed. Add pieces back one at a time until it breaks again. That's your culprit.
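That add-back loop can be automated. Below is a minimal sketch of the idea; `run_agent` and `passes` are hypothetical stand-ins for however you invoke your agent and check its output — they are not from any specific framework.

```python
def find_breaking_piece(base_prompt, pieces, run_agent, passes):
    """Re-add removed context pieces one at a time and return the first
    piece that makes the agent fail again. `run_agent` and `passes` are
    placeholders for your own agent call and success check."""
    context = [base_prompt]
    for piece in pieces:
        context.append(piece)
        output = run_agent("\n".join(context))
        if not passes(output):
            return piece  # this piece reintroduced the failure
    return None  # no single piece breaks it; suspect an interaction between pieces
```

If this returns `None`, the failure likely comes from two or more pieces interacting, so try re-adding them in pairs.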
Step 3: Check Your System Prompt for Conflicts
System prompts accumulate contradictions over time. Classic examples:
- "Always be concise" + "Always explain your reasoning step by step"
- "Only use the tools provided" + later adding tools with overlapping purposes
- "Don't ask clarifying questions" + ambiguous tasks that genuinely need clarification
Run your system prompt through this mental test: if a smart human read these instructions and hit this specific situation, what would they do? If the answer is "be confused," the agent will be too.
Step 4: Isolate Tool Failures
If the agent uses external tools (web search, APIs, databases), test each one independently:
1. Call the tool with the exact input your agent would send
2. Check the raw response
3. Ask: "Would a reasonable agent know how to use this response?"
Common tool issues:
- API rate limits returning 429s that look like empty results
- JSON responses with nested errors buried where the agent won't see them
- Timeout responses that return partial data
- Schema mismatches (the tool returns `user_id` but the agent expects `userId`)
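A small pre-flight check can catch most of these issues before the agent ever sees the response. This is an illustrative sketch, not part of any framework — `check_tool_response` and the key names are made up for the example:

```python
import json

def check_tool_response(raw: str, expected_keys: set) -> list:
    """Flag common tool-response problems: invalid JSON, nested error
    fields, and schema mismatches (missing expected keys)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    problems = []
    if isinstance(data, dict):
        if "error" in data or "errors" in data:
            problems.append("response contains a nested error field")
        missing = expected_keys - data.keys()
        if missing:
            problems.append(f"missing expected keys: {sorted(missing)}")
    return problems
```

Running it against the schema-mismatch example above, a response containing `user_id` checked against an expected `userId` key gets flagged as a missing key rather than silently confusing the agent.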
Step 5: Check Context Window Pressure
If your agent worked fine on short tasks but breaks on long ones, you're probably hitting context limits:
- Earlier instructions get "pushed out" of the effective context
- The agent starts confusing past tool results with current ones
- Summarization or compression kicks in and loses important details
Fix: add explicit "working memory" checkpoints. At key decision points, have the agent re-state what it knows before proceeding.
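One lightweight way to implement this is to prepend a checkpoint instruction to each long-running task step. The wording and helper below are just one possible sketch:

```python
CHECKPOINT_PROMPT = (
    "Before proceeding, restate in one short list:\n"
    "1. The overall goal\n"
    "2. What you have completed so far\n"
    "3. The next single step\n"
)

def with_checkpoint(task_prompt: str) -> str:
    """Prepend a working-memory checkpoint so the agent re-states its
    state before acting, instead of relying on distant context."""
    return CHECKPOINT_PROMPT + "\n" + task_prompt
```

Forcing the restatement into the most recent part of the context means the key facts are never the ones that get "pushed out."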
Step 6: Add Structured Logging (For Next Time)
Once you've fixed the immediate issue, add logging so the next failure is easier to diagnose:
Before each tool call: log {tool_name, input, timestamp}
After each tool call: log {output_summary, success/fail, latency}
At decision points: log {reasoning, chosen_action, alternatives_considered}
Even a simple text log in your agent's workspace file makes debugging 10x faster.
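The before/after pattern above fits naturally into a small wrapper that writes one JSON object per line. This is a hedged sketch of the idea, not a library API — `log_tool_call` and its fields are named to match the log shapes described here:

```python
import json
import time

def log_tool_call(logfile, tool_name, tool_input, call):
    """Wrap a tool call with structured before/after log entries,
    one JSON object per line. `call` is your actual tool function.
    Note: this sketch converts exceptions into a failed log entry
    and returns the error text rather than re-raising."""
    start = time.time()
    logfile.write(json.dumps({
        "event": "tool_start",
        "tool_name": tool_name,
        "input": tool_input,
        "timestamp": start,
    }) + "\n")
    try:
        output = call(tool_input)
        success = True
    except Exception as exc:
        output, success = str(exc), False
    logfile.write(json.dumps({
        "event": "tool_end",
        "tool_name": tool_name,
        "output_summary": str(output)[:200],
        "success": success,
        "latency_s": round(time.time() - start, 3),
    }) + "\n")
    return output
```

Because each line is standalone JSON, you can grep the log for `"success": false` or load it line by line when the next failure shows up.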
Quick Diagnostic Checklist
Use this when an agent breaks and you don't know where to start:
- [ ] Did the model version change? (Provider updates can shift behavior)
- [ ] Did any tool/API update its response format?
- [ ] Did the system prompt change recently?
- [ ] Is the failure deterministic or intermittent?
- [ ] Does the failure happen with a fresh context (no prior messages)?
- [ ] Does the failure happen with a different model on the same prompt?
- [ ] Did any external dependency (auth tokens, rate limits) change?
If you check all of these and still can't find it, the issue is usually in a subtle prompt conflict or a tool returning a new edge-case value. Add more logging and run it again.
The "Fresh Eyes" Test
After you've been staring at an agent for too long, everything looks fine even when it isn't.
The fresh eyes test: explain what the agent is supposed to do to a rubber duck (or paste the system prompt into a new chat and ask Claude/GPT "what would confuse an AI following these instructions?"). You'll catch issues in 2 minutes that you missed for 2 hours.
When to Escalate vs. Fix Yourself
Fix yourself:
- Tool returning unexpected format → update your parsing or tool call
- Prompt conflict → rewrite the conflicting instructions
- Context window pressure → restructure the workflow to reduce context size
Ask in the Workshop:
- Fundamental agent design question (is this the right architecture?)
- Persistent reasoning failures across multiple models
- Complex multi-agent coordination failures
Resources
- Ask Patrick Library — battle-tested agent configs updated nightly
- Workshop tier — get direct answers about your specific setup
- Discord #troubleshooting — community debugging, live