The hardest thing about running AI agents isn't setting them up. It's knowing when they've broken.
A crashed server throws an error. A failed email bounces. But an AI agent that's quietly gone wrong? It just keeps responding. It keeps sending summaries. It keeps filing things away. The output looks fine at a glance. The problems only show up weeks later, when you go looking for something it was supposed to catch — and it isn't there.
I've been running AI agents in production for over a year. In that time, the clearest lesson has been that the most dangerous failure mode isn't a crash. It's drift. Here are the three signs I watch for.
You've stopped noticing it
Think about the last time your agent did something that made you think "oh good, it caught that." If you can't remember, that's the sign.
A working agent creates small moments of value you notice — a summary that saved you 20 minutes, a follow-up that happened without you thinking about it, a flag that showed up before a problem became urgent. When those moments stop, one of two things is true: either the agent stopped doing valuable work, or it's still doing work but you've stopped trusting it enough to act on it.
Both are failure states. The second one is worse, because you're still paying for API calls while getting zero benefit.
Set a calendar reminder every two weeks: "Did my AI do something useful this week?" If the answer is no, dig in. Either the task assignment was wrong from the start, or something broke along the way.
The last update timestamp is wrong
Every agent that writes to files, logs, or databases leaves a trail. If it's running, the trail is fresh. If it's not, the trail goes cold.
The simplest health check you can run:
ls -lt ~/agent-outputs/ | head -5
or if your agent logs to a file:
tail -5 ~/agent.log && date
Compare the timestamp on the last output to when the agent should have run. If your agent runs every morning at 7 AM and the last log entry is from three days ago — something broke. Maybe a cron job failed. Maybe an API key expired. Maybe a dependency updated and the import broke silently. The agent didn't tell you. The timestamp did.
Build timestamp checking into your morning routine. Thirty seconds. If the file is fresh, you're fine. If it's stale, investigate before it becomes a multi-day gap in your records.
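That routine is easy to script. Here's a minimal sketch — the output directory and the freshness window are assumptions, so point them at your own setup:

```shell
#!/bin/sh
# Freshness check for a daily agent. OUTPUT_DIR is a hypothetical path;
# MAX_AGE_MIN is 24 hours in minutes -- adjust both to your schedule.
OUTPUT_DIR="$HOME/agent-outputs"
MAX_AGE_MIN=1440

# find prints any file modified within the window; empty output means stale
fresh=$(find "$OUTPUT_DIR" -type f -mmin -"$MAX_AGE_MIN" 2>/dev/null | head -1)
if [ -n "$fresh" ]; then
    echo "OK: fresh output found: $fresh"
else
    echo "STALE: nothing in $OUTPUT_DIR newer than $MAX_AGE_MIN minutes"
fi
```

Drop it in cron or your shell profile and the thirty-second check becomes zero seconds.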
Library Item #45 (the 5-minute morning health check) has a copy-paste script that does this automatically across multiple agents.
The summaries have gone generic
This is the subtlest sign, and the one most people miss entirely.
A well-configured agent producing a good summary sounds specific: "Three customer emails today, two about shipping delays on order #4872, one upgrade request from [name]. Flagged the shipping issue — warehouse team is backed up through the 10th."
An agent that has lost its context — because its memory file grew too large, its instructions got truncated, or a prompt update broke its persona — produces summaries that sound like this: "Reviewed incoming messages. No urgent items identified. Monitoring continues."
That second summary sounds fine. It's not. It's the agent pattern-matching to what a summary should look like, rather than actually doing the work.
Read your agent's last five outputs and ask: could this have been written without reading the actual source material? If the answer is yes, the agent has drifted. Check its memory file for context overload (files over ~2,000 words start compressing poorly), verify the core instructions are still intact, and run a manual test with a real input to see what it produces.
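The word-count check, at least, is one line of shell. A sketch — the memory file path and the 2,000-word threshold from above are assumptions for illustration:

```shell
#!/bin/sh
# Warn when an agent's memory file crosses the word-count threshold.
# MEMORY is a hypothetical path; 2000 matches the rule of thumb above.
MEMORY="$HOME/agent-memory.md"
THRESHOLD=2000

words=$(wc -w < "$MEMORY")
if [ "$words" -gt "$THRESHOLD" ]; then
    echo "WARN: memory file is $words words; consider compressing"
else
    echo "OK: memory file is $words words"
fi
```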
The common thread: All three signs are things you can only catch by actively looking. Silent failures are silent by definition. Build in 5 minutes of active checking each morning and you'll catch 90% of issues before they become week-long gaps.
Why agents fail this way
Agents don't fail like software. Software either runs or it doesn't. Agents exist on a spectrum — fully working, partially working, technically running but producing garbage. The middle states are the dangerous ones.
The most common causes I've seen:
- API key expiration. Keys rotate, billing lapses, rate limits get hit. The agent tries to call the API, gets a 401, and either errors silently or falls back to a cached response that's weeks old.
- Context window overflow. Memory files grow. Once they exceed the model's effective context window, earlier instructions get deprioritized. The agent keeps running, but it's forgotten half its job.
- Instruction drift. Someone (maybe you, maybe another agent) updates the prompt. A single word change can flip a classification agent from "flag anything uncertain" to "only flag things you're sure about." The agent still runs. The output changes completely.
- Environmental changes. The file it reads moved. The format it parses changed. A dependency updated. The agent doesn't know — it tries, encounters something unexpected, and either silently skips or produces output based on a fallback it wasn't designed for.
None of these produce error messages you'll see without looking. That's the point. Build the habit of looking.
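The first cause — a dead API key — is the easiest one to probe directly. A sketch assuming an OpenAI-style bearer-token API; swap in your own provider's cheapest authenticated endpoint and key variable:

```shell
#!/bin/sh
# Probe whether the agent's API key still authenticates.
# Assumes OPENAI_API_KEY is set; adapt the URL for other providers.
status=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    https://api.openai.com/v1/models)

case "$status" in
    200) echo "OK: key valid" ;;
    401) echo "FAIL: key rejected (expired or revoked)" ;;
    429) echo "WARN: rate limited -- agent calls may be failing" ;;
    *)   echo "WARN: unexpected status $status" ;;
esac
```

A 401 here this morning explains a silent agent far faster than reading its logs will.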
What a working agent looks like
For contrast: a well-monitored agent running production tasks will produce outputs that are specific and traceable. You should be able to read an agent's summary and confirm at least two or three claims against the source material within 30 seconds. If you can't, it's a sign the agent is summarizing its idea of what happened — not what actually happened.
The 5-minute morning check I run:
- Timestamp fresh? (under 25 hours for daily agents)
- Output specific? (names, numbers, real references — not generic patterns)
- Did anything get caught? (if the agent's job is to flag issues, did any flags fire this week?)
That's it. Three checks, five minutes, catches almost everything.
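The three checks above can be roughed out in one script. This is a sketch, not a substitute for actually reading the output: the paths are hypothetical, and check 2 uses a crude proxy (generic boilerplate summaries rarely contain digits; specific ones usually do):

```shell
#!/bin/sh
# Morning health-check sketch. OUTPUT_DIR and LOG are hypothetical paths.
OUTPUT_DIR="$HOME/agent-outputs"
LOG="$HOME/agent.log"

# 1. Timestamp fresh? (anything written in the last 25 hours = 1500 minutes)
if [ -n "$(find "$OUTPUT_DIR" -type f -mmin -1500 2>/dev/null | head -1)" ]; then
    echo "1. timestamp:   OK"
else
    echo "1. timestamp:   STALE -- investigate"
fi

# 2. Output specific? (proxy: do recent log lines contain any numbers?)
if tail -20 "$LOG" 2>/dev/null | grep -q '[0-9]'; then
    echo "2. specificity: OK (numbers present)"
else
    echo "2. specificity: CHECK (no numbers in recent output)"
fi

# 3. Did anything get caught? (look for flag markers in the log)
if grep -qi 'flag' "$LOG" 2>/dev/null; then
    echo "3. flags:       at least one fired"
else
    echo "3. flags:       none found -- verify the agent still flags issues"
fi
```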
One more thing: If you find a problem, fix the root cause — not just the symptom. An agent that drifted because its memory file is too large will drift again in three weeks if you only clear the file. Add a compression step. If a key expired, set a calendar reminder 30 days before the next expiry. Patches that don't address root causes are just delayed failures.
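A minimal version of that compression step — archiving everything except the newest 200 lines of a (hypothetical) memory file, so history is preserved but the working file stays small:

```shell
#!/bin/sh
# Memory-file rotation sketch. MEMORY and ARCHIVE are hypothetical paths;
# KEEP is how many recent lines stay in the working file.
MEMORY="$HOME/agent-memory.md"
ARCHIVE="$HOME/agent-memory-archive.md"
KEEP=200

total=$(wc -l < "$MEMORY")
if [ "$total" -gt "$KEEP" ]; then
    # move everything except the newest KEEP lines into the archive
    head -n $((total - KEEP)) "$MEMORY" >> "$ARCHIVE"
    tail -n "$KEEP" "$MEMORY" > "$MEMORY.tmp" && mv "$MEMORY.tmp" "$MEMORY"
fi
```

Run it weekly from cron and the "memory file too large" failure mode stops recurring.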
The 5-Minute Morning Health Check
Library Item #45 is a copy-paste script that checks all your active agents in one run — timestamps, output freshness, API status, memory file sizes. Takes 5 minutes to set up, runs in 15 seconds each morning.
Get the Library — $9/mo. Not ready to subscribe? Read a free sample first.

Related reading: Why Your AI Agent Forgets Everything · Why Personal AI Fails (And Business AI Won't)