Debugging All Levels 15 min triage Tested March 2026

How to Debug a Silent Agent Failure at 3am

You wake up, check your phone, and something's wrong. Your AI assistant was supposed to send a report at midnight. It didn't. Or maybe it ran, but nothing happened. No error. No message. Just silence.

This is the worst kind of failure. A crash with an error message is easy: you fix the error. But when your agent just… stops? That's a puzzle. And at 3am, you don't have patience for puzzles.

Here's a step-by-step triage guide. Work through it in order. Most silent failures fall into one of five buckets, and this guide will get you to the cause in under 15 minutes.

Step 1: Confirm It Actually Ran

Before you assume something broke, confirm the agent actually triggered at all.

Check your scheduler logs. If you're using a cron job, run this:

grep CRON /var/log/syslog | tail -50

Or on a Mac:

log show --predicate 'subsystem == "com.apple.xpc.launchd"' --last 2h | grep -i cron

If you're using a task scheduler or automation platform, check its run history. Most have a dashboard that shows "last ran at" timestamps.

What you're looking for: Did the scheduler fire? If yes, move to Step 2. If no, the problem is in your scheduling, not your agent. Jump to Step 5.
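
If you'd rather get a yes-or-no answer than read through log output, the check above can be wrapped in a small helper. This is a minimal sketch, not a definitive tool: the function name scheduler_fired and the job name send_report.sh are made up for illustration, and the log path depends on your system.

```shell
#!/bin/sh
# Hypothetical helper: report whether a cron log mentions a given job.
# Arguments: $1 = log file to search, $2 = literal string identifying the job.
scheduler_fired() {
    log_file=$1
    job=$2
    # Keep only cron lines, then look for the job name as a fixed string.
    if grep CRON "$log_file" 2>/dev/null | grep -qF -- "$job"; then
        echo "fired"
    else
        echo "not fired"
        return 1
    fi
}

# Example usage (path and job name are placeholders):
#   scheduler_fired /var/log/syslog send_report.sh
```

A "not fired" result points at the scheduler rather than the agent.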

Step 2: Find the Last Log Entry

Your agent should be writing logs somewhere. Find the most recent entry and read it carefully.

If logs go to a file:

tail -100 /path/to/your/agent.log

If logs go to a service like systemd:

journalctl -u your-agent-name --since "3 hours ago"

What you're looking for: The last thing the agent did before going quiet. Did it start a task and not finish? Did it receive data but not act on it? That last log line is your clue. If there are zero logs from the expected run time, the process never started. Go to Step 5.
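
For file-based logs, a quick way to tell "never started" from "started and went quiet" is to look at whether the file exists and when it was last written. The sketch below assumes a plain log file; the function name log_status and the threshold are illustrative choices, not part of any standard tool.

```shell
#!/bin/sh
# Hypothetical helper: classify a log file as missing, stale, or fresh.
# Arguments: $1 = log file, $2 = age threshold in minutes.
log_status() {
    log_file=$1
    max_age_min=$2
    if [ ! -s "$log_file" ]; then
        # Missing or empty file: the process probably never started.
        echo "no-logs"
    elif [ -n "$(find "$log_file" -mmin +"$max_age_min" 2>/dev/null)" ]; then
        # find prints the path only when the last modification is older
        # than the threshold, so non-empty output means the log went quiet.
        echo "stale"
    else
        echo "fresh"
    fi
}

# Example usage (path is a placeholder):
#   log_status /path/to/your/agent.log 180
```

"no-logs" sends you to Step 5; "stale" means the tail of the file holds the last thing the agent did.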

Step 3: Check for Silent Crashes

Sometimes a process starts, hits a problem, and exits without writing anything useful. This is common when:

Check exit codes if you have them. In a shell script:

echo "Exit code: $?"

For processes managed by systemd:

systemctl status your-agent-name

Look for lines like Active: inactive (dead), Active: failed (Result: exit-code), or Main PID: 1234 (code=exited, status=1/FAILURE).
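
If your agent runs from a plain script and nothing records its exit code, one option is to wrap the call so every run leaves a trace. This is a sketch under assumptions: run_and_record, the record file path, and the agent command shown in the usage comment are all hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and always record how it exited.
# Arguments: $1 = file to append the record to, remaining args = the command.
run_and_record() {
    record_file=$1
    shift
    status=0
    # Capture the exit code without aborting, even under "set -e".
    "$@" || status=$?
    # Append a timestamped line whether the command succeeded or failed.
    printf '%s exit=%s cmd=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$status" "$*" >> "$record_file"
    return "$status"
}

# Example usage (paths are placeholders):
#   run_and_record /path/to/your/agent-exit.log /path/to/your/agent --send-report
```

A non-zero exit= value with no matching entry in the agent's own log is the signature of a silent crash.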


Step 4: Test the AI Model Connection

Many silent failures come from the AI model being unreachable. API keys expire. Services go down. Rate limits kick in. And often the agent fails quietly, without a clear message.

Test your connection directly:
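
One way to do this, as a hedged sketch: ask the provider for its model list and look only at the HTTP status code. The function names diagnose_status and http_status are made up for illustration, the environment variable names are assumptions about how your keys are stored, and the diagnosis text is a rough guide rather than an official error taxonomy.

```shell
#!/bin/sh
# Map an HTTP status code to a likely cause (illustrative wording only).
diagnose_status() {
    case $1 in
        2??) echo "ok: endpoint reachable and key accepted" ;;
        401|403) echo "auth: API key missing, invalid, expired, or lacking permission" ;;
        429) echo "rate-limit: too many requests or quota exhausted" ;;
        5??) echo "provider: service error or overload on the provider side" ;;
        000) echo "network: no HTTP response (DNS, firewall, or timeout)" ;;
        *) echo "other: unexpected status $1" ;;
    esac
}

# Fetch only the status code from a URL. Extra arguments are passed through
# to curl, which is how the auth headers are supplied.
http_status() {
    url=$1
    shift
    curl -s -o /dev/null -m 15 -w '%{http_code}' "$@" "$url"
}

# Example usage (needs network access and real keys in the environment):
#   diagnose_status "$(http_status https://api.anthropic.com/v1/models \
#       -H "x-api-key: $ANTHROPIC_API_KEY" -H "anthropic-version: 2023-06-01")"
#   diagnose_status "$(http_status https://api.openai.com/v1/models \
#       -H "Authorization: Bearer $OPENAI_API_KEY")"
```

Run it with the same key and from the same machine the agent uses. A key that works on your laptop tells you nothing about the server.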

Steps 4–7 + Triage Checklist

The remaining steps cover the API connection test, scheduler diagnosis, the manual run-and-watch technique, and the dead man's switch that makes silent failures loud. Plus a copy-paste checklist for next time.

  • Step 4: Test AI model connection (curl commands for Anthropic, OpenAI)
  • Step 5: Diagnose cron/scheduler setup issues
  • Step 6: Run manually with full output capture
  • Step 7: Add a dead man's switch with healthchecks.io
  • Copy-paste triage checklist for your next 3am incident