Silent failures are the most dangerous bug in a 24/7 agent system. The agent runs, exits cleanly, returns exit code 0 — and did absolutely nothing useful. No error to page on. No exception to catch. Just compounding drift while you think everything is fine. These are the patterns that actually catch them.
What it is: Every agent task ends with an explicit assertion block that validates the output before the agent exits. Not error handling — assertions. The difference is critical: error handling catches exceptions. Assertions catch the case where the code ran fine but produced garbage.
A silent failure isn't a crash. It's an agent that processed 0 records and wrote "task complete" to the log. Or a content agent that wrote an empty string to the output file. Or a memory agent that technically ran but found no log entries to process because the file path was one day off. Without assertions, all of these look identical to success.
Add this to every agent task prompt — the exact format, verbatim. The agent will follow it if it's explicit enough.
# MANDATORY — run this before exiting, no exceptions
BEFORE COMPLETING THIS TASK, VERIFY:
□ Did I produce concrete output?
# Not "I reviewed the files." What file did I write? What changed?
□ Is the output non-empty?
# If you wrote a file: is it > 0 bytes? Does it have real content?
# If you made an edit: does the diff show actual changes?
□ Does the output match what was requested?
# Not "I did something related." Does it match the spec?
□ What is the evidence?
# Name the file. Paste the first line. Show the commit hash.
# "I believe I completed the task" is not evidence.
IF ANY CHECK FAILS:
- Do NOT write "task complete"
- Write instead: "ASSERTION FAILED: [which check] [what was missing]"
- Stop. Do not attempt to fix and re-run silently.
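The file-output checks in that list can also run as real code at the end of a task. Here's a minimal sketch, assuming the task's output is a file path; the function name `assert_output` and the `min_bytes` threshold are illustrative, and the "matches the spec" check is left to the task-specific prompt since it can't be automated generically:

```python
import os

def assert_output(path, min_bytes=1):
    """Verify the task produced real, non-empty output before declaring success.

    Raises AssertionError with a specific message instead of letting the
    task exit cleanly with nothing to show. Returns concrete evidence.
    """
    # Check: did I produce concrete output?
    if not os.path.exists(path):
        raise AssertionError(f"ASSERTION FAILED: no output file at {path}")
    # Check: is the output non-empty?
    size = os.path.getsize(path)
    if size < min_bytes:
        raise AssertionError(f"ASSERTION FAILED: {path} is empty ({size} bytes)")
    # Check: what is the evidence? Name the file, show the first line.
    with open(path) as f:
        first_line = f.readline().strip()
    return f"wrote {path} ({size} bytes), first line: {first_line!r}"
```

The return value doubles as the evidence line for the task summary: a specific claim you can paste, not "I believe I completed the task."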
When an agent writes "I believe I successfully completed the task," that's not confidence — that's hedging. A confident agent that actually completed the task writes "I wrote memory/2026-03-05.md with 142 words summarizing today's interactions. Here's the first paragraph: [content]." Specificity is proof.
Train yourself to treat any task summary that lacks concrete evidence as an unverified claim. Then write that expectation directly into your prompts.
Every agent task should end with a structured receipt — not a freeform summary. A receipt has a format you can parse and verify programmatically or at a glance.
TASK RECEIPT
status: COMPLETE # or ASSERTION_FAILED or PARTIAL
task: "Summarize today's interactions to memory file"
output: "memory/2026-03-05.md"
evidence: "File written, 142 words, commit a3f7b2c"
assertion: PASS # all checks passed
issues: none
If the format isn't there, the task isn't done. That's the rule. It sounds rigid until you're debugging why your nightly cycle has been running for six weeks and producing nothing.
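Because the receipt has a fixed shape, "the format isn't there" can be checked mechanically. A minimal parser sketch, assuming the `key: value` block shown above; the function name `parse_receipt` is illustrative:

```python
REQUIRED_FIELDS = {"status", "task", "output", "evidence", "assertion"}

def parse_receipt(text):
    """Parse a TASK RECEIPT block into a dict, stripping trailing # comments.

    Raises ValueError if any required field is missing -- which means
    the task isn't done, per the rule above.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips the "TASK RECEIPT" title line
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.split("#")[0].strip()
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"receipt missing fields: {sorted(missing)}")
    return fields
```

A freeform summary fails this parse; a real receipt passes. That's the whole point of preferring structure over prose.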
The sneaky case: Agents will sometimes write a receipt that says status: COMPLETE when they actually encountered an issue mid-task. The receipt catches lazy completions, but not motivated false positives. The fix for that is in Pattern 4 (Cascading Failure Detection) — in the Library.
For tasks that don't produce a file, use evidence: none or evidence: N/A for manual review — an honest "none" is still a parseable claim.
Quick validation test: Give your agent a task with an intentionally broken input (empty file, wrong path). Does it produce an ASSERTION_FAILED receipt? Or does it write "task complete" anyway? If it's the latter, your assertions aren't working.
What it is: A lightweight ledger system where every scheduled task writes a timestamped entry to a shared receipt file before and after execution. The pre-entry records intent; the post-entry records outcome. The gap between them is where silent failures live.
This is different from logging. Logs capture what happened. A receipt ledger captures what was supposed to happen and what actually did — and makes the delta visible at a glance.
Every task execution has two writes to the ledger: a "claimed" entry at start and a "resolved" entry at finish. A task that starts but never resolves is a silent failure by definition — even if the agent exited cleanly.
# Every task writes two lines: CLAIMED on start, RESOLVED on finish
# A CLAIMED entry with no matching RESOLVED = silent failure
{"ts":"2026-03-05T09:00:01Z","id":"nightly-001","task":"memory-summary","status":"CLAIMED"}
{"ts":"2026-03-05T09:00:47Z","id":"nightly-001","task":"memory-summary","status":"RESOLVED","output":"memory/2026-03-05.md","words":142}
{"ts":"2026-03-05T09:01:00Z","id":"nightly-002","task":"library-update","status":"CLAIMED"}
{"ts":"2026-03-05T09:01:00Z","id":"nightly-002","task":"library-update","status":"RESOLVED","output":"none","words":0}
# ^ words:0 is suspicious — flag for review
# FIRST ACTION — before doing anything else:
Append to task-ledger.jsonl:
{"ts":"[ISO timestamp]","id":"[unique task id]","task":"[task name]","status":"CLAIMED"}
# LAST ACTION — after all assertions pass:
Append to task-ledger.jsonl:
{"ts":"[ISO timestamp]","id":"[same task id]","task":"[task name]","status":"RESOLVED","output":"[output path or description]","words":[word count if applicable]}
# If task fails: status="FAILED", include reason field
Once you have the ledger, finding silent failures becomes a one-liner. Run this after each nightly cycle as a health check:
# Find all CLAIMED entries with no matching RESOLVED
python3 -c "
import json
from collections import defaultdict
ledger = defaultdict(list)
with open('task-ledger.jsonl') as f:
    for line in f:
        entry = json.loads(line)
        ledger[entry['id']].append(entry['status'])
for task_id, statuses in ledger.items():
    if 'CLAIMED' in statuses and 'RESOLVED' not in statuses and 'FAILED' not in statuses:
        print(f'SILENT FAILURE: {task_id}')
"
A RESOLVED entry with output: none or words: 0 is technically not a silent failure — the task completed and reported honestly. But it's a signal worth tracking. If your memory-summary task resolves with zero words three nights in a row, something upstream is broken (probably the log file it's reading from).
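The "three nights in a row" signal can be checked mechanically too. A sketch over the ledger format shown earlier; the function name and the streak threshold of 3 are illustrative:

```python
import json
from collections import defaultdict

def zero_output_streaks(ledger_path, threshold=3):
    """Flag tasks whose last `threshold` RESOLVED entries all report 0 words.

    Not a silent failure by itself -- the task reported honestly -- but a
    strong hint that something upstream is broken.
    """
    words_by_task = defaultdict(list)  # task name -> word counts, ledger order
    with open(ledger_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("status") == "RESOLVED" and "words" in entry:
                words_by_task[entry["task"]].append(entry["words"])
    return [
        task for task, words in words_by_task.items()
        if len(words) >= threshold and all(w == 0 for w in words[-threshold:])
    ]
```

Run it alongside the silent-failure check; the two catch different failure shapes (tasks that never resolved vs. tasks that resolved with nothing).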
Once per day, run a cross-check that verifies every RESOLVED receipt that claims a file output actually has a corresponding file on disk:
# Verify that claimed output files actually exist
python3 -c "
import json, os
with open('task-ledger.jsonl') as f:
    for line in f:
        entry = json.loads(line)
        if entry['status'] == 'RESOLVED' and 'output' in entry:
            path = entry['output']
            if path and path != 'none' and not os.path.exists(path):
                print(f'GHOST OUTPUT: task={entry[\"task\"]} claimed={path}')
"
Start simple: You don't need a fancy monitoring stack. A JSONL file checked by a 20-line Python script catches 90% of silent failures. Build the ledger first, add alerting later. The ledger is the hard part — everything else is grep.
Wire the monitoring check as its own cron job that runs 30 minutes after your nightly cycle completes. If it finds any unresolved tasks, it notifies you via Discord. The important part: make the notification specific — task name, timestamp, what was expected vs. what was found.
schedule: "30 2 * * *"
timezone: "America/Denver"
task: "Check task-ledger.jsonl for unresolved tasks. If any CLAIMED entries
have no matching RESOLVED or FAILED entry from the last 2 hours,
send a Discord alert to #patrick-ops with the task IDs and timestamps.
Include the last 5 lines of the ledger file for context."
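The check that cron job describes can be sketched as a small script. Assuming the ledger format above; the two-hour window matches the prompt, and the Discord post is left as a stub since the delivery mechanism varies:

```python
import json
from datetime import datetime, timedelta, timezone

def unresolved_in_window(ledger_path, hours=2):
    """Return CLAIMED entries from the last `hours` with no matching
    RESOLVED or FAILED entry -- silent failures, by definition."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    claimed, closed = {}, set()
    with open(ledger_path) as f:
        for line in f:
            entry = json.loads(line)
            ts = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%SZ")
            ts = ts.replace(tzinfo=timezone.utc)
            if entry["status"] == "CLAIMED" and ts >= cutoff:
                claimed[entry["id"]] = entry
            elif entry["status"] in ("RESOLVED", "FAILED"):
                closed.add(entry["id"])
    return [e for tid, e in claimed.items() if tid not in closed]

# Each returned entry carries the task name, id, and timestamp needed
# for a specific alert; posting to Discord is left to your notifier.
```

Each unresolved entry already contains exactly what the alert needs — task name, id, timestamp — so the notification can be specific without any extra lookups.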
An agent that runs for 6 hours needs a different monitoring approach than one that runs for 60 seconds. This pattern covers the heartbeat file approach — agent writes a timestamp every N minutes, external watchdog checks it — with exact thresholds, the recovery procedure when a heartbeat goes missing, and how to distinguish...
When Agent A silently fails, Agent B downstream gets empty input and produces a plausible-looking output based on nothing. By Agent D you have confident garbage with no error trail. This pattern covers how to instrument agent chains so failures propagate as explicit signals rather than silent state corruption. Includes the exact...
After 90 days of nightly cycles, your failure log is worth more than your success log. This is the structured format I use for every failure event — what ran, what was expected, what actually happened, root cause category, and resolution. Plus the weekly audit query that surfaces patterns before they compound into...
Dead Man's Switch, Cascading Failure Detection, and the Failure Audit Log are in the Library. $9/month — 30-day money-back guarantee.