Agent Design
Tested March 2026
The Agent Handoff Protocol
Most agent guides tell you how to get agents to do more. Almost none tell you when they should stop. That gap has real consequences, from irreversible actions to $400 wasted on runaway API calls. Here are the four triggers that define when your agent must escalate, plus a template and safe-default system I use in production.
The Problem Nobody Talks About
An agent that never escalates will eventually take an irreversible action you didn't intend, silently fail while you think it's working, or spend a fortune looping on an unsolvable problem. An agent that escalates everything is useless. The skill is calibration.
Trigger 1
Irreversibility
Can a non-technical human undo this in under 5 minutes?
Trigger 2
Confidence Floor
Is confidence below 85%? Say so; don't fabricate certainty.
Trigger 3
Cost Spike
Consuming 3× the expected tokens or API calls? Stop and report.
Trigger 4
Conflict Detection
Instructions contradict each other? Surface the conflict; don't silently pick one.
Trigger 1: Irreversibility Threshold
Rule: If an action cannot be undone in under 5 minutes by a non-technical human, escalate before executing.
Irreversible actions (always escalate):
- Sending emails or DMs to external people
- Publishing anything publicly (posts, articles, announcements)
- Deleting files, records, or data
- Spending money (subscriptions, API credits, purchases)
- Changing DNS, domains, or hosting configurations
- Modifying production databases
Reversible actions (proceed without asking):
- Writing draft files
- Reading and analyzing data
- Creating internal notes or memory files
- Running read-only API calls
- Preparing content for human review
Add this to your SOUL.md:
## Escalation Rules
Before any irreversible action, ask explicitly:
"This will [action]. Confirm? (yes/no)"
Log every escalation with reason in memory/YYYY-MM-DD.md.
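The rule and the log line above can be sketched as a small gate in front of every action. This is a minimal sketch, not a definitive implementation: the action labels in `REVERSIBLE`, the `gate` and `log_escalation` helpers, and the `ask` callback are all hypothetical names. It also assumes that any action kind not on the reversible list should escalate, which is my reading of the rule rather than something the lists state.

```python
from datetime import date
from pathlib import Path
from typing import Callable

# Hypothetical labels for the "proceed without asking" list above.
# Assumption: anything not listed here is treated as irreversible.
REVERSIBLE = {
    "write_draft",
    "read_data",
    "write_note",
    "read_only_api_call",
    "prepare_for_review",
}


def confirmation_prompt(description: str) -> str:
    """Build the explicit yes/no question from the escalation rule."""
    return f"This will {description}. Confirm? (yes/no)"


def log_escalation(reason: str, memory_dir: Path) -> Path:
    """Append one escalation line to <memory_dir>/YYYY-MM-DD.md."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    log_file = memory_dir / f"{date.today().isoformat()}.md"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"- Escalated: {reason}\n")
    return log_file


def gate(
    kind: str,
    description: str,
    ask: Callable[[str], str],
    memory_dir: Path = Path("memory"),
) -> bool:
    """Return True if the action may run; non-reversible kinds need a 'yes'."""
    if kind in REVERSIBLE:
        return True  # reversible: proceed without asking
    log_escalation(f"{kind}: {description}", memory_dir)
    answer = ask(confirmation_prompt(description))
    return answer.strip().lower() == "yes"
```

In an interactive session, `gate("send_email", "send the invoice to the client", ask=input)` would print the confirmation question and block until a human answers. Passing `ask` as a parameter keeps the gate testable and lets you swap in a chat or ticket channel.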
Trigger 2: Confidence Floor
Rule: If confidence in the correct answer is below 85%, say so. Do not fabricate certainty.
The confidence floor applies especially to factual claims about the world, technical recommendations, predictions, and anything you'd need to verify with a source you don't have.
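One way to make the floor mechanical is to attach a confidence number to every draft answer and refuse to return the answer when it falls short. The sketch below assumes the agent can supply a self-estimate between 0 and 1; how that estimate is produced is out of scope here. `Draft`, `respond`, and the field names are hypothetical, and the escalation wording mirrors the Good example in this section.

```python
from dataclasses import dataclass, field

CONFIDENCE_FLOOR = 0.85  # the 85% floor from the rule above


@dataclass
class Draft:
    answer: str
    confidence: float  # assumed self-estimate in [0, 1]
    known: list[str] = field(default_factory=list)       # facts the agent can support
    unverified: list[str] = field(default_factory=list)  # points it cannot confirm


def respond(draft: Draft) -> str:
    """Return the answer, or an explicit escalation when below the floor."""
    if draft.confidence >= CONFIDENCE_FLOOR:
        return draft.answer
    known = "; ".join(draft.known) or "nothing solid"
    need = "; ".join(draft.unverified) or "the claim itself"
    # Below the floor the draft answer is withheld, not softened with hedges.
    return (
        "I'm not confident enough to act on this. "
        f"Here's what I know: {known}. "
        f"Here's what I need to verify: {need}. "
        "Want me to search for confirmation?"
    )
```

Withholding the low-confidence answer entirely is a deliberate choice in this sketch: it removes the temptation to ship the guess with "roughly" in front of it.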
Bad
Fills the gap with plausible-sounding information. Uses hedging language to disguise fabrication: "something like...", "roughly..."
Good
"I'm not confident enough to act on this. Here's what I know: [X]. Here's what I need to verify: [Y]. Want me to search for confirmation?"
Trigger 3: Cost Spike Detection
Rule: If a task is consuming 3× the expected tokens or API calls, stop and report before continuing.
This catches infinite loops (common with tool-use agents), tasks that expanded in scope unexpectedly, and runaway sub-agent spawns.
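# Pseudo-code for cost spike detection.
# get_session_token_count(), pause(), and report_to_human() stand in for
# whatever hooks your agent runtime provides.
session_tokens = get_session_token_count()
expected_tokens = TASK_BASELINE_TOKENS  # per-task baseline you set up front

if session_tokens > expected_tokens * 3:
    pause()  # stop before spending more
    report_to_human(
        f"Cost spike detected. Used {session_tokens} tokens vs "
        f"{expected_tokens} expected. Proceed?"
    )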
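The pseudo-code checks tokens only, while the rule also covers API calls. A runnable variant that tracks both might look like the sketch below. `CostGuard`, `CostSpike`, and `record` are hypothetical names, and raising an exception is just one way to force a pause; your runtime may prefer a callback or a status flag.

```python
from dataclasses import dataclass

SPIKE_MULTIPLIER = 3  # the 3x threshold from the rule above


class CostSpike(Exception):
    """Raised when usage passes the threshold; the caller pauses and asks a human."""


@dataclass
class CostGuard:
    expected_tokens: int
    expected_api_calls: int
    tokens_used: int = 0
    api_calls_made: int = 0

    def record(self, tokens: int = 0, api_calls: int = 0) -> None:
        """Add the usage from one step, then check both budgets."""
        self.tokens_used += tokens
        self.api_calls_made += api_calls
        if self.tokens_used > self.expected_tokens * SPIKE_MULTIPLIER:
            raise CostSpike(
                f"Cost spike detected. Used {self.tokens_used} tokens vs "
                f"{self.expected_tokens} expected. Proceed?"
            )
        if self.api_calls_made > self.expected_api_calls * SPIKE_MULTIPLIER:
            raise CostSpike(
                f"Cost spike detected. Made {self.api_calls_made} API calls vs "
                f"{self.expected_api_calls} expected. Proceed?"
            )
```

Call `record()` after every model or tool step and catch `CostSpike` at the top of the agent loop. Because the check runs per step, a tool-use loop is caught on the first step past the threshold instead of at the end of the task.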
Trigger 4: Conflict Detection
Rule: If instructions conflict with each other, stop and surface the conflict. Do not pick one silently.
Examples that require escalation:
- "Respond immediately to all messages" + "Only post during business hours"
- "Keep costs low" + "Use the best model for everything"
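Conflicts like the first example only show up in specific situations, so one option is to check for them at decision time. In the sketch below, each instruction is a function that returns the action it demands, or `None` when it does not apply; if the applicable instructions disagree, the agent raises instead of choosing. Every name here is hypothetical, and the 9:00 to 17:00 window is an assumed definition of business hours.

```python
from typing import Callable, Optional

# A rule inspects the situation and returns the action it demands,
# or None when it does not apply.
Rule = Callable[[dict], Optional[str]]


class InstructionConflict(Exception):
    """Raised when applicable rules demand different actions."""


def respond_immediately(situation: dict) -> Optional[str]:
    """'Respond immediately to all messages'."""
    return "post_now" if situation.get("pending_message") else None


def business_hours_only(situation: dict) -> Optional[str]:
    """'Only post during business hours' (assumed to be 9:00-17:00 local)."""
    return None if 9 <= situation["hour"] < 17 else "wait"


def decide(situation: dict, rules: dict[str, Rule]) -> Optional[str]:
    """Return the agreed action, or raise if the rules disagree."""
    demands = {name: rule(situation) for name, rule in rules.items()}
    demands = {name: action for name, action in demands.items() if action is not None}
    if len(set(demands.values())) > 1:
        detail = ", ".join(f"{n} -> {a}" for n, a in sorted(demands.items()))
        raise InstructionConflict(f"Instructions conflict: {detail}. Which one wins?")
    return next(iter(demands.values()), None)
```

With both rules loaded, a message arriving at 22:00 raises `InstructionConflict` naming both instructions, while the same message at 10:00 returns `"post_now"`. Soft goals like "Keep costs low" versus "Use the best model for everything" do not reduce to discrete actions this neatly and likely need a human-written priority order.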
Continue Reading: Library Members Only
This item includes the full escalation message template, safe defaults table, escalation log format, and the meta-rule for when not to escalate.
- Copy-paste escalation message template
- Safe defaults for every escalation type
- What NOT to escalate (and why agents that over-escalate are broken)
- Monthly escalation log format to eliminate repeat questions