Agent Design · 8 min read · Tested March 2026

The Agent Handoff Protocol

Most agent guides tell you how to get agents to do more. Almost none tell you when they should stop. That gap has real consequences — from irreversible actions to $400 wasted on runaway API calls. Here are the four triggers that define when your agent must escalate, plus a template and safe-default system I use in production.

The Problem Nobody Talks About

An agent that never escalates will eventually take an irreversible action you didn't intend, silently fail while you think it's working, or spend a fortune looping on an unsolvable problem. An agent that escalates everything is useless. The skill is calibration.

Trigger 1: Irreversibility. Can a non-technical human undo this in under 5 minutes?
Trigger 2: Confidence Floor. Is confidence below 85%? Say so; don't fabricate certainty.
Trigger 3: Cost Spike. Consuming 3× expected tokens or API calls? Stop and report.
Trigger 4: Conflict Detection. Instructions contradict each other? Surface the conflict; don't silently pick one.

Trigger 1: Irreversibility Threshold

Rule: If an action cannot be undone in under 5 minutes by a non-technical human, escalate before executing.

Irreversible actions — always escalate:

Sending emails or DMs to external people
Publishing anything publicly (posts, articles, announcements)
Deleting files, records, or data
Spending money (subscriptions, API credits, purchases)
Changing DNS, domains, or hosting configurations
Modifying production databases

Reversible actions — proceed without asking:

Writing draft files
Reading and analyzing data
Creating internal notes or memory files
Running read-only API calls
Preparing content for human review

Add this to your SOUL.md:

## Escalation Rules
Before any irreversible action, ask explicitly:
"This will [action]. Confirm? (yes/no)"
Log every escalation with reason in memory/YYYY-MM-DD.md.
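
The escalation rule could be enforced in code roughly as follows. This is a minimal sketch, assuming every action the agent proposes carries a category label. The category names and the functions `needs_escalation`, `confirmation_prompt`, and `log_escalation` are hypothetical and not part of any agent framework; only the prompt wording and the memory/YYYY-MM-DD.md log path come from the rules above. Treating unknown categories as irreversible is a design assumption of this sketch.

```python
from datetime import date
from pathlib import Path

# Hypothetical category labels mirroring the "reversible" list above.
# Anything not on this allowlist is escalated, including unknown categories.
REVERSIBLE = {
    "write_draft",
    "read_data",
    "write_internal_note",
    "read_only_api_call",
    "prepare_for_review",
}


def needs_escalation(category: str) -> bool:
    """Return True unless the category is explicitly known to be reversible."""
    return category not in REVERSIBLE


def confirmation_prompt(action: str) -> str:
    """Build the explicit yes/no question to ask before an irreversible action."""
    return f"This will {action}. Confirm? (yes/no)"


def log_escalation(action: str, reason: str, memory_dir: Path = Path("memory")) -> Path:
    """Append one escalation entry to memory/YYYY-MM-DD.md and return the file path."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    log_path = memory_dir / f"{date.today().isoformat()}.md"
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write(f"- Escalated: {action} (reason: {reason})\n")
    return log_path
```

An allowlist of reversible categories, rather than a blocklist of irreversible ones, means a new or mislabeled action type falls on the cautious side by default.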

Trigger 2: Confidence Floor

Rule: If confidence in the correct answer is below 85%, say so. Do not fabricate certainty.

The confidence floor applies especially to factual claims about the world, technical recommendations, predictions, and anything you'd need to verify with a source you don't have.

❌ Bad

Fills the gap with plausible-sounding information. Uses hedging language to disguise fabrication: "something like...", "roughly..."

✓ Good

"I'm not confident enough to act on this. Here's what I know: [X]. Here's what I need to verify: [Y]. Want me to search for confirmation?"

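A minimal sketch of the confidence floor as a gate on the agent's output, assuming a confidence estimate between 0 and 1 is already available. How that estimate is produced is left open here, and the function name `respond` and its parameters are hypothetical. The escalation wording is taken from the "Good" example above.

```python
CONFIDENCE_FLOOR = 0.85  # the 85% floor from the rule above


def respond(answer: str, confidence: float, known: str, to_verify: str) -> str:
    """Return the answer only if confidence meets the floor; otherwise escalate."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < CONFIDENCE_FLOOR:
        # Below the floor: state what is known and what still needs checking.
        return (
            "I'm not confident enough to act on this. "
            f"Here's what I know: {known}. "
            f"Here's what I need to verify: {to_verify}. "
            "Want me to search for confirmation?"
        )
    return answer
```

The gate returns a message instead of raising, so the caller can pass it straight to the human without special error handling.
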
Trigger 3: Cost Spike Detection

Rule: If a task is consuming 3× the expected tokens or API calls, stop and report before continuing.

This catches infinite loops (common with tool-use agents), tasks that expanded in scope unexpectedly, and runaway sub-agent spawns.

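# Sketch of cost spike detection. The function name is a placeholder.
SPIKE_MULTIPLIER = 3  # escalate once usage exceeds 3x the expected amount

def check_cost_spike(session_tokens, expected_tokens):
    """Return an escalation message if usage exceeds the limit, else None."""
    if session_tokens > expected_tokens * SPIKE_MULTIPLIER:
        # The caller should pause the task and send this message to a human.
        return (
            f"Cost spike detected. Used {session_tokens} tokens vs "
            f"{expected_tokens} expected. Proceed?"
        )
    return None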

Trigger 4: Conflict Detection

Rule: If instructions conflict with each other, stop and surface the conflict. Do not pick one silently.

Examples that require escalation:

"Respond immediately to all messages" + "Only post during business hours"
"Keep costs low" + "Use the best model for everything"

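One way to make conflicts detectable is to tag each instruction with what it governs and what it demands, then flag pairs that share a topic but disagree. The sketch below assumes such tags exist. The `Instruction` fields, the tag values, and the report wording are all hypothetical, and producing the tags from free-text instructions is outside its scope.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Instruction:
    text: str         # the instruction as written
    topic: str        # hypothetical tag: what the instruction governs
    requirement: str  # hypothetical tag: what it demands for that topic


def find_conflicts(instructions: list[Instruction]) -> list[tuple[Instruction, Instruction]]:
    """Return every pair that governs the same topic but demands different things."""
    return [
        (a, b)
        for a, b in combinations(instructions, 2)
        if a.topic == b.topic and a.requirement != b.requirement
    ]


def conflict_report(a: Instruction, b: Instruction) -> str:
    """Build a message that surfaces the conflict instead of silently picking a side."""
    return f'These instructions conflict: "{a.text}" vs "{b.text}". Which should take priority?'


# The two example pairs from above, with illustrative tags.
rules = [
    Instruction("Respond immediately to all messages", "response_timing", "immediate"),
    Instruction("Only post during business hours", "response_timing", "business_hours"),
    Instruction("Keep costs low", "model_choice", "cheapest"),
    Instruction("Use the best model for everything", "model_choice", "best"),
]

for first, second in find_conflicts(rules):
    print(conflict_report(first, second))
```
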