I've run agents in production long enough to watch most of the popular patterns fail in ways tutorials never cover. These are the five that held up. Two are fully free below — implementation details, failure modes, real configs. Three are in the Library.
What it is: An agent on a nightly cron schedule that reviews the day's operations, identifies one concrete improvement, applies it, and documents the change. The key word is one.
Most people who try this build agents that attempt to improve everything. Those agents thrash. They generate enormous context, make marginal edits everywhere, and produce logs that are impossible to audit. The discipline of one improvement per cycle is what makes this pattern compound over time.
The loop runs as a cron job. It reads today's operational logs, picks a single improvement target, makes the change to a config file or template, commits it to git, and writes a one-paragraph summary to a nightly log file. That's it.
```yaml
# Fires at 2:00 AM MT every night
schedule: "0 2 * * *"
timezone: "America/Denver"
task: "nightly-improvement-cycle"
model: "claude-opus-4"
thinking: "medium"
```
```markdown
# nightly-improvement-cycle

You are running the nightly self-improvement cycle.

Step 1: Read memory/YYYY-MM-DD.md (today's log).
Step 2: Identify ONE concrete thing to improve.
        Not two. One. The smallest meaningful improvement.
Step 3: Apply it. Edit the file. Commit to git.
Step 4: Write to memory/nightly-YYYY-MM-DD.md:
        - What you changed
        - Why
        - What you expect to improve
        - How you'll know it worked

Constraints:
- Do not refactor entire files
- Do not make changes you cannot test tonight
- If no clear improvement exists, write that and exit cleanly
- Quality bar: would I be proud of this in the morning report?
```
An agent trying to improve 10 things simultaneously will produce 10 mediocre edits with no clear accountability. One improvement means: one hypothesis, one change, one measurable outcome. After 30 nights, you have 30 documented improvements. After 90 nights, the compounding becomes real.
What makes this testable in <30 minutes: Create a test log file, run the agent manually with a specific task ("improve the error handling in X"), and verify it commits exactly one change with a coherent commit message and nightly log entry.
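The "exactly one change" check is easy to automate. A hypothetical helper you might run after the manual test, feeding it commit subjects from `git log --format=%s` before and after the run plus the nightly log text (the thresholds and section names are assumptions, not part of the pattern itself):

```python
def check_single_improvement(commits_before: list[str], commits_after: list[str],
                             nightly_log: str) -> list[str]:
    """Return a list of problems; an empty list means the run passed the test."""
    problems = []
    new = [c for c in commits_after if c not in commits_before]
    if len(new) != 1:
        problems.append(f"expected exactly 1 new commit, found {len(new)}")
    elif len(new[0].strip()) < 15:
        problems.append("commit message too short to be a coherent summary")
    # The nightly entry should cover at least the what and the why.
    for section in ("What", "Why"):
        if section not in nightly_log:
            problems.append(f"nightly log entry missing '{section}' section")
    return problems
```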
Don't run this with a cheap model. The nightly improvement loop requires judgment — recognizing the difference between "this is a real improvement" and "this is an inconsequential change that looks like progress." Claude Haiku will generate commits. They won't be improvements.
What it is: An orchestrator agent that delegates specific tasks to specialist sub-agents, waits for results, synthesizes them, and makes decisions. The failure mode everyone hits is treating this like a function call graph. It's not. It's a management structure.
The difference matters because sub-agents fail. They hallucinate, time out, return partial results, or complete the wrong task. Your orchestrator needs to handle this like a manager handles an employee who didn't deliver — not like a program handling a null pointer.
Think of your orchestrator as a CEO and your specialists as direct reports. The CEO gives direction, checks results, follows up when things go wrong, and never assumes a task was completed correctly just because no error was thrown.
```yaml
role: orchestrator
responsibilities:
  - task decomposition
  - specialist assignment
  - output validation
  - synthesis and decision
delegation_rules:
  - assign one clear task per specialist
  - specify expected output format explicitly
  - validate output before using it downstream
  - if specialist output is incomplete, reassign — do not patch
failure_handling:
  timeout: reassign to same specialist with explicit constraints
  wrong_output: correct the task spec, not the output
  partial_result: determine if partial is usable before continuing
anti_patterns:
  - do NOT edit specialist output directly
  - do NOT continue downstream with unvalidated results
  - do NOT spawn more than 3 concurrent specialists
```
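The `failure_handling` rules above reduce to a small dispatch. A minimal sketch of the manager-style responses, with the `Action` type and `partial_usable` flag as illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str   # "reassign", "fix_spec", or "continue"
    note: str

def handle_failure(failure: str, partial_usable: bool = False) -> Action:
    """Map the config's failure types to manager-style actions."""
    if failure == "timeout":
        # Same specialist, tighter task: add explicit constraints rather than
        # silently spawning a new worker.
        return Action("reassign", "reassign to same specialist with explicit constraints")
    if failure == "wrong_output":
        # Fix the task spec; never hand-edit the specialist's output.
        return Action("fix_spec", "correct the task spec, not the output")
    if failure == "partial_result":
        if partial_usable:
            return Action("continue", "partial result judged usable")
        return Action("reassign", "partial result not usable; reassign")
    raise ValueError(f"unknown failure type: {failure}")
```

Note what is absent: there is no branch that edits the output in place, which is exactly the anti-pattern the config forbids.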
The single biggest failure mode in multi-agent systems is the orchestrator accepting specialist output without checking it. This produces cascading errors that are nearly impossible to debug because the failure happens two steps downstream from the root cause.
```yaml
# After receiving any specialist output, validate before proceeding:
validation_checklist:
  - Does the output match the requested format?
  - Are all required fields present?
  - Does the content make sense given the input?
  - Any signs of hallucination? (confident specific claims with no source)
  - If any check fails: log the failure, reassign the task, do NOT continue
validation_log_format:
  "VALIDATION [pass|fail] — specialist: {name} — task: {task} — issue: {issue if fail}"
```
Every tutorial shows you spinning up 10 parallel agents. In practice, once you exceed 3 concurrent specialists, you lose the ability to synthesize results coherently. The orchestrator's context fills with parallel outputs, synthesis quality degrades, and you end up with a lowest-common-denominator result. Three is the production ceiling.
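The ceiling is easy to enforce at the harness level rather than by convention. A minimal sketch using Python's standard library; the `ThreadPoolExecutor` dispatch is an assumption about how specialists are invoked, not a claim about any particular framework:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SPECIALISTS = 3  # the production ceiling argued for above

def run_specialists(tasks, specialist):
    """Run specialist(task) for each task, never more than 3 in flight.

    Extra tasks queue behind the cap instead of widening it, so the
    orchestrator's context only ever absorbs 3 parallel outputs at a time.
    """
    with ThreadPoolExecutor(max_workers=MAX_SPECIALISTS) as pool:
        return list(pool.map(specialist, tasks))
```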
Quick test: Intentionally give your specialist an ambiguous task. Does the orchestrator catch the malformed output, or does it pass garbage downstream? If it proceeds with garbage, your validation logic is broken.
Specialist orchestration is the right choice when: tasks are genuinely independent (can run in parallel), tasks require different expertise (writing vs. research vs. code), and results need synthesis before acting. It's the wrong choice when tasks are sequential by nature — in that case, use a single agent with a task list, not a multi-agent hierarchy.
How to keep long-running agents from degrading as context grows — specific token thresholds, compression triggers, and the summarization prompt format that doesn't lose critical state. Includes configs for 3 different agent lifetimes: single session, multi-day, and indefinite...
When a tool call fails mid-task, most agents either retry infinitely or abort the entire task. This pattern covers 5 specific failure types (auth failure, timeout, rate limit, malformed response, network error) and the exact handling logic for each. The difference between an agent that...
Why single-file memory architectures fail for agents running longer than 7 days — and the two-file solution (raw log + curated long-term) that scales indefinitely. Includes the exact memory maintenance prompt I use for the nightly review cycle...
Context Budget Management, Graceful Degradation, and the Dual-Write Memory Pattern are in the Library. $9/month for the full library — 30-day money-back guarantee.