When Your AI Agent Runs Away: 204 PRs, $900 Wasted, and the 3-Layer Fix
I woke up to 204 pull requests from a single autonomous agent running overnight. 12 hours, ~$900 in Bedrock tokens, 509 failed builds, zero features shipped. Prompt-only safeguards all failed. Here's the 3-layer fix — hard kill switch, atomic circuit breakers, drift observability — that now prevents runaway agents.
Table of Contents
- The Setup: An Autonomous Build Factory Running Unsupervised
- What Happened: 204 PRs in 12 Hours, Zero Features
- Root Cause: Why Prompt-Only Safeguards Fail at Scale
- Layer 1: The Kill Switch (Hard Circuit Breaker)
- Layer 2: Budget and Output Circuit Breakers
- Layer 3: Observability, Detecting Drift Before Damage
- The Cost of Learning: $900 and a Design Principle
- Decision Framework: When to Trust an Agent Unsupervised
I woke up to 204 pull requests.
Not over a week. Not from a team. From a single AI agent, running unsupervised overnight, on two of my web apps. Every PR merged automatically. Every merge triggered a build. Every build failed. Every failure triggered the agent to “fix” it. For twelve hours straight, my autonomous build factory had been eating its own tail – generating speculative fixes for problems it had created, each fix larger and more wrong than the last.
The final tally: 204 PRs. 509 consecutive failed builds. Approximately $900 in Bedrock tokens burned. Zero features shipped. Zero bugs fixed. The agent had been productive in the way a dog chasing its tail is productive – lots of motion, no progress.
This is the story of how I broke my own system, what it cost me, and the three-layer fix that now prevents it from happening again. If you’re running autonomous agents in production – or thinking about it – this is the failure mode nobody warns you about.
The Setup: An Autonomous Build Factory Running Unsupervised
Boulder is my personal project – a self-replicating AI build factory running on AWS Bedrock AgentCore. Fourteen autonomous agents, built with the Strands Agents SDK, that discover app ideas, validate markets, generate full-stack web applications, deploy them via AWS Amplify, and iterate on them autonomously through a backlog pipeline.
The architecture is a meta-orchestrator. An EventBridge cron triggers the autopilot agent every five minutes. The autopilot picks tasks from a backlog, spawns sub-agents (planner, implementer, reviewer), generates code, creates GitHub PRs, merges them, and monitors Amplify builds. When builds fail, it creates fix tasks and retries. The entire loop runs without human intervention.
Think of it as a CI/CD pipeline where the developer is also a robot. The robot writes code, reviews its own code, merges its own PRs, watches the build, and if the build breaks, writes more code to fix it. When this works, it’s beautiful – apps materialize from nothing while I sleep. When it doesn’t work, it’s a runaway process with root access to your codebase.
On the night of April 25, 2026, it stopped working.
What Happened: 204 PRs in 12 Hours, Zero Features
The trigger was mundane. A build failed on one of my apps – probably a dependency issue or a minor YAML syntax error in amplify.yml. The kind of thing a human fixes in two minutes with a one-line change.
The autopilot saw the failure and did what it was designed to do: generate a fix. But the fix was wrong. Not catastrophically wrong – just subtly wrong enough to fail the build in a new way. So the agent saw a new failure, generated a new fix, created a new PR, merged it, and watched the build fail again.
Here’s where it gets ugly. Each “fix” was larger and more speculative than the last. The agent, asked to fix a YAML syntax error in a 23-line amplify.yml, instead:
- Added 400 lines of ROLLBACK_COMPLETE detection scripts
- Added TypeScript hard compilation gates
- Added lockfile regeneration safety nets
- Added Node version diagnostics
- Added a runtime patch script called ensure-amplify-outputs.mjs
Every one of these “fixes” introduced the next failure. The actual fix – discovered after I killed the agent and looked at the damage – was to revert to the minimal 23-line amplify.yml. The diff that fixed everything: -409 / +4.
The agent had turned a one-line problem into a 400-line catastrophe, then spent twelve hours trying to dig itself out of the hole it had dug. On two apps simultaneously. Creating PRs at a rate of roughly one every three and a half minutes.
The stop-the-bleeding PRs tell the story. The biggest revert on one app was -5813 / +666 across 48 files. The agent hadn’t just broken the build – it had rewritten significant portions of the codebase in its attempts to “fix” things.
Root Cause: Why Prompt-Only Safeguards Fail at Scale
Before the incident, I had safeguards. The system prompt told the agent: “Maximum 5 PRs per session.” The planner had a PR_BUDGET_CAP. There was a MAX_FIX_DEPTH parameter. A retry cap.
All of them were prompt-only. The LLM was expected to self-discipline via its system prompt. It did not.
Here’s why this fails for autonomous agents running in stateless loops:
No state persistence between invocations. Each autopilot invocation was stateless. The EventBridge cron fired every five minutes, spawning a fresh invocation. The agent couldn’t “remember” it had already created 50 PRs because each invocation started from zero. The prompt said “max 5 PRs per session” – and technically, each invocation only created one or two. The cap was never violated within a single session. It was violated across sessions because nobody was counting.
LLMs don’t reliably self-discipline. Even within a single invocation, the agent would log “Task completed successfully” while the Amplify build was actively failing. It believed its own output instead of checking the actual outcome. A system prompt saying “don’t do X” is a suggestion, not a constraint. It’s the difference between a comment saying // don't exceed 5 and an actual if (count >= 5) return in the code.
The fix-loop is a stable attractor. Build fails, agent fixes, fix breaks something else, build fails again. This loop is self-reinforcing. There’s no natural exit condition. A human would step back after the third failed attempt and say “I’m making this worse.” An LLM in a stateless loop has no mechanism to reach that conclusion.
If your agent has a system prompt that says “don’t create more than N pull requests” or “stop after M retries” – and that constraint lives only in the prompt – you have no safeguard. You have a polite suggestion that works until it doesn’t. The night it doesn’t is the night you wake up to 204 PRs.
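The cross-session gap is easy to reproduce in a few lines. This is an illustrative sketch only: the function names and the dict standing in for a durable store are hypothetical, not Boulder's code. It shows why a per-session cap never trips when every invocation starts from zero, while the same check against a counter that outlives the invocation does.

```python
# Illustrative sketch: names and numbers are hypothetical, not Boulder's code.
SESSION_CAP = 5  # the "max 5 PRs per session" rule


def run_invocation_stateless(prs_wanted: int) -> int:
    """Each invocation starts with a fresh counter, as a stateless cron run does."""
    created = 0
    for _ in range(prs_wanted):
        if created >= SESSION_CAP:  # never true when prs_wanted is 1 or 2
            break
        created += 1
    return created


def run_invocation_persisted(prs_wanted: int, store: dict, cap: int = SESSION_CAP) -> int:
    """Same loop, but the counter lives in `store`, which outlives the invocation."""
    created = 0
    for _ in range(prs_wanted):
        if store.get("prs", 0) >= cap:
            break
        store["prs"] = store.get("prs", 0) + 1
        created += 1
    return created


# 144 invocations is twelve hours of a five-minute cron.
stateless_total = sum(run_invocation_stateless(2) for _ in range(144))

store: dict = {}
persisted_total = sum(run_invocation_persisted(2, store) for _ in range(144))
```

With two PRs requested per run, the stateless version never sees its own cap, while the persisted version stops at five no matter how many times the cron fires.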
Layer 1: The Kill Switch (Hard Circuit Breaker)
The first thing I shipped – within hours of discovering the damage – was a dead-man switch. Not a prompt instruction. A hard, code-enforced gate that the agent cannot bypass because it runs before the agent’s reasoning even starts.
The mechanism is simple: an SSM Parameter Store value at /boulder/autopilot/enabled. The agent’s entry point reads this parameter before doing anything else. If the value isn’t 'true', the invocation exits immediately. No reasoning, no tool calls, no PRs.
# agent/shared/kill_switch.py
SSM_GLOBAL = "/boulder/autopilot/enabled"
SSM_APP = "/boulder/autopilot/{app}/enabled"
_CACHE_TTL_S = 60  # Don't hammer SSM on every 5-min invocation

def is_autopilot_enabled(app: str) -> bool:
    # _cached_get_param is a helper not shown in this excerpt.
    global_val = _cached_get_param(SSM_GLOBAL)
    app_val = _cached_get_param(SSM_APP.format(app=app))
    # Both the global and the per-app switch must read 'true'.
    return global_val.lower() == "true" and app_val.lower() == "true"
Two levels: global kill switch and per-app kill switch. Both must be 'true' for the agent to proceed. The 60-second cache prevents SSM throttling on the five-minute cron cycle. And critically – it fails open on SSM errors. If SSM itself is down, the agent runs. This is deliberate: I’d rather have a runaway risk than brick the entire system because of an AWS service hiccup.
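A minimal sketch of how the cached, fail-open lookup could work, under stated assumptions: the `KillSwitch` class, the injected `fetch` callable, and the injected `clock` are all hypothetical and exist here so the logic can run without AWS. In a real deployment, `fetch` might wrap boto3's `ssm.get_parameter(Name=name)["Parameter"]["Value"]`.

```python
import time
from typing import Callable, Dict, Tuple

SSM_GLOBAL = "/boulder/autopilot/enabled"
SSM_APP = "/boulder/autopilot/{app}/enabled"
CACHE_TTL_S = 60


class KillSwitch:
    """Hypothetical wrapper: TTL cache plus fail-open behaviour on lookup errors."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_s: float = CACHE_TTL_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # assumed to return the parameter value as a string
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def _get(self, name: str) -> str:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]        # still fresh, skip the remote call
        try:
            value = self._fetch(name)
        except Exception:
            # Fail open: a lookup error is treated as "enabled".
            return "true"
        self._cache[name] = (now, value)
        return value

    def is_enabled(self, app: str) -> bool:
        global_val = self._get(SSM_GLOBAL)
        app_val = self._get(SSM_APP.format(app=app))
        return global_val.lower() == "true" and app_val.lower() == "true"
```

One consequence of the cache is worth keeping in mind: with a 60-second TTL, a long-lived process can keep running for up to a minute after the parameter is flipped.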
To stop a runaway, I flip one SSM parameter. Takes three seconds from the AWS console or CLI. No deploy, no code change, no waiting for a build.
What this prevents: An agent that’s already in a bad state continuing to operate. The moment I notice something wrong – or the moment an alarm fires – I can kill it instantly without touching the codebase.
Layer 2: Budget and Output Circuit Breakers
The kill switch is reactive – I have to notice the problem and flip it. Layer 2 is proactive. It stops the agent before it causes damage, even if nobody’s watching.
Five circuit breakers, all enforced in code against persisted state (mostly DynamoDB conditional writes and atomic counters), not prompt instructions:
| Breaker | Mechanism | Default |
|---|---|---|
| Mutex per app | DynamoDB conditional put; stale lock auto-expires after 15 min | 1 concurrent per app |
| Daily PR cap | Atomic counter dailyPrCount_YYYY-MM-DD | 5 PRs/app/day |
| Daily cost cap | Atomic counter dailyCost_YYYY-MM-DD | $10/app/day |
| Per-task retry cap | <!--attempts:N--> marker in backlog | 3 attempts then PERMANENTLY_FAILED |
| Semantic fix-loop detection | SHA-256 hash of sorted file list; same files touched N times triggers stop | 3 iterations max |
The implementation uses the Strands Agents SDK’s plugin system – a custom CircuitBreakersPlugin that hooks into BeforeToolCallEvent and AfterToolCallEvent lifecycle callbacks. Before the agent calls create_pull_request, the plugin checks the daily PR counter. If it’s at the cap, the tool call is blocked. After a successful PR creation, the counter increments atomically.
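The check-before, increment-after flow can be sketched without the SDK. Everything below is a hypothetical stand-in: `DailyPrBreaker` is not the real CircuitBreakersPlugin, the two methods only mirror the roles of the before and after tool-call hooks, and an in-memory dict guarded by a lock replaces the DynamoDB counter (which would more likely be incremented with an `update_item` call using an `ADD` update expression).

```python
import threading
from datetime import date
from typing import Dict


class DailyPrBreaker:
    """In-memory stand-in for a DynamoDB-backed daily PR counter (hypothetical)."""

    def __init__(self, cap: int = 5) -> None:
        self._cap = cap
        self._counts: Dict[str, int] = {}
        self._lock = threading.Lock()  # simulates the atomicity the database provides

    @staticmethod
    def _key(app: str, day: date) -> str:
        return f"{app}#dailyPrCount_{day.isoformat()}"

    def before_tool_call(self, tool_name: str, app: str, day: date) -> bool:
        """Return True if the call may proceed, False if it should be blocked."""
        if tool_name != "create_pull_request":
            return True
        with self._lock:
            return self._counts.get(self._key(app, day), 0) < self._cap

    def after_tool_call(self, tool_name: str, app: str, day: date, succeeded: bool) -> None:
        """Count a PR only after it was actually created."""
        if tool_name == "create_pull_request" and succeeded:
            with self._lock:
                key = self._key(app, day)
                self._counts[key] = self._counts.get(key, 0) + 1


# Usage: eight attempts in one day, only five get through.
breaker = DailyPrBreaker(cap=5)
today = date(2026, 4, 27)
allowed = 0
for _ in range(8):
    if breaker.before_tool_call("create_pull_request", "shop", today):
        breaker.after_tool_call("create_pull_request", "shop", today, succeeded=True)
        allowed += 1
```

Because the check and the increment are separate steps, this shape is only safe when invocations for the same app cannot overlap, which is what the per-app mutex is for. Without that mutex, a single conditional increment would be the safer design.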
The semantic fix-loop detector is the clever one. It hashes the list of files the agent is about to modify. If the same set of files has been modified three times in the same day – meaning the agent keeps touching the same files over and over – it stops. This is exactly the pattern that caused the runaway: the agent modifying amplify.yml and its surrounding files in an endless loop.
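A minimal sketch of the hashing idea, with assumptions labeled: `FixLoopDetector` and `file_set_fingerprint` are hypothetical names, state is kept in memory rather than per day in DynamoDB, and the threshold is read as "three modifications allowed, the fourth blocked".

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Iterable


def file_set_fingerprint(paths: Iterable[str]) -> str:
    """SHA-256 over the sorted file list, so the order of paths does not matter."""
    joined = "\n".join(sorted(paths))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class FixLoopDetector:
    """Hypothetical detector: counts how often the same file set is touched."""

    def __init__(self, max_iterations: int = 3) -> None:
        self._max = max_iterations
        self._seen: DefaultDict[str, int] = defaultdict(int)

    def allow(self, paths: Iterable[str]) -> bool:
        """Record one more touch of this file set; False once the limit is used up."""
        fingerprint = file_set_fingerprint(paths)
        if self._seen[fingerprint] >= self._max:
            return False
        self._seen[fingerprint] += 1
        return True


detector = FixLoopDetector(max_iterations=3)
results = [detector.allow(["amplify.yml", "package.json"]) for _ in range(4)]
```

Hashing the exact file set is deliberately coarse. An agent that adds one new file per attempt produces a new fingerprint each time, so this check works best alongside the PR and cost caps rather than instead of them.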
Each breaker has an SSM override so I can raise limits for legitimate heavy-work days without redeploying. And the per-task retry cap uses a markdown comment in the backlog file itself – <!--attempts:3--> – so the state is visible and auditable in plain text.
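The attempts marker is simple enough to sketch. The helper names below are hypothetical and the backlog line format is an assumption; only the `<!--attempts:N-->` marker and the limit of three come from the description above. The sketch also assumes a task counts as permanently failed once its third failed attempt is recorded.

```python
import re
from typing import Tuple

ATTEMPTS_RE = re.compile(r"<!--attempts:(\d+)-->")
MAX_ATTEMPTS = 3


def read_attempts(task_line: str) -> int:
    """Return the attempt count stored in the line, or 0 if there is no marker."""
    match = ATTEMPTS_RE.search(task_line)
    return int(match.group(1)) if match else 0


def record_failed_attempt(task_line: str) -> Tuple[str, bool]:
    """Return the updated line and whether the task is now permanently failed."""
    attempts = read_attempts(task_line) + 1
    marker = f"<!--attempts:{attempts}-->"
    if ATTEMPTS_RE.search(task_line):
        updated = ATTEMPTS_RE.sub(marker, task_line, count=1)
    else:
        updated = f"{task_line.rstrip()} {marker}"
    return updated, attempts >= MAX_ATTEMPTS


line = "- [ ] Fix amplify build"
line, failed_1 = record_failed_attempt(line)
line, failed_2 = record_failed_attempt(line)
line, failed_3 = record_failed_attempt(line)
```

Keeping the counter in the backlog text means a plain diff of the file shows exactly how many times the agent has tried a task.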
What this prevents: The overnight runaway scenario entirely. Even if I’m asleep, the agent hits the 5-PR cap and stops. Even if the cap is somehow bypassed, the $10 cost cap catches it. Even if both fail, the fix-loop detector catches the repetitive file modification pattern. Defense in depth – one layer fails, another catches it.
Layer 3: Observability, Detecting Drift Before Damage
Kill switches and circuit breakers are binary – the agent is either running or it’s not. Layer 3 gives me the gradient: is the agent drifting toward trouble before it hits a hard limit?
Every autopilot invocation writes a structured record to a boulder-agent-executions DynamoDB table:
{
  "agentName": "autopilot",
  "startedAt": "2026-04-27T03:15:00Z",
  "outcome": "COMPLETED",
  "tokensIn": 45000,
  "tokensOut": 12000,
  "cost": "$1.35",
  "prsCreated": 1
}
The UI surfaces this as a badge on each app’s build page: “3/5 PRs today, $4.20/$10 spent.” When the agent is capped, the badge turns red. There’s a manual “Reset caps today” button for when I intentionally want to let it run hot.
But the real value is pattern detection. If I see the agent creating 5 PRs every day for a week on the same app – hitting its cap repeatedly – that’s a signal. It means the agent is stuck on something it can’t solve. Without observability, that pattern is invisible. The agent just silently hits its cap and stops, and I never know it tried and failed five times.
The observability layer also records the outcome field: COMPLETED, SKIPPED (killed or capped), or FAILED (unhandled error). A spike in SKIPPED outcomes means the breakers are firing. A spike in FAILED means something unexpected is broken. Both are signals I want to see before they compound.
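Both signals can be computed from the execution records with very little code. This is a hedged sketch: `outcome_counts` and `capped_streak` are hypothetical helpers, and the inputs are plain Python lists rather than a DynamoDB query result.

```python
from collections import Counter
from typing import Dict, Iterable, List


def outcome_counts(records: Iterable[Dict]) -> Counter:
    """Tally COMPLETED / SKIPPED / FAILED across a batch of execution records."""
    return Counter(record["outcome"] for record in records)


def capped_streak(daily_prs: List[int], cap: int = 5) -> int:
    """Length of the most recent run of days on which the PR cap was reached.

    `daily_prs` is ordered oldest to newest, one entry per day.
    """
    streak = 0
    for count in reversed(daily_prs):
        if count < cap:
            break
        streak += 1
    return streak


records = [
    {"agentName": "autopilot", "outcome": "COMPLETED"},
    {"agentName": "autopilot", "outcome": "SKIPPED"},
    {"agentName": "autopilot", "outcome": "SKIPPED"},
    {"agentName": "autopilot", "outcome": "FAILED"},
]
counts = outcome_counts(records)
streak = capped_streak([1, 5, 5, 5])
```

A rising SKIPPED count or a multi-day capped streak could then feed an alarm, so the pattern surfaces without anyone having to open the dashboard.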
What this prevents: Slow-burn degradation. The agent that’s technically within its limits but producing garbage. The agent that’s been capped for two weeks and nobody noticed. The drift from “working autonomously” to “spinning uselessly” that only shows up when you look at the numbers.
The Cost of Learning: $900 and a Design Principle
The direct cost was approximately $700–900 in Bedrock tokens. Claude Opus isn’t cheap – at $15 per million input tokens and $75 per million output tokens, twelve hours of continuous agent invocations generating code adds up fast. No compute costs beyond inference – AgentCore handles the runtime – but the inference bill was real.
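The arithmetic behind that bill is straightforward. The sketch below uses the per-million-token prices quoted above; the token split in the example is purely illustrative, not a measurement from the incident, and it ignores any caching discounts.

```python
INPUT_USD_PER_MTOK = 15.0   # price per million input tokens, as quoted above
OUTPUT_USD_PER_MTOK = 75.0  # price per million output tokens, as quoted above


def inference_cost_usd(tokens_in: int, tokens_out: int) -> float:
    """Estimate inference cost in USD from raw token counts."""
    return (tokens_in * INPUT_USD_PER_MTOK + tokens_out * OUTPUT_USD_PER_MTOK) / 1_000_000


# Illustrative mix only: 40M input tokens and 4M output tokens.
example_cost = inference_cost_usd(40_000_000, 4_000_000)
```

Under these prices, input tokens dominate quickly when an agent re-reads a growing codebase on every invocation, which is consistent with a fix loop getting more expensive as the diffs get larger.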
The indirect cost was higher. Two apps with corrupted codebases that needed multi-thousand-line reverts. A day of my time diagnosing and fixing. Trust erosion – after this, I second-guessed the autopilot for weeks.
But the design principle that emerged is worth more than the $900:
LLMs don’t reliably self-discipline. State and atomic operations do.
This is the Unix philosophy applied to agent safety. You don’t trust a process to limit its own resource consumption – you use ulimit. You don’t trust a service to rate-limit itself – you put a rate limiter in front of it. You don’t trust a user to not fill the disk – you set quotas. The constraint lives outside the thing being constrained, enforced by a mechanism the constrained thing cannot override.
Every prompt-only safeguard is the equivalent of a comment in your code that says // please don't exceed this limit. It works until it doesn’t. And when it doesn’t, you have no recourse because there was never an actual enforcement mechanism.
The iron rule I now follow: every new autonomous agent must have its own code-enforced breakers before it’s activated in production. No exceptions. The breakers ship before the feature.
Decision Framework: When to Trust an Agent Unsupervised
After living with this system for two weeks post-fix, here’s the three-question test I apply before letting any agent run without a human in the loop:
Question 1: Is the blast radius bounded?
Can the agent’s worst-case output be contained? If it can create unlimited PRs, modify unlimited files, or spend unlimited money – the blast radius is unbounded. You need hard caps before going unsupervised.
| Blast radius | Supervision needed | Example |
|---|---|---|
| Single file, single repo | Low – let it run | Linter auto-fix |
| Multiple files, single repo | Medium – cap output count | Code generation agent |
| Multiple repos, deploys to prod | High – hard caps + kill switch + observability | Build factory (Boulder) |
| External APIs, customer-facing | Critical – human approval gate | Email sender, payment processor |
Question 2: Is there a stable failure loop?
Can the agent’s output become its own input in a way that creates a self-reinforcing cycle? Build fails, agent fixes, fix fails, agent fixes again – that’s a stable loop with no natural exit. If yes, you need loop detection (semantic hashing, retry caps, or both).
Question 3: Can you kill it in under 60 seconds?
If you notice something wrong at 3 AM, can you stop the agent before it does more damage? If stopping requires a code deploy, a PR, or even an SSH session – your kill switch is too slow. SSM parameter flip, feature flag, or equivalent: something you can toggle from your phone.
If you answer “no” to Question 1 or Question 3, or “yes” to Question 2 without loop detection in place, your agent isn’t ready to run unsupervised. Fix the gap first. The $900 lesson is cheaper to learn from my blog post than from your own AWS bill.
The honest recommendation: if you’re building autonomous agents today, assume they will run away. Any unsupervised loop without hard limits will eventually find its failure mode. Design your safety layers for the failure case, not the happy path. Ship the circuit breakers before you ship the feature. And never, ever trust a system prompt as your only line of defense against a process that runs while you sleep.
