Why monthly caps don't protect you from one bad LLM run

May 2026 · 5 min read

A developer on r/SaaS posted this last week: "My AI agent ran overnight. Woke up to this." The screenshot showed a $498 API bill. He had a monthly cap set at $50.

The cap didn't fire. The bill did.

This is not a bug. It's how monthly caps work. And if you're building AI agents in production, it will happen to you too — unless you change the pattern.


The timeline of a bad run

Here's what happened in that $498 incident, reconstructed from what he described:

11:30pm — agent starts a research task. Fetches a URL. Gets a timeout. Retries. Gets another timeout. The retry logic calls the LLM to decide what to do next. The LLM decides to retry again. This repeats.

The monthly cap was $50. By midnight he'd burned through it. But the cap check runs on a billing cycle — not on each request. The agent kept running. By 7am it had made 4,800 API calls.

"The moment you're using Stripe as your safety net, you've already lost the run."

Monthly caps are accounting tools. They tell you what happened. They don't stop anything from happening.


Why the cap didn't fire

Most billing systems — OpenAI's included — check spend limits asynchronously. The request goes through first. The ledger updates after. By the time the cap logic runs, hundreds more requests have already been processed.

This is a fundamental property of post-hoc billing, not a bug you can patch. The cap will always lag behind the actual spend, especially during a loop that fires hundreds of requests per minute.

A $50/month cap and a $500 bill can coexist. They operate at different time scales.


The pattern that actually works: preflight

The fix is to check budget before the run starts — not after it finishes. This is called a preflight check.

Before your agent makes a single API call, you ask: does this customer have budget for this run? If not, you block it. The agent never starts. No tokens consumed. No bill generated.

from agentbill import AgentBillClient

client = AgentBillClient(api_key="agb_your_key")

# Before the agent runs — check budget
check = client.preflight(agent_id="researcher", budget=2.00)

if not check.approved:
    raise Exception(f"Run blocked: {check.reason}")

# Agent only runs if budget is confirmed
result = run_my_agent()

# Record actual cost after completion
client.record(agent_id="researcher", cost=check.estimated_cost)
  

Two calls. The agent either runs with a confirmed budget or it doesn't run at all. No overnight surprises.


Monthly caps vs. per-request ceilings

These solve different problems. A monthly cap is useful for overall budget visibility — you want to know your AI costs didn't triple this month. Fine.

A per-request ceiling is what protects you from a single bad run. It operates at the invocation level, before compute is consumed, with no lag between the check and the block.

You need both. The monthly cap catches drift. The preflight ceiling catches catastrophe.


The $498 run, replayed with preflight

Same agent. Same retry bug. Same overnight run.

First invocation: preflight checks budget. $2.00 ceiling. Approved — budget exists. Agent runs. Finishes. Cost recorded.

Second invocation (the retry loop): preflight checks again. Previous run already consumed the budget for this session. Blocked. Agent never starts.

Total bill: $2.00. Not $498.

The retry bug still exists. But it can't compound into a runaway loop when each invocation requires a budget check to proceed.


Implementing preflight in your stack

The pattern works regardless of what's inside your agent — LangChain, OpenAI Agents SDK, AutoGen, custom chains. You're wrapping the invocation, not the internals.

Python:

pip install agentbill-sdk
from agentbill import AgentBillClient

client = AgentBillClient(api_key="agb_your_key")

def run_agent_safely(customer_id: str, task: str):
    check = client.preflight(
        agent_id="my_agent",
        budget=2.00,
        customer_id=customer_id
    )
    if not check.approved:
        return {"blocked": True, "reason": check.reason}

    result = run_my_agent(task)
    client.record(agent_id="my_agent", cost=check.estimated_cost, customer_id=customer_id)
    return result
  

Node.js:

npm install agentbill
import { AgentBillClient } from 'agentbill'

const client = new AgentBillClient({ apiKey: 'agb_your_key' })

async function runAgentSafely(customerId: string, task: string) {
  const check = await client.preflight({ agentId: 'my_agent', budget: 2.00, customerId })
  if (!check.approved) return { blocked: true, reason: check.reason }

  const result = await runMyAgent(task)
  await client.record({ agentId: 'my_agent', cost: check.estimatedCost, customerId })
  return result
}
  

Summary

Monthly caps are accounting. Preflight checks are protection. One tells you what happened; the other prevents it from happening.

If you're running AI agents in production — especially agents that loop, retry, or run unattended — you need a check that fires before the first token, not after the last one.

Add preflight to your agents

Free tier: 1,000 preflight calls/month. No credit card required.

Get your API key

Related guides

How to limit cost per agent run How to add billing to a LangChain agent How to add a spend ceiling to an OpenAI agent