How preflight avoids double-billing under concurrent load

May 2026 · 6 min read

A developer on Reddit asked a sharp question about AgentBill's checkpoint pattern: "Most checkpoint patterns I've seen either re-meter or skip metering and lose accuracy. How does the read-only check stay consistent with the final settlement?"

It's the right question. The naive implementation of a preflight check has a race condition that causes exactly this problem. Here's how AgentBill solves it.


The problem: read-check-approve is broken under concurrency

The obvious implementation of a preflight check looks like this:

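# Naive implementation: DO NOT use in production
def preflight(customer_id, estimated_units):
    # Time of check: a plain read. Nothing is locked or reserved.
    customer = db.query("SELECT used_units, limit_units FROM customers WHERE id = ?", customer_id)
    remaining = customer.limit_units - customer.used_units

    if estimated_units > remaining:
        return {"approved": False}

    # Race window: usage is only written after the run finishes,
    # so a concurrent caller can still read the same balance.
    return {"approved": True}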
  

This reads the current balance, checks if the run fits, and returns a decision. Under a single serial workload it works fine.

Under concurrent load it breaks. Consider two agent runs starting at the same millisecond for the same customer who has 10 units remaining, each estimating 8 units:

Thread A: reads remaining = 10. 8 <= 10. Approved.
Thread B: reads remaining = 10. 8 <= 10. Approved.

Thread A runs. Uses 8 units. Used = 8.
Thread B runs. Uses 8 units. Used = 16. Limit exceeded.
  

Both reads happen before either write. Both see the same balance. Both get approved. The customer burns 16 units against a 10-unit budget. The check was useless.

This is a classic TOCTOU (time-of-check to time-of-use) race. The check and the use happen at different times, and the state can change between them.
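The interleaving above can be reproduced deterministically without threads by running both checks before either write. This is a minimal sketch: the `customer` dict and the `naive_preflight` and `record` helpers are hypothetical stand-ins for the database row and the calls described in this post, not AgentBill code.

```python
# Hypothetical in-memory stand-in for the customer's row.
customer = {"limit_units": 10, "used_units": 0}

def naive_preflight(estimated_units):
    # Time of check: a read-only comparison, nothing is reserved.
    remaining = customer["limit_units"] - customer["used_units"]
    return estimated_units <= remaining

def record(actual_units):
    # Time of use: usage is written only after the run finishes.
    customer["used_units"] += actual_units

# Both checks happen before either write, as in the interleaving above.
approved_a = naive_preflight(8)
approved_b = naive_preflight(8)

if approved_a:
    record(8)
if approved_b:
    record(8)

print(approved_a, approved_b, customer["used_units"])  # True True 16
```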


The fix: atomic reservation

AgentBill doesn't just read the balance — it reserves units atomically inside a transaction. The preflight UPDATE only succeeds when there's enough budget remaining:

-- This is what happens inside AgentBill's preflight
UPDATE customers
SET reserved_units = reserved_units + :estimated_units
WHERE account_id = :account_id
  AND customer_ref = :customer_ref
  AND (
    limit_units IS NULL
    OR used_units + reserved_units + :estimated_units <= limit_units
  )
RETURNING limit_units, used_units, reserved_units
  

If budget is available, the UPDATE succeeds and returns the updated row. The reservation is now reflected in reserved_units — visible to every subsequent transaction.

If the remaining budget can't cover the estimate, the WHERE clause matches 0 rows. The UPDATE returns nothing, the run is blocked, and nothing is reserved.

Replaying the concurrent scenario:

Thread A: UPDATE adds 8 to reserved_units. reserved = 8. Succeeds.
Thread B: UPDATE tries to add 8. used + reserved + 8 = 16 > 10. WHERE fails. Blocked.

Thread A runs. Completes. record() converts reserved → used.
  

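The database handles the serialization. Two UPDATEs against the same row cannot interleave: the second waits for the first to commit and then evaluates its WHERE clause against the updated row (this is the behavior under PostgreSQL's default READ COMMITTED isolation, for example). No application-level locking is required.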
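To see the conditional UPDATE at work, here is a small sketch using Python's built-in sqlite3 module as a stand-in database. The table layout mirrors the column names in the SQL above, but the `preflight` signature, the sample IDs, and the use of `rowcount` instead of RETURNING are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (account_id TEXT, customer_ref TEXT, "
    "limit_units INTEGER, used_units INTEGER, reserved_units INTEGER)"
)
conn.execute("INSERT INTO customers VALUES ('acct_1', 'cust_1', 10, 0, 0)")
conn.commit()

def preflight(conn, account_id, customer_ref, estimated_units):
    """Reserve units with one conditional UPDATE; True means approved."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """UPDATE customers
               SET reserved_units = reserved_units + :est
               WHERE account_id = :acct
                 AND customer_ref = :ref
                 AND (limit_units IS NULL
                      OR used_units + reserved_units + :est <= limit_units)""",
            {"est": estimated_units, "acct": account_id, "ref": customer_ref},
        )
    # rowcount is 1 when the WHERE clause matched, 0 when budget was short.
    return cur.rowcount == 1

first = preflight(conn, "acct_1", "cust_1", 8)
second = preflight(conn, "acct_1", "cust_1", 8)
reserved = conn.execute("SELECT reserved_units FROM customers").fetchone()[0]
print(first, second, reserved)  # True False 8
```

The check and the write are the same statement, so there is no gap between them for another caller to slip into.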


Settlement: converting reserved to used

After the agent run completes, record() settles the reservation:

UPDATE customers
SET used_units     = used_units + :actual_units,
    reserved_units = reserved_units - :estimated_units
WHERE account_id = :account_id
  AND customer_ref = :customer_ref
  

The reserved units come out. The actual units go in. The net balance reflects reality.

If actual_units differs from estimated_units, the settlement absorbs the difference. Say you estimated 10 but the run used 7: the full 10 come out of reserved_units, 7 go into used_units, and the remaining 3 return to available budget. No manual adjustment needed.
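The arithmetic is easy to check with a short sketch, again using sqlite3 as a stand-in. The `record` signature here takes the estimate explicitly and the `available` helper is hypothetical; both are assumptions for illustration, and the 20-unit limit is made up so the released units are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (account_id TEXT, customer_ref TEXT, "
    "limit_units INTEGER, used_units INTEGER, reserved_units INTEGER)"
)
# State right after a preflight that reserved 10 of 20 units.
conn.execute("INSERT INTO customers VALUES ('acct_1', 'cust_1', 20, 0, 10)")
conn.commit()

def record(conn, account_id, customer_ref, estimated_units, actual_units):
    """Swap the reservation for the real charge in a single UPDATE."""
    with conn:
        conn.execute(
            """UPDATE customers
               SET used_units     = used_units + :actual,
                   reserved_units = reserved_units - :est
               WHERE account_id = :acct AND customer_ref = :ref""",
            {"actual": actual_units, "est": estimated_units,
             "acct": account_id, "ref": customer_ref},
        )

def available(conn):
    """Budget still open to new reservations: limit - used - reserved."""
    limit_units, used, reserved = conn.execute(
        "SELECT limit_units, used_units, reserved_units FROM customers"
    ).fetchone()
    return limit_units - used - reserved

before = available(conn)  # 20 - 0 - 10 = 10
record(conn, "acct_1", "cust_1", estimated_units=10, actual_units=7)
after = available(conn)   # 20 - 7 - 0 = 13
print(before, after)  # 10 13
```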


What happens when a run fails

If the agent crashes or the caller never calls record(), the reserved units stay reserved indefinitely. That would permanently lock budget — a leak.

AgentBill handles this with a reservation expiry. Each reservation carries a timestamp. On the next preflight call for that customer, expired reservations are cleared before the budget check runs:

-- Clear stale reservations before checking budget
UPDATE customers
SET reserved_units = 0
WHERE account_id = :account_id
  AND customer_ref = :customer_ref
  AND reservation_expires_at < NOW()
  

This means a crashed run releases its reserved budget on the customer's next preflight call after the reservation expires. The customer isn't permanently locked out because a single run failed to settle.
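A rough sketch of the cleanup step, with sqlite3 as a stand-in. Two assumptions to flag: the expiry is stored as a numeric timestamp in a `reservation_expires_at` column on the customer row, and the current time is passed in as a parameter instead of calling NOW() so the behavior is easy to test. The `clear_stale` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (account_id TEXT, customer_ref TEXT, "
    "limit_units INTEGER, used_units INTEGER, reserved_units INTEGER, "
    "reservation_expires_at REAL)"
)
# A crashed run left 8 units reserved; the reservation expires at t=1000.
conn.execute("INSERT INTO customers VALUES ('acct_1', 'cust_1', 10, 0, 8, 1000.0)")
conn.commit()

def clear_stale(conn, account_id, customer_ref, now):
    """Zero out reserved_units if the stored expiry is in the past."""
    with conn:
        cur = conn.execute(
            """UPDATE customers
               SET reserved_units = 0
               WHERE account_id = :acct AND customer_ref = :ref
                 AND reservation_expires_at < :now""",
            {"acct": account_id, "ref": customer_ref, "now": now},
        )
    return cur.rowcount == 1

def reserved(conn):
    return conn.execute("SELECT reserved_units FROM customers").fetchone()[0]

cleared_early = clear_stale(conn, "acct_1", "cust_1", now=999.0)
reserved_early = reserved(conn)
cleared_late = clear_stale(conn, "acct_1", "cust_1", now=1001.0)
reserved_late = reserved(conn)
print(cleared_early, reserved_early, cleared_late, reserved_late)  # False 8 True 0
```

In this sketch the expiry lives on the customer row, so the reset clears every outstanding reservation for that customer at once. A table with one row per reservation would track expiry per run and leave in-flight reservations untouched.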


Why this matters for metering accuracy

The developer's question was specifically about consistency between the check and the settlement. The reservation pattern guarantees this in three ways:

1. No double-approval. The atomic UPDATE ensures only one concurrent run can claim a given unit of budget. The database is the lock.

2. No phantom budget. Every approved run immediately reduces the available budget visible to subsequent runs. There's no window where the same units appear available twice.

3. Accurate settlement. The record() call replaces estimated with actual. The reservation was a claim, not a charge. The charge happens at settlement with the real number.


The full flow

preflight(estimated_units=10)
  → atomic UPDATE reserves 10 units
  → returns approved=true, remaining_units=N

agent runs (actual cost: 7 units)

record(units=7)
  → used_units += 7
  → reserved_units -= 10
  → net: 7 charged, 3 released
  

If two runs start simultaneously and the budget covers only one of them, only one reservation succeeds. The other is blocked at the database level before any compute runs.


Add preflight to your agents

Free tier: 1,000 preflight calls/month. No credit card required.

Get your API key
