Adding billing to a LangChain agent takes two calls: one before the chain runs, one after. No middleware, no monkey-patching. Works with any LangChain component — LCEL chains, AgentExecutor, RetrievalQA, or custom runnables.
pip install agentbill-sdk langchain-openai langchain-core
The explicit pattern: call preflight() before the chain runs, then record() after it completes.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from agentbill import AgentBillClient

# ceiling=50: block any run estimated at more than 50 units
client = AgentBillClient(api_key="agb_your_key", ceiling=50)

def run_research_agent(customer_id: str, topic: str) -> str:
    # 1. Preflight — block before any tokens are consumed
    check = client.preflight(
        agent_id="research_chain",
        estimated_units=10,
        customer_id=customer_id
    )
    if not check.approved:
        raise Exception(f"Blocked for {customer_id}: {check.reason}")

    # 2. Run the LangChain chain normally (LCEL syntax)
    llm = ChatOpenAI(model="gpt-4o")
    prompt = ChatPromptTemplate.from_template("Research this topic in depth: {topic}")
    chain = prompt | llm
    result = chain.invoke({"topic": topic})

    # 3. Record units used
    client.record(agent_id="research_chain", units=10, customer_id=customer_id)
    return result.content
```
The @client.gate() decorator handles preflight and record automatically. Zero boilerplate inside the function.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from agentbill import AgentBillClient

client = AgentBillClient(api_key="agb_your_key", ceiling=50)

@client.gate(agent_id="research_chain", estimated_units=10, customer_id="user_123")
def run_research_agent(topic: str) -> str:
    llm = ChatOpenAI(model="gpt-4o")
    prompt = ChatPromptTemplate.from_template("Research: {topic}")
    chain = prompt | llm
    return chain.invoke({"topic": topic}).content

# preflight runs before, record runs after — automatically
result = run_research_agent("quantum computing")
```
For agents that run many steps, use checkpoint() to enforce a ceiling mid-run. The agent is blocked if it has already consumed too many units.
```python
from agentbill import AgentBillClient

client = AgentBillClient(api_key="agb_your_key")

def run_multi_step_agent(customer_id: str, tasks: list) -> list:
    check = client.preflight(agent_id="multi_step", estimated_units=len(tasks), customer_id=customer_id)
    if not check.approved:
        raise Exception(f"Blocked for {customer_id}: {check.reason}")

    results = []
    for i, task in enumerate(tasks):
        result = run_single_task(task)
        results.append(result)

        # Check mid-run — stop if ceiling is hit
        cp = client.checkpoint(
            agent_id="multi_step",
            units_so_far=i + 1,
            ceiling=20,
            customer_id=customer_id
        )
        if not cp.approved:
            break  # stopped early — no runaway cost

    client.record(agent_id="multi_step", units=len(results), customer_id=customer_id)
    return results
```
Each blocked call raises a specific exception, so callers can turn billing blocks into clean responses:

```python
from agentbill import BudgetExhaustedError, CeilingExceededError, FreeTierExceededError

def handle_request() -> dict:
    try:
        result = run_research_agent("user_123", "quantum computing")
        return {"result": result}
    except CeilingExceededError:
        return {"error": "run exceeds your per-request ceiling"}
    except BudgetExhaustedError:
        return {"error": "customer budget exhausted — top up to continue"}
    except FreeTierExceededError as e:
        return {"error": "free tier limit reached", "upgrade_url": e.upgrade_url}
```
AgentBill wraps at the invocation level — it doesn't care what's inside the chain. Use it with:
LLMChain, AgentExecutor, RetrievalQA, ConversationalChain, and LangGraph.
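As a sketch of that invocation-level pattern, the same gate decorator from earlier can wrap an AgentExecutor run. The tool, prompt, agent_id, and estimated_units below are illustrative assumptions, not part of the SDK:

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from agentbill import AgentBillClient

client = AgentBillClient(api_key="agb_your_key", ceiling=50)

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# Gate the whole executor run — preflight before, record after,
# regardless of how many tool calls happen inside
@client.gate(agent_id="tool_agent", estimated_units=15, customer_id="user_123")
def run_tool_agent(question: str) -> str:
    llm = ChatOpenAI(model="gpt-4o")
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])
    agent = create_tool_calling_agent(llm, [word_count], prompt)
    executor = AgentExecutor(agent=agent, tools=[word_count])
    return executor.invoke({"input": question})["output"]
```

Because the gate wraps the function boundary rather than the LLM calls, swapping the AgentExecutor for any other runnable requires no billing changes.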
Pass customer_id to enforce separate budgets per user. Each customer has their own usage counters and free tier allowance.
```python
# Different customers — isolated budgets
check_alice = client.preflight(agent_id="research", estimated_units=10, customer_id="alice")
check_bob = client.preflight(agent_id="research", estimated_units=10, customer_id="bob")
```
For LangGraph workflows, call preflight() before entering the graph and record() after the final node completes. Use checkpoint() inside nodes to enforce ceilings mid-graph.
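A minimal sketch of that placement, assuming a one-node StateGraph; the node name, per-node unit cost, and agent_id are illustrative, and the LLM call inside the node is elided:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from agentbill import AgentBillClient

client = AgentBillClient(api_key="agb_your_key")

class State(TypedDict):
    topic: str
    draft: str
    units_used: int

def research(state: State) -> dict:
    # ... call an LLM here ...
    units = state["units_used"] + 5  # illustrative per-node cost
    # checkpoint() inside the node enforces the ceiling mid-graph
    cp = client.checkpoint(agent_id="graph_agent", units_so_far=units,
                           ceiling=20, customer_id="user_123")
    if not cp.approved:
        raise Exception("ceiling hit mid-graph")
    return {"draft": f"notes on {state['topic']}", "units_used": units}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_edge(START, "research")
graph.add_edge("research", END)
app = graph.compile()

# preflight before entering the graph, record after the final node
check = client.preflight(agent_id="graph_agent", estimated_units=20, customer_id="user_123")
if check.approved:
    final = app.invoke({"topic": "quantum computing", "units_used": 0})
    client.record(agent_id="graph_agent", units=final["units_used"], customer_id="user_123")
```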