Agent Budget Protocol is a proposal for managing the budgets of AI agents. It combines atomic reservations with run-scoped ceilings so that an autonomous session cannot exhaust a budget unexpectedly. The protocol is designed to be integrated as middleware, keeping AI operations within financial limits while maintaining performance.
Agent Budget Protocol defines a real-time budget decision interface for AI agent operations. It is a draft RFC and is open for feedback from developers and engineers.
Identifying the Problem
AI agents operate in iterative loops, and because each step resends the accumulated context, costs compound rapidly as a run gets longer. Current budget frameworks cap spending per month, tied to an API key or user account. An autonomous session can therefore run up unexpectedly high costs and potentially exhaust a monthly budget within an hour. Crucially, existing systems do not enforce limits on a per-run basis, which leaves users exposed.
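To make the cost growth concrete, here is a rough sketch of a run in which every step resends the whole context so far. The function name and the token counts are illustrative assumptions, not figures from the RFC.

```python
def cumulative_input_tokens(steps: int, base_tokens: int, added_per_step: int) -> int:
    """Total input tokens billed over a run when each step resends the full context."""
    total = 0
    context = base_tokens
    for _ in range(steps):
        total += context            # the whole context is billed again on this call
        context += added_per_step   # replies and tool output grow the context
    return total


# Hypothetical run: 2,000-token starting prompt, 1,500 tokens added per step.
for steps in (10, 50, 100):
    print(steps, cumulative_input_tokens(steps, 2_000, 1_500))
```

Under these assumed numbers, a 100-step run bills about 87 times the input tokens of a 10-step run, which is why a monthly cap gives little protection against a single long session.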
Proposed Solution
This RFC proposes a budget authority that sits in front of provider calls and can be deployed as a gateway hook, a sidecar, or SDK middleware:
- Atomic Reservation: Before each call, the authority reserves the estimated cost, executes the call, commits the actual cost, and refunds any excess. Because the reservation is atomic, concurrent tool calls cannot jointly exceed the ceiling.
- Run-Scoped Ceilings: Ceilings are bound to an identity at the run level, while the existing user and key scopes remain in place.
- Fail-Closed Pricing: A route with an unknown price is treated as unroutable, so it is never treated as free.
- Machine-Readable Budget Protocol: Budget state is communicated through response headers and RFC 9457 problem details, allowing agents to switch to cheaper, capability-valid models as the remaining budget shrinks.
- Staged Adoption: The framework supports gradual rollout, from advisory warnings through to enforced ceilings.
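The reserve, commit, and refund cycle and the fail-closed price lookup described above can be sketched roughly as follows. This is a minimal in-process illustration, not the RFC's interface: the class, function, and model names are hypothetical, amounts are integer micro-USD, and a real deployment would need a shared store rather than a single lock.

```python
import threading


class BudgetDenied(Exception):
    """Raised when a call cannot be admitted under the run's budget."""


class RunBudget:
    """Hypothetical per-run budget; all amounts are integer micro-USD."""

    def __init__(self, ceiling: int) -> None:
        self.ceiling = ceiling
        self.committed = 0   # actual spend settled so far
        self.reserved = 0    # holds for calls still in flight
        self._lock = threading.Lock()

    def reserve(self, estimate: int) -> int:
        # The check and the hold happen under one lock, so two concurrent
        # tool calls cannot both pass the check and overshoot together.
        with self._lock:
            if self.committed + self.reserved + estimate > self.ceiling:
                raise BudgetDenied("run ceiling would be exceeded")
            self.reserved += estimate
            return estimate

    def commit(self, hold: int, actual: int) -> None:
        # Release the hold and record the actual cost; any difference
        # between the two is thereby refunded to the run.
        with self._lock:
            self.reserved -= hold
            self.committed += actual


# Hypothetical price table: micro-USD per 1,000 tokens, placeholder model names.
PRICES = {"small-model": 500, "large-model": 10_000}


def estimate_cost(model: str, tokens: int) -> int:
    price = PRICES.get(model)
    if price is None:
        # Fail closed: an unpriced route is refused rather than treated as free.
        raise BudgetDenied(f"no price known for {model!r}")
    return (price * tokens + 999) // 1000  # round up to stay conservative


def guarded_call(budget: RunBudget, model: str, tokens: int, call) -> int:
    """Run call() under a reservation; call() returns the actual cost."""
    hold = budget.reserve(estimate_cost(model, tokens))
    actual = 0  # assumption: a failed call costs nothing (may not hold in practice)
    try:
        actual = call()
    finally:
        budget.commit(hold, actual)
    return actual
```

Counting in-flight reservations against the ceiling is what makes this stricter than checking only settled spend: a second concurrent call is denied while the first hold is outstanding, and admitted again once the first call commits a lower actual cost.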
For comprehensive details, refer to the full document: RFC.md.
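As an illustration of the machine-readable side, a budget denial and an agent's fallback choice might look something like the sketch below. The `type`, `title`, `status`, `detail`, and `instance` members come from RFC 9457, as does the `application/problem+json` media type. Everything else is an assumption made for this example: the 429 status, the `X-Budget-Remaining` header, the extension member names, and the model catalogue format. RFC.md defines the actual fields.

```python
import json


def budget_problem(run_id: str, remaining: int, estimate: int):
    """Build a hypothetical budget denial as (status, headers, body)."""
    status = 429  # assumed status code; the RFC may specify a different one
    body = {
        # Standard RFC 9457 members.
        "type": "https://example.com/problems/budget-exceeded",  # placeholder URI
        "title": "Run budget exceeded",
        "status": status,
        "detail": f"Estimated cost {estimate} exceeds remaining {remaining} micro-USD.",
        "instance": f"/runs/{run_id}",
        # Extension members; these names are invented for this sketch.
        "remaining_micro_usd": remaining,
        "estimate_micro_usd": estimate,
    }
    headers = {
        "Content-Type": "application/problem+json",
        "X-Budget-Remaining": str(remaining),  # hypothetical header name
    }
    return status, headers, json.dumps(body)


def pick_model(candidates, remaining, tokens, capability):
    """Return the cheapest model that has the capability and fits the budget."""
    affordable = [
        m for m in candidates
        if capability in m["capabilities"]
        and m["micro_usd_per_1k"] * tokens // 1000 <= remaining
    ]
    if not affordable:
        return None  # nothing valid fits; the agent should stop or ask for budget
    return min(affordable, key=lambda m: m["micro_usd_per_1k"])["name"]
```

An agent that receives such a denial can read the remaining amount from the body and call `pick_model` with its own catalogue. Filtering on capability first keeps it from falling back to a cheaper model that cannot do the task.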
Current State
The protocol is at Draft v3, and feedback is actively requested, particularly from platform and infrastructure engineers who run large language model (LLM) gateways in production. Feedback is welcome in the Discussions, especially on the open questions in section 7 of the RFC.
Relation to Existing Tools
The Agent Budget Protocol is not a competing model gateway or provider abstraction. Its first intended implementation is a pre-call hook for LiteLLM, so that it can slot into existing deployments. The design stems from experience with ModelMuxer, an LLM router with pre-call budget gating that has been maintained since 2025. This RFC redesigns the budgeting layer around the failures actually observed in production agent workloads.