
Pydantic-AI budget control with SpendGuard

Your Pydantic-AI agent calls agent.run("...") and the run loop dispatches Model.request() repeatedly: once per step, once per retry, and once for every model call in a multi-step tool loop. Without a gate, every one of those calls is a free shot at the provider. SpendGuard wraps the model so that each request() reserves against a budget before the upstream LLM call goes out.
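To make the ordering concrete, here is a toy, standard-library-only sketch of the gate-then-call pattern. It is not SpendGuard's implementation: ToyBudget, GuardedCaller and BudgetExceeded are hypothetical stand-ins (BudgetExceeded plays the role of DecisionStopped), and the provider call is replaced by asyncio.sleep(0).

```python
import asyncio
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical stand-in for SpendGuard's DecisionStopped."""


@dataclass
class ToyBudget:
    limit_micros: int
    reserved_micros: int = 0

    def reserve(self, amount_micros: int) -> None:
        # Refuse the reservation before any upstream call is made.
        if self.reserved_micros + amount_micros > self.limit_micros:
            remaining = self.limit_micros - self.reserved_micros
            raise BudgetExceeded(f"need {amount_micros}, only {remaining} left")
        self.reserved_micros += amount_micros


@dataclass
class GuardedCaller:
    budget: ToyBudget
    upstream_calls: int = 0

    async def request(self, prompt: str, estimate_micros: int) -> str:
        self.budget.reserve(estimate_micros)  # gate first; may raise
        self.upstream_calls += 1              # only reached if the reservation succeeded
        await asyncio.sleep(0)                # placeholder for the real provider call
        return f"echo: {prompt}"


async def demo() -> tuple[int, bool]:
    caller = GuardedCaller(ToyBudget(limit_micros=2_500_000))
    stopped = False
    try:
        for step in range(5):  # e.g. an agent loop issuing several model requests
            await caller.request(f"step {step}", estimate_micros=1_000_000)
    except BudgetExceeded:
        stopped = True
    return caller.upstream_calls, stopped
```

Running asyncio.run(demo()) returns (2, True): the third request is refused at reservation time, so upstream_calls never increments for it.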

Why you'd want this

  • Pre-call enforcement, not post-hoc dashboards. Reservation happens before the OpenAI/Anthropic call. Over-budget calls raise DecisionStopped and the upstream request never goes out.
  • Retry-safe idempotency. Pydantic-AI re-enters request() on transient errors. SpendGuard derives a stable idempotency_key from messages + settings + run_id, so the retry collapses onto the original reservation instead of allocating a new one.
  • Tool loops stay budgeted. Multi-step tool-using agents are gated on every model call, including steps spawned by tool output.
  • Audit trail. Every decision (allow / stop / require_approval / degrade) is signed and chained for post-hoc analysis.
  • Human-in-the-loop approval. Pause-and-resume with await e.resume(client) when a contract fires REQUIRE_APPROVAL.

Setup (60 seconds)

pip install 'spendguard-sdk[pydantic-ai]'

You also need a running SpendGuard sidecar reachable over a Unix domain socket (UDS). The fastest path is the demo stack:

git clone https://github.com/m24927605/agentic-spendguard.git
cd agentic-spendguard && make demo-up

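The demo binds the sidecar UDS at deploy/demo/runtime/uds/adapter.sock, relative to the repository root. When using the demo, pass that path as socket_path instead of the /var/run/spendguard/adapter.sock shown in the example below.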
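If client.connect() fails, a wrong socket path is a common culprit. A small pre-flight check using only the standard library (not part of the SpendGuard SDK) can confirm that the path exists and is actually a socket:

```python
import os
import stat


def is_unix_socket(path: str) -> bool:
    """True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:  # missing file, bad parent directory, no permission, ...
        return False
    return stat.S_ISSOCK(mode)
```

For the demo stack, is_unix_socket("deploy/demo/runtime/uds/adapter.sock") run from the repository root should return True once make demo-up has finished.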

Wire it up

import asyncio

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

from spendguard import SpendGuardClient, new_uuid7
from spendguard.integrations.pydantic_ai import (
    RunContext, SpendGuardModel, run_context,
)
from spendguard._proto.spendguard.common.v1 import common_pb2


async def main() -> None:
    client = SpendGuardClient(
        socket_path="/var/run/spendguard/adapter.sock",  # demo stack: deploy/demo/runtime/uds/adapter.sock
        tenant_id="00000000-0000-4000-8000-000000000001",
    )
    await client.connect()
    await client.handshake()

    guarded = SpendGuardModel(
        inner=OpenAIModel("gpt-4o-mini"),
        client=client,
        budget_id="my-budget",
        window_instance_id="my-window",
        unit=common_pb2.UnitRef(
            unit_id="usd_micros",
            token_kind="usd_micros",
            model_family="gpt-4",
        ),
        pricing=common_pb2.PricingFreeze(pricing_version="2025-q4"),
        claim_estimator=lambda messages, settings: [
            common_pb2.BudgetClaim(
                budget_id="my-budget",
                window_instance_id="my-window",
                amount_micros=1_000_000,  # 1 USD reservation per call
            )
        ],
    )

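    agent = Agent(model=guarded)
    # SpendGuard's RunContext (not pydantic_ai.RunContext) scopes this run.
    # Its run_id feeds the idempotency key, so retries inside the run
    # collapse onto the same reservation.
    async with run_context(RunContext(run_id=str(new_uuid7()))):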
        result = await agent.run("Hello")
        print(result.output)


if __name__ == "__main__":
    asyncio.run(main())
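The flat 1 USD claim above is simple, but it reserves the same amount for a one-line prompt and a long one. A sketch of a size-aware estimate follows. The per-token prices, the 4-characters-per-token heuristic and the 20% margin are placeholders, not real provider rates, and flattening Pydantic-AI's message objects into prompt_text is left to your code. settings is read as a mapping, which matches Pydantic-AI's ModelSettings TypedDict (max_tokens is one of its keys) and tolerates None.

```python
import math
from typing import Any, Mapping, Optional

# Placeholder prices in USD micros per token; NOT real provider rates.
INPUT_MICROS_PER_TOKEN = 1
OUTPUT_MICROS_PER_TOKEN = 4
CHARS_PER_TOKEN = 4               # rough heuristic for English text
DEFAULT_MAX_OUTPUT_TOKENS = 1024  # used when settings has no max_tokens
SAFETY_MARGIN_PCT = 120           # reserve 20% above the raw estimate


def estimate_amount_micros(prompt_text: str, settings: Optional[Mapping[str, Any]]) -> int:
    input_tokens = math.ceil(len(prompt_text) / CHARS_PER_TOKEN)
    max_output_tokens = (settings or {}).get("max_tokens") or DEFAULT_MAX_OUTPUT_TOKENS
    raw = (input_tokens * INPUT_MICROS_PER_TOKEN
           + max_output_tokens * OUTPUT_MICROS_PER_TOKEN)
    # Integer ceiling of raw * 1.2, avoiding float rounding surprises.
    return (raw * SAFETY_MARGIN_PCT + 99) // 100
```

Inside claim_estimator you would pass the returned integer as amount_micros on the BudgetClaim instead of the fixed 1_000_000.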

What you get

  • Pre-call budget reservation. The wrapped model raises DecisionStopped instead of calling the LLM when the reservation would exceed the budget.
  • Signed audit chain. Every decision is recorded in the ledger with a cryptographic signature; replay-safe via the audit_outbox transactional pattern.
  • Approval continuation. When a contract fires REQUIRE_APPROVAL, the exception carries e.resume(client) — call it after an operator approves in the dashboard.
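"Chained" means each record commits to the one before it, so editing or deleting an earlier entry breaks every later link. The sketch below illustrates that idea with SHA-256 hash links and an HMAC-SHA256 signature from the standard library. The record layout and the HMAC choice are assumptions for illustration; this page does not specify SpendGuard's actual signature scheme or ledger format.

```python
import hashlib
import hmac
import json
from typing import Any

GENESIS = "0" * 64  # "previous hash" for the first record


def _record_hash(prev_hash: str, decision: dict[str, Any]) -> str:
    body = json.dumps({"prev": prev_hash, "decision": decision},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def _sign(key: bytes, record_hash: str) -> str:
    return hmac.new(key, record_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_record(chain: list[dict[str, Any]], decision: dict[str, Any], key: bytes) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    h = _record_hash(prev_hash, decision)
    chain.append({"prev": prev_hash, "decision": decision, "hash": h, "sig": _sign(key, h)})


def verify_chain(chain: list[dict[str, Any]], key: bytes) -> bool:
    prev_hash = GENESIS
    for rec in chain:
        if rec["prev"] != prev_hash:
            return False  # link broken: a record was removed or reordered
        expected = _record_hash(rec["prev"], rec["decision"])
        if rec["hash"] != expected:
            return False  # record content was modified
        if not hmac.compare_digest(rec["sig"], _sign(key, expected)):
            return False  # signature does not match this key
        prev_hash = rec["hash"]
    return True
```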

Common patterns

Per-tenant budgets

Pass distinct budget_id / window_instance_id values per tenant, both to SpendGuardModel and in the BudgetClaim objects your claim_estimator returns. The control plane API (POST /v1/budgets) provisions new budgets without restarting the agent.
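One way to keep tenants separated is a fixed naming convention, so that the IDs given to the model wrapper and the IDs inside each claim always come from the same place. The convention (budget-<tenant>, <tenant>-<window>) and TenantBudgetRef are hypothetical; use whatever IDs you actually provisioned through the control plane.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class TenantBudgetRef:
    tenant_id: str
    budget_id: str
    window_instance_id: str


@lru_cache(maxsize=None)
def budget_ref_for(tenant_id: str, window: str) -> TenantBudgetRef:
    # Hypothetical convention: one budget per tenant, one window instance
    # per tenant per billing window (e.g. "2025-06").
    return TenantBudgetRef(
        tenant_id=tenant_id,
        budget_id=f"budget-{tenant_id}",
        window_instance_id=f"{tenant_id}-{window}",
    )
```

You would then use ref.budget_id and ref.window_instance_id both when constructing SpendGuardModel and inside each BudgetClaim, so the two can never drift apart.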

Handling approvals

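from spendguard import ApprovalRequired


async def run_with_approval(agent, client, prompt):
    try:
        return await agent.run(prompt)
    except ApprovalRequired as e:
        # wait_for_operator_approval is your own helper: block until an
        # operator approves decision e.decision_id in the dashboard.
        await wait_for_operator_approval(e.decision_id)
        return await e.resume(client)  # continue the paused call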
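The approval snippet calls wait_for_operator_approval, which this page does not define. A minimal polling sketch follows. fetch_status is a hypothetical coroutine you supply (for example, one that queries your dashboard or control plane), and the "approved" / "denied" status strings are assumptions, not a documented SpendGuard API.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_operator_approval(
    decision_id: str,
    fetch_status: Callable[[str], Awaitable[str]],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
) -> None:
    """Poll until the decision is approved; raise if it is denied or times out."""

    async def _poll() -> None:
        while True:
            status = await fetch_status(decision_id)
            if status == "approved":
                return
            if status == "denied":
                raise PermissionError(f"decision {decision_id} was denied")
            await asyncio.sleep(poll_interval)

    # Raises asyncio.TimeoutError if no verdict arrives within `timeout` seconds.
    await asyncio.wait_for(_poll(), timeout=timeout)
```

Because this version takes an extra argument, bind it first, e.g. functools.partial(wait_for_operator_approval, fetch_status=my_fetch_status), so the call in the snippet can keep passing only e.decision_id.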

Testing without burning tokens

Replace OpenAIModel with pydantic_ai.models.test.TestModel. The SpendGuard wrapper still records reservations and decisions, so you can unit-test budget logic without provider keys.