Skip to content

fix(billing): audit P0 contract changes — pricing Option, BillingMode, DebitOutcome#463

Merged
FrozenArcher404 merged 7 commits into
mainfrom
fix/audit-billing-and-pricing
May 16, 2026
Merged

fix(billing): audit P0 contract changes — pricing Option, BillingMode, DebitOutcome#463
FrozenArcher404 merged 7 commits into
mainfrom
fix/audit-billing-and-pricing

Conversation

@FrozenArcher404
Copy link
Copy Markdown
Contributor

Six commits supporting a follow-up bitrouter-cloud audit-fix PR. All semver-affecting changes are in this PR so cloud's dep bump can land cleanly after the next release.

What's in here

  • fix(pricing): return Option when a required rate is missing (dedf41c)
    bitrouter_core::pricing::calculate_cost and the bitrouter_api::mpp::calculate_usage_cost wrapper now return Option<f64>. The previous unwrap_or(0.0) short-circuit silently undercharged whenever a token bucket had nonzero usage but no rate (a provider with input pricing and output_tokens: {} placeholder, for instance, billed every output token at $0). The four /v1/chat/completions-shaped filter sites log pricing_unavailable and skip the deduct when None comes back — the recommender's skip-incomplete change on the cloud side is what should keep that from happening in practice.

  • feat(pricing): add ModelPricing::is_complete and reserved output buckets (95b64cf)
    ModelPricing::is_complete() is the "safe to bill" predicate the cheapest-picker / recommender call to refuse placeholder entries. Adds reserved OutputTokenPricing.image / .audio fields (serde-round-tripped) so multimodal billing has a place to land without another breaking change.

  • feat(billing): explicit BillingMode on RoutingTarget (1b68281)
    The filters previously inferred "is this a BYOK request?" from target.api_key_override.is_some(). Cloud-routed targets always carry a platform key, so that inference treated every cloud request as BYOK and the deduct gate was unconditionally skipped. New BillingMode { Cloud, Byok } field on RoutingTarget; defaults to Cloud so a forgotten field bills (safer than silently leaking). All four filters now read this field. All in-tree RoutingTarget constructors get an explicit billing_mode: BillingMode::default().

  • feat(billing): deduct returns DebitOutcome (Settled | Deferred) (65efb60)
    Extends the PaymentGate::deduct contract. Deferred lets a gate signal "couldn't draw right now but recorded the debt for later collection" — callers MUST treat it as success, the response goes back unchanged. The credits-backed cloud gate uses this to insert a pending_debits row instead of failing the response. MppState's impl maps its existing success path to Settled (channels can't defer).

  • feat(mpp): export DebitOutcome from the public mpp module (01ee02c)
    One-liner to make the new enum reachable from downstream cloud crates.

  • fix(stream): bill estimated output on mid-stream client disconnect (720bd77)
    StreamObservation::outcome used to return Err whenever the client disconnected mid-stream, routing to on_request_failure with no billing — an attacker who drained a long generation and hung up before the Finish event paid nothing. Now: when Finish landed first, settle with its authoritative usage even if disconnect happened after; when disconnect beats Finish, return a synthesized usage with output tokens estimated from delta.len() / 4 so the request bills something. Input billing on disconnect is a known gap (prompt not plumbed through StreamObservation).

Compat notes

Three of these are technically API-breaking and want a minor bump:

  • RoutingTarget gains a billing_mode field — any downstream literal constructor needs to add it. Cushioned by BillingMode::default() returning Cloud.
  • calculate_cost / calculate_usage_cost now return Option<f64> — callers must handle None. Only known consumer is bitrouter-cloud.
  • PaymentGate::deduct returns Result<DebitOutcome, _> — trait impls update.

bitrouter-observe::ModelSpendObserver is updated in-tree to keep working with the new Option return (logs pricing_unavailable, writes cost=0 to the spend log rather than crashing).

Test plan

  • cargo nextest run — 968/968 pass
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo fmt -- --check — clean
  • New unit tests in bitrouter-core::pricing::tests (Option / partial pricing / unused-bucket cases)
  • New unit tests in bitrouter-api::router::stream_observation_tests (disconnect after Finish, disconnect with deltas, disconnect with no output)
  • Integration / staging snapshot — paired with the bitrouter-cloud follow-up PR before either lands in production

🤖 Generated with Claude Code

takasaki404 and others added 6 commits May 17, 2026 02:57
calculate_cost previously unwrap_or(0.0)'d every absent rate, so a
provider with only input pricing (no output_tokens.text) would bill
every output token at $0 — silent undercharging. Callers had no way
to distinguish "free model" from "we don't know how to bill this".

calculate_cost (and the bitrouter-api mpp wrapper) now return
Option<f64>. None means: at least one token bucket has nonzero usage
and the matching rate is missing. The four chat/completions filter
sites log pricing_unavailable and skip the deduct (response still
goes out — the recommender skip-incomplete change is what should
keep this from happening in practice). The spend-log observer
records cost=0 and warns, since dashboard accuracy degrades but
shouldn't block the request.

Behaviour preserved when usage is genuinely empty (Some(0.0)) and
when an unused bucket's rate happens to be missing (Some(rest)).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
is_complete returns true when no_cache + text are both set — the
minimum a provider needs to be billable. Cleanest predicate for the
recommender and the /v1/models cheapest-picker to refuse entries
with placeholder pricing, complementing the per-bucket Option
returned by calculate_cost.

OutputTokenPricing gains reserved image/audio fields; they round-trip
via serde and surface in /v1/models JSON if upstream catalogs ever
publish them. No billing logic yet — that lands when matching usage
buckets land in LanguageModelOutputTokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat/completions filters previously derived "is this a BYOK
request?" from `target.api_key_override.is_some()` — which is
unconditionally true for cloud-routed targets that fill in the
platform credential. As a result the cloud's deduct gate was
silently skipped for every routed request, and users were billed
nothing.

Add a `BillingMode { Cloud, Byok }` enum and a `billing_mode` field
on `RoutingTarget`. Default is `Cloud` so a forgotten field still
bills (safer to over-charge than to silently leak). All four
filter sites (openai/chat, openai/responses, anthropic/messages,
google/generate_content) now read this field instead of inferring
billing from the api_key field.

All in-tree RoutingTarget constructors get an explicit
`billing_mode: BillingMode::default()` (cloud-billed). The cloud's
RegistryRoutingTable and ByokOverlay will set this explicitly in a
follow-up commit on the cloud side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-flight deduct call used to return Result<(), VerificationError>
and any Err was logged-and-swallowed by the chat/completions filters,
so a user whose balance dropped to zero between auth and settlement
still got the response for free.

Extend the PaymentGate trait to return DebitOutcome:
- Settled: the amount was deducted synchronously.
- Deferred: the gate could not deduct right now (typically insufficient
  balance) but has recorded the debt for later collection. Callers MUST
  treat this as success — the response goes back to the client.

MppState's impl always returns Settled on success (channels can't
defer — they either have funds or don't, in which case Err propagates
as before).

The cloud's HybridPaymentGate will use Deferred in the cloud-side
follow-up commit: insert a pending_debits row instead of Err on
insufficient balance, returning Deferred so the response still ships
while the gate's next verify_payment will refuse the caller until the
debt clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
An attacker who drained a long generation and then disconnected
before the upstream's Finish event landed used to pay nothing
(StreamObservation::outcome returned Err on disconnect, which
routed to on_request_failure and skipped settlement).

StreamObservation now tracks bytes streamed across TextDelta /
ReasoningDelta / ToolInputDelta and, on disconnect-without-Finish,
returns a synthesized LanguageModelUsage with output_tokens
estimated at ~4 bytes/token. This still under-bills (input tokens
aren't tracked at this layer — the prompt isn't plumbed through
StreamObservation; a follow-up should add that) but closes the
"drain output then hang up" path.

When the Finish event arrived before the disconnect, the
authoritative usage from upstream now wins over the disconnect
branch, so the response is billed in full rather than dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new DebitOutcome enum is part of the PaymentGate contract;
downstream cloud crates need to reference it when implementing
their own gate (HybridPaymentGate in bitrouter-cloud).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the fix label May 16, 2026
CI's bitrouter-api matrix builds with --no-default-features and each
payment feature in isolation. Without payments-tempo|payments-solana,
the `byok_used = matches!(target.billing_mode, BillingMode::Byok)`
sites compile out (they live inside cfg(any(payments-tempo,
payments-solana))) but the unconditional BillingMode import remained
visible — fires unused-imports, which CI treats as -D warnings.

Split BillingMode into its own cfg-gated use across all four filter
files. Verified each of the failing matrix combos (no-features,
accounts, observe, guardrails) now `cargo check`s clean, plus
payments-tempo, payments-solana, and --all-features.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FrozenArcher404 FrozenArcher404 merged commit 98ff690 into main May 16, 2026
49 checks passed
@FrozenArcher404 FrozenArcher404 deleted the fix/audit-billing-and-pricing branch May 16, 2026 19:56
@FrozenArcher404 FrozenArcher404 restored the fix/audit-billing-and-pricing branch May 18, 2026 04:01
@FrozenArcher404 FrozenArcher404 deleted the fix/audit-billing-and-pricing branch May 18, 2026 04:02
FrozenArcher404 added a commit that referenced this pull request May 18, 2026
Six fixes mapped from the v0 May-17 audit, with v0's links in the
matching code comments:

P0 — billing leaks
- **Stream-disconnect bills $0** (v0 #463-E). `UsageAccumulator` only
  saw upstream `Usage` parts, so a client that drained a long
  generation and hung up before the trailing usage chunk paid
  nothing. Now also tracks `delta_chars` from `TextDelta` /
  `ReasoningDelta`; on `ClientDisconnected` with no upstream usage
  seen, `StreamProcessor::finish` synthesises a usage with
  `completion_tokens = ceil(chars / 4)` (the OpenAI ~4-chars/token
  heuristic). Prompt-token estimate is the documented v0 gap.
  Regression test: `disconnect_before_usage_bills_estimated_output`.
- **MPP checkpoint lost-update race** (cloud #251 audit B7 / S5).
  `MppState::sign_checkpoint` previously read `last_checkpoint` and
  UPDATEd in separate statements; two concurrent debits with
  different `cumulative` values could compute deltas off the same
  `prev` and the second UPDATE would overwrite the first, leaking
  channel balance. Wrap read + compute + write in a transaction.

P1 — billing-correctness / security
- **Partial pricing silently undercharges** (v0 #463-A). `ModelPricing`
  rates are now `Option<f64>`; `calculate_charge_micro_usd` returns
  `Option<i64>` and reports `None` when any nonzero-usage bucket
  lacks a rate. `CreditCharge` / `MppCharge` emit
  `PricingUnavailable` and `Pass` — never silently bill the missing
  bucket at $0. Adds `ModelPricing::is_complete()` predicate +
  `partial(...)` constructor.
- **Adversarial token clamp** (cloud #251 audit B9). New
  `MAX_TRUSTED_TOKENS = 10_000_000` cap inside
  `calculate_charge_micro_usd`; an adversarial upstream returning
  `u64::MAX` tokens can no longer drive the charge to overflow.
- **SSRF defence for upstream `api_base`** (cloud #251 audit S3 / S4).
  New `bitrouter_sdk::url_validator::validate_upstream_url` rejects
  non-HTTP(S) schemes, metadata DNS aliases
  (`metadata.google.internal`, `instance-data.ec2.internal`), and
  literal IPs in link-local / private / carrier-grade-NAT / ULA /
  documentation ranges. Loopback (127/8, ::1, `localhost`) stays
  allowed for dev / self-hosted upstreams (Ollama, vLLM). Wired into
  `insert_byok_key` (rejects malicious BYOK rows) and into
  `config::parse_with` (rejects malicious `bitrouter.yaml` provider
  entries).

P2 — hygiene
- **Credential Debug redaction** (cloud #251 audit S9). Hand-written
  `Debug` impls on `RoutingTarget`, `ByokCredential`, and
  `ProviderConfig` redact the `api_key` / `api_key_override` fields,
  so a future `tracing::error!(?target, ...)` can't dump upstream or
  user keys into the log stream.

206 tests pass (was 193 + 13 new); clippy clean under `-D warnings`;
fmt clean; doc warning-free; `cargo package --workspace` green.

Out of scope for this pass (v1 doesn't need them or they need a
contract change beyond the audit's intent): `DebitOutcome::Deferred`
(no cloud-style deferred-debit recovery), `pending_debits` (same),
SELECT FOR UPDATE on `credit_accounts` (v1's `deduct_credits` is
already a single atomic UPDATE in a tx; safe under sqlx's serialised
writes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant