fix(billing): audit P0 contract changes — pricing Option, BillingMode, DebitOutcome#463
Merged
Merged
Conversation
calculate_cost previously unwrap_or(0.0)'d every absent rate, so a provider with only input pricing (no output_tokens.text) would bill every output token at $0 — silent undercharging. Callers had no way to distinguish "free model" from "we don't know how to bill this". calculate_cost (and the bitrouter-api mpp wrapper) now return Option<f64>. None means: at least one token bucket has nonzero usage and the matching rate is missing. The four chat/completions filter sites log pricing_unavailable and skip the deduct (response still goes out — the recommender skip-incomplete change is what should keep this from happening in practice). The spend-log observer records cost=0 and warns, since dashboard accuracy degrades but shouldn't block the request. Behaviour preserved when usage is genuinely empty (Some(0.0)) and when an unused bucket's rate happens to be missing (Some(rest)). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
is_complete returns true when no_cache + text are both set — the minimum a provider needs to be billable. Cleanest predicate for the recommender and the /v1/models cheapest-picker to refuse entries with placeholder pricing, complementing the per-bucket Option returned by calculate_cost. OutputTokenPricing gains reserved image/audio fields; they round-trip via serde and surface in /v1/models JSON if upstream catalogs ever publish them. No billing logic yet — that lands when matching usage buckets land in LanguageModelOutputTokens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat/completions filters previously derived "is this a BYOK
request?" from `target.api_key_override.is_some()` — which is
unconditionally true for cloud-routed targets that fill in the
platform credential. As a result the cloud's deduct gate was
silently skipped for every routed request, and users were billed
nothing.
Add a `BillingMode { Cloud, Byok }` enum and a `billing_mode` field
on `RoutingTarget`. Default is `Cloud` so a forgotten field still
bills (safer to over-charge than to silently leak). All four
filter sites (openai/chat, openai/responses, anthropic/messages,
google/generate_content) now read this field instead of inferring
billing from the api_key field.
All in-tree RoutingTarget constructors get an explicit
`billing_mode: BillingMode::default()` (cloud-billed). The cloud's
RegistryRoutingTable and ByokOverlay will set this explicitly in a
follow-up commit on the cloud side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-flight deduct call used to return Result<(), VerificationError> and any Err was logged-and-swallowed by the chat/completions filters, so a user whose balance dropped to zero between auth and settlement still got the response for free. Extend the PaymentGate trait to return DebitOutcome: - Settled: the amount was deducted synchronously. - Deferred: the gate could not deduct right now (typically insufficient balance) but has recorded the debt for later collection. Callers MUST treat this as success — the response goes back to the client. MppState's impl always returns Settled on success (channels can't defer — they either have funds or don't, in which case Err propagates as before). The cloud's HybridPaymentGate will use Deferred in the cloud-side follow-up commit: insert a pending_debits row instead of Err on insufficient balance, returning Deferred so the response still ships while the gate's next verify_payment will refuse the caller until the debt clears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
An attacker who drained a long generation and then disconnected before the upstream's Finish event landed used to pay nothing (StreamObservation::outcome returned Err on disconnect, which routed to on_request_failure and skipped settlement). StreamObservation now tracks bytes streamed across TextDelta / ReasoningDelta / ToolInputDelta and, on disconnect-without-Finish, returns a synthesized LanguageModelUsage with output_tokens estimated at ~4 bytes/token. This still under-bills (input tokens aren't tracked at this layer — the prompt isn't plumbed through StreamObservation; a follow-up should add that) but closes the "drain output then hang up" path. When the Finish event arrived before the disconnect, the authoritative usage from upstream now wins over the disconnect branch, so the response is billed in full rather than dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new DebitOutcome enum is part of the PaymentGate contract; downstream cloud crates need to reference it when implementing their own gate (HybridPaymentGate in bitrouter-cloud). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's bitrouter-api matrix builds with --no-default-features and each payment feature in isolation. Without payments-tempo|payments-solana, the `byok_used = matches!(target.billing_mode, BillingMode::Byok)` sites compile out (they live inside cfg(any(payments-tempo, payments-solana))) but the unconditional BillingMode import remained visible — fires unused-imports, which CI treats as -D warnings. Split BillingMode into its own cfg-gated use across all four filter files. Verified each of the failing matrix combos (no-features, accounts, observe, guardrails) now `cargo check`s clean, plus payments-tempo, payments-solana, and --all-features. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FrozenArcher404
added a commit
that referenced
this pull request
May 18, 2026
Six fixes mapped from the v0 May-17 audit, with v0's links in the matching code comments: P0 — billing leaks - **Stream-disconnect bills $0** (v0 #463-E). `UsageAccumulator` only saw upstream `Usage` parts, so a client that drained a long generation and hung up before the trailing usage chunk paid nothing. Now also tracks `delta_chars` from `TextDelta` / `ReasoningDelta`; on `ClientDisconnected` with no upstream usage seen, `StreamProcessor::finish` synthesises a usage with `completion_tokens = ceil(chars / 4)` (the OpenAI ~4-chars/token heuristic). Prompt-token estimate is the documented v0 gap. Regression test: `disconnect_before_usage_bills_estimated_output`. - **MPP checkpoint lost-update race** (cloud #251 audit B7 / S5). `MppState::sign_checkpoint` previously read `last_checkpoint` and UPDATEd in separate statements; two concurrent debits with different `cumulative` values could compute deltas off the same `prev` and the second UPDATE would overwrite the first, leaking channel balance. Wrap read + compute + write in a transaction. P1 — billing-correctness / security - **Partial pricing silently undercharges** (v0 #463-A). `ModelPricing` rates are now `Option<f64>`; `calculate_charge_micro_usd` returns `Option<i64>` and reports `None` when any nonzero-usage bucket lacks a rate. `CreditCharge` / `MppCharge` emit `PricingUnavailable` and `Pass` — never silently bill the missing bucket at $0. Adds `ModelPricing::is_complete()` predicate + `partial(...)` constructor. - **Adversarial token clamp** (cloud #251 audit B9). New `MAX_TRUSTED_TOKENS = 10_000_000` cap inside `calculate_charge_micro_usd`; an adversarial upstream returning `u64::MAX` tokens can no longer drive the charge to overflow. - **SSRF defence for upstream `api_base`** (cloud #251 audit S3 / S4). New `bitrouter_sdk::url_validator::validate_upstream_url` rejects non-HTTP(S) schemes, metadata DNS aliases (`metadata.google.internal`, `instance-data.ec2.internal`), and literal IPs in link-local / private / carrier-grade-NAT / ULA / documentation ranges. Loopback (127/8, ::1, `localhost`) stays allowed for dev / self-hosted upstreams (Ollama, vLLM). Wired into `insert_byok_key` (rejects malicious BYOK rows) and into `config::parse_with` (rejects malicious `bitrouter.yaml` provider entries). P2 — hygiene - **Credential Debug redaction** (cloud #251 audit S9). Hand-written `Debug` impls on `RoutingTarget`, `ByokCredential`, and `ProviderConfig` redact the `api_key` / `api_key_override` fields, so a future `tracing::error!(?target, ...)` can't dump upstream or user keys into the log stream. 206 tests pass (was 193 + 13 new); clippy clean under `-D warnings`; fmt clean; doc warning-free; `cargo package --workspace` green. Out of scope for this pass (v1 doesn't need them or they need a contract change beyond the audit's intent): `DebitOutcome::Deferred` (no cloud-style deferred-debit recovery), `pending_debits` (same), SELECT FOR UPDATE on `credit_accounts` (v1's `deduct_credits` is already a single atomic UPDATE in a tx; safe under sqlx's serialised writes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Six commits supporting a follow-up bitrouter-cloud audit-fix PR. All semver-affecting changes are in this PR so cloud's dep bump can land cleanly after the next release.
What's in here
fix(pricing): return Option when a required rate is missing(dedf41c)bitrouter_core::pricing::calculate_costand thebitrouter_api::mpp::calculate_usage_costwrapper now returnOption<f64>. The previousunwrap_or(0.0)short-circuit silently undercharged whenever a token bucket had nonzero usage but no rate (a provider with input pricing andoutput_tokens: {}placeholder, for instance, billed every output token at $0). The four/v1/chat/completions-shaped filter sites logpricing_unavailableand skip the deduct whenNonecomes back — the recommender's skip-incomplete change on the cloud side is what should keep that from happening in practice.feat(pricing): add ModelPricing::is_complete and reserved output buckets(95b64cf)ModelPricing::is_complete()is the "safe to bill" predicate the cheapest-picker / recommender call to refuse placeholder entries. Adds reservedOutputTokenPricing.image/.audiofields (serde-round-tripped) so multimodal billing has a place to land without another breaking change.feat(billing): explicit BillingMode on RoutingTarget(1b68281)The filters previously inferred "is this a BYOK request?" from
target.api_key_override.is_some(). Cloud-routed targets always carry a platform key, so that inference treated every cloud request as BYOK and the deduct gate was unconditionally skipped. NewBillingMode { Cloud, Byok }field onRoutingTarget; defaults toCloudso a forgotten field bills (safer than silently leaking). All four filters now read this field. All in-treeRoutingTargetconstructors get an explicitbilling_mode: BillingMode::default().feat(billing): deduct returns DebitOutcome (Settled | Deferred)(65efb60)Extends the
PaymentGate::deductcontract.Deferredlets a gate signal "couldn't draw right now but recorded the debt for later collection" — callers MUST treat it as success, the response goes back unchanged. The credits-backed cloud gate uses this to insert apending_debitsrow instead of failing the response.MppState's impl maps its existing success path toSettled(channels can't defer).feat(mpp): export DebitOutcome from the public mpp module(01ee02c)One-liner to make the new enum reachable from downstream cloud crates.
fix(stream): bill estimated output on mid-stream client disconnect(720bd77)StreamObservation::outcomeused to returnErrwhenever the client disconnected mid-stream, routing toon_request_failurewith no billing — an attacker who drained a long generation and hung up before the Finish event paid nothing. Now: when Finish landed first, settle with its authoritative usage even if disconnect happened after; when disconnect beats Finish, return a synthesized usage with output tokens estimated fromdelta.len() / 4so the request bills something. Input billing on disconnect is a known gap (prompt not plumbed throughStreamObservation).Compat notes
Three of these are technically API-breaking and want a minor bump:
RoutingTargetgains abilling_modefield — any downstream literal constructor needs to add it. Cushioned byBillingMode::default()returningCloud.calculate_cost/calculate_usage_costnow returnOption<f64>— callers must handleNone. Only known consumer isbitrouter-cloud.PaymentGate::deductreturnsResult<DebitOutcome, _>— trait impls update.bitrouter-observe::ModelSpendObserveris updated in-tree to keep working with the newOptionreturn (logspricing_unavailable, writescost=0to the spend log rather than crashing).Test plan
cargo nextest run— 968/968 passcargo clippy --all-targets -- -D warnings— cleancargo fmt -- --check— cleanbitrouter-core::pricing::tests(Option / partial pricing / unused-bucket cases)bitrouter-api::router::stream_observation_tests(disconnect after Finish, disconnect with deltas, disconnect with no output)🤖 Generated with Claude Code