
fix(tests): stabilize image-edit VCR cassettes to stop live gpt-image-1 spend#28110

Open
mateo-berri wants to merge 16 commits into litellm_internal_staging from litellm_stabilize_image_edit_vcr_cassettes

Conversation


@mateo-berri mateo-berri commented May 17, 2026

Summary

The image-edit cassettes for gpt-image-1 were accumulating >50 episodes and getting refused by the persister, so every CI run hit the live OpenAI endpoint and racked up >$150/day. The async parametrize was the clearest tell: test_openai_image_edit_litellm_sdk[True] cached to 1 entry, while the [False] (async) sibling grew to 51 entries and never replayed.

This PR fixes the underlying non-determinism in three layers, plus permanent diagnostics as a fourth change. The existing bloated cassettes were flushed from the production Redis during the development cycle (via temporary one-shot CI hooks that have since been reverted) -- no additional post-merge operation is required.

What this changes

1. Pin httpx's multipart boundary at the source (tests/_vcr_conftest_common.py, tests/image_gen_tests/conftest.py)

httpx's MultipartStream generates a fresh boundary=<random hex> per request via os.urandom(16). The existing _normalize_multipart_boundary rewrites the header reliably, but the body-side replacement only works when request.body is a contiguous bytes object -- which it isn't on the async transport path. Wrapping MultipartStream.__init__ so it defaults to vcr-static-boundary makes every multipart body byte-stable across runs (sync and async). Exposed as pin_httpx_multipart_boundary so other multipart-heavy suites can adopt it.

2. Pass raw bytes (not BytesIO) through the image-edit fixtures (tests/image_gen_tests/test_image_edits.py)

A BytesIO whose file pointer is at EOF after the first multipart upload silently encodes an empty image on the next SDK / Router retry. bytes are immutable and position-less, so retries re-encode an identical payload every time. This is also a small production-correctness improvement -- a customer passing BytesIO today would hit the same empty-body retry bug. The BytesIO-specific smoke test is preserved via a separate get_test_images_as_bytesio factory.
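A minimal repro of the consumed-stream behavior described above (standalone sketch with hypothetical names, not the PR's fixture code):

```python
import io

def encode_part(payload):
    """Simulate what a multipart encoder does: read the payload once."""
    if isinstance(payload, bytes):
        return payload  # immutable and position-less: identical every time
    return payload.read()  # advances the stream's file pointer to EOF

image = b"\x89PNG fake image bytes"

# BytesIO: the first attempt consumes the stream; the retry reads EOF.
stream = io.BytesIO(image)
first = encode_part(stream)
retry = encode_part(stream)
print(first == image, retry == b"")  # True True -- empty body on retry

# bytes: every retry re-encodes the identical payload.
first_b = encode_part(image)
retry_b = encode_part(image)
print(first_b == retry_b == image)   # True
```

Each divergent retry body is another episode VCR records, which is why the fixtures now hand out `bytes`.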

3. Coalesce iterable request bodies + clear vcrpy's sticky flags (tests/_vcr_conftest_common.py)

Discovered while diagnosing the residual async-only leak after fixes 1 and 2 landed. Two stacked vcrpy quirks:

  • The httpx async transport hands vcrpy a request.body that is a list_iterator or bytes_iterator over multipart chunks rather than a contiguous bytes object. The safe_body matcher then compares the two iterator objects with ==, which is object identity for arbitrary iterators -- so semantically identical bodies never compare equal and record_mode="new_episodes" appends a fresh episode on every CI run.
  • vcrpy's Request keeps two private flags (_was_iter / _was_file) that are set in __init__ based on the original body's type and never cleared by the setter. The body getter re-wraps the stored value in iter() on every access. Even after coalescing the body to bytes via request.body = out, the next read re-wraps -- back to bytes_iterator.
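The iterator-identity pitfall in the first bullet is visible in plain Python:

```python
# Two semantically identical multipart bodies handed over as iterators.
a = iter([b"--boundary\r\n", b"payload", b"--boundary--\r\n"])
b = iter([b"--boundary\r\n", b"payload", b"--boundary--\r\n"])

# list_iterator does not define __eq__, so == falls back to object
# identity -- distinct iterator objects are never equal, so a body
# matcher comparing them raw can never report a cassette hit.
print(a == b)              # False
print(list(a) == list(b))  # True once materialized to concrete chunks
```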

_materialize_iterable_body collapses the iterator (handling both list_iterator over byte chunks and bytes_iterator over int byte values), writes raw bytes back, and clears the sticky flags so subsequent reads see plain bytes. Called from both _before_record_request (so the boundary normalizer and the cassette serializer both see bytes) and _safe_body_matcher (defense in depth).
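The coalescing step can be sketched as a standalone function (illustrative only; the PR's `_materialize_iterable_body` additionally writes the result back onto the vcrpy `Request` and clears the `_was_iter` / `_was_file` flags):

```python
def materialize_iterable_body(body):
    """Collapse an iterator body to contiguous bytes.

    Handles both chunk iterators yielding bytes (list_iterator) and
    byte iterators yielding ints (as produced by iter(b"...")).
    """
    if body is None or isinstance(body, (bytes, bytearray)):
        return None if body is None else bytes(body)
    try:
        chunks = list(body)
    except TypeError:
        return body  # not iterable; leave untouched
    if all(isinstance(c, int) for c in chunks):
        return bytes(chunks)                       # bytes_iterator case
    return b"".join(bytes(c) for c in chunks)      # list_iterator case

print(materialize_iterable_body(iter(b"abc")))         # b'abc'
print(materialize_iterable_body(iter([b"ab", b"c"])))  # b'abc'
```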

4. Permanent VCR diagnostic logging (tests/_vcr_conftest_common.py, all 13 VCR-using conftests)

The matcher previously raised AssertionError("request bodies differ") with zero context, which made the iterator-vs-iterator class of bug invisible. Replaced with a structured diagnostic block (types, lengths, SHA-256s, first divergent byte offset, ±100-byte window on each side). The normalizer's silent else: return fallthrough on unrecognized body types now logs too. Diagnostics route through per-PID files under test-results/vcr-diagnostics/ to bypass pytest/xdist's per-test stdout capture, and the controller dumps them at session end via emit_vcr_diagnostic_log -- wired into every VCR-using conftest so any future regression in any suite surfaces in the CI log.
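The shape of that diagnostic block can be sketched as follows (field names and function name are assumptions for illustration, not the PR's exact format):

```python
import hashlib

def body_mismatch_report(a: bytes, b: bytes, window: int = 100) -> dict:
    """Summarize how two request bodies differ: types are assumed
    coalesced to bytes; report lengths, SHA-256s, the first divergent
    byte offset, and a +/- `window`-byte context slice on each side."""
    offset = next(
        (i for i, (x, y) in enumerate(zip(a, b)) if x != y),
        # no divergence in the overlap: lengths differ, or bodies are equal
        min(len(a), len(b)) if len(a) != len(b) else None,
    )
    anchor = offset or 0
    return {
        "len_a": len(a),
        "len_b": len(b),
        "sha256_a": hashlib.sha256(a).hexdigest(),
        "sha256_b": hashlib.sha256(b).hexdigest(),
        "first_diff_offset": offset,
        "context_a": a[max(0, anchor - window): anchor + window],
        "context_b": b[max(0, anchor - window): anchor + window],
    }

report = body_mismatch_report(b"boundary=abc123", b"boundary=xyz789")
print(report["first_diff_offset"])  # 9
```

With a report like this, an iterator-vs-iterator mismatch (or a divergent boundary string) is visible at a glance instead of hiding behind a bare `AssertionError`.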

Why this is still a faithful end-to-end test

The multipart boundary is an opaque transport-level delimiter (RFC 7578). The provider's parser does not branch on its value. LiteLLM never reads or sets it. Pinning it to a constant changes ~30 bytes of wire format and nothing else -- same URL, same method, same headers, same image bytes, same prompt, same response. Real httpx transport, real multipart construction, real OpenAI response (captured live on cassette record), real cost-calculator and logging callbacks. The only thing we lose is "does httpx's os.urandom(16) produce random hex correctly?" -- which is httpx's test suite's job.

The BytesIO → bytes change is actually a fidelity improvement: today's BytesIO path silently produces an empty multipart on the second SDK retry, which is not faithful behavior.

Test plan

  • All mocked tests in tests/image_gen_tests/test_image_edits.py pass locally with the new fixtures.
  • Sync and async httpx requests produce byte-identical multipart bodies (boundary pinned).
  • pin_httpx_multipart_boundary preserves caller-supplied boundaries when explicitly passed; forwards future MultipartStream.__init__ kwargs.
  • Local repro of the iterator-vs-iterator matcher case (vcr.request.Request(body=iter(b'...')) on both sides) now HITs.
  • CI verification: post-1c51ad13 image_gen_testing run shows all five async image-edit tests as [VCR HIT] with stable entry counts, [VCR MISS:RECORDED] 0 for test_image_edits, and zero billing errors. Cost is now $0/day for these tests (was ~$150/day).

Out of scope

  • Re-adding the per-episode body-hash diagnostic gated behind LITELLM_VCR_DEBUG_BODY_HASH=1 (silent in steady state, useful for catching the next regression).
  • Turning the persister's >50 episodes warning into a CI build break so the next regression surfaces immediately instead of after weeks of silent billing.
  • Applying pin_httpx_multipart_boundary to suites that currently don't need it -- the helper is reusable and other conftests can opt in if they show similar symptoms.

Note

Medium Risk
Test-only changes but they alter VCR request matching/recording behavior (multipart boundary pinning, body materialization, new diagnostics) and add a Redis cleanup hook, which could affect cassette reuse and debugging in CI.

Overview
Stabilizes VCR caching for multipart/image-edit tests by pinning httpx multipart boundaries and by materializing iterable request bodies to bytes before matching/recording, avoiding async-only cassette misses and episode bloat.

Adds a persistent VCR diagnostic logging pipeline (per-PID logs + session-end dump) and extra mismatch instrumentation (body hashes/previews, key fingerprint warnings, multipart normalization skips); the Redis persister now logs per-episode body hashes.

Updates test wiring across VCR-using suites to reset/emit diagnostics, and adjusts image-edit fixtures to pass immutable bytes (with a separate BytesIO factory) to prevent consumed-stream nondeterminism; guardrails conftest includes a startup cleanup that deletes cached presidio cassettes from Redis.

Reviewed by Cursor Bugbot for commit 8021e60.

shin-berri and others added 3 commits May 13, 2026 22:37
[Infra] Promote internal staging to main
[Infra] Promote internal staging to main
fix(tests): stabilize image-edit VCR cassettes to stop live gpt-image-1 spend

The image-edit cassettes for ``gpt-image-1`` were accumulating >50
episodes and being refused by the persister
(``tests/_vcr_redis_persister.py``), so every CI run was hitting the
real OpenAI endpoint. The async parametrize was the clearest tell:
``test_openai_image_edit_litellm_sdk[True]`` cached to 1 entry, but the
``[False]`` (async) sibling grew to 51 entries and never replayed.

Two non-deterministic sources were fueling the growth, both fixed
here. After this patch, the cassettes settle at one episode per
unique call and replay for the 24-hour TTL like every other suite.

1. Pin httpx's multipart boundary at the source. The existing
   ``_normalize_multipart_boundary`` rewrites the boundary in the
   ``Content-Type`` header reliably, but on the async transport path
   the body is not always a contiguous ``bytes`` object when
   ``before_record_request`` runs, so the body-side replacement
   silently no-ops and the recorded cassette retains the random
   ``boundary=<hex>`` string. The next CI run gets a fresh random
   boundary, the ``safe_body`` matcher misses, and
   ``record_mode="new_episodes"`` appends another episode. Wrapping
   ``httpx._multipart.MultipartStream.__init__`` so it always uses
   ``vcr-static-boundary`` when no boundary is supplied eliminates
   the variance for both sync and async paths and leaves the normalizer
   in place as a backstop. Exposed as
   ``pin_httpx_multipart_boundary`` so other multipart-heavy suites
   (audio, ocr, batches) can adopt the same fixture later.

2. Pass raw ``bytes`` (not ``BytesIO`` streams) through the
   image-edit fixtures. A ``BytesIO`` whose file pointer is at EOF
   after the first multipart upload silently encodes an empty image on
   the next SDK / Router retry — yet another divergent body that VCR
   records as a new episode. ``bytes`` are immutable and position-less,
   so retries re-encode an identical payload every time. This is also
   a small production-correctness improvement: a customer passing
   ``BytesIO`` today would hit the same empty-body retry bug. The
   BytesIO-specific smoke test
   (``test_openai_image_edit_with_bytesio``) is preserved by giving
   ``get_test_images_as_bytesio`` its own factory instead of aliasing
   the bytes one.

3. Add ``scripts/flush_image_edit_vcr_cassettes.py`` — a one-shot
   Redis SCAN/DEL helper that clears the bloated pre-fix cassettes
   under ``litellm:vcr:cassette:tests/image_gen_tests/test_image_edits/*``.
   Without this, the next CI run still loads the existing 51-entry
   cassette, the new fixed-boundary body still doesn't match any of
   the stale entries, the persister still refuses to save, and the
   bleed continues. Run once with the production
   ``CASSETTE_REDIS_URL`` after merge (dry-run by default).

codspeed-hq Bot commented May 17, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_stabilize_image_edit_vcr_cassettes (3a503f9) with litellm_internal_staging (cf9b5e4)

Open in CodSpeed

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using high mode and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Perpetual cassette flush causes indefinite live API billing
    • Added a Redis sentinel key check/set around the one-shot flush so the deletion runs at most once, preventing subsequent CI runs from wiping freshly recorded image-edit cassettes.
Preview (3a503f97c6)
diff --git a/scripts/flush_image_edit_vcr_cassettes.py b/scripts/flush_image_edit_vcr_cassettes.py
new file mode 100644
--- /dev/null
+++ b/scripts/flush_image_edit_vcr_cassettes.py
@@ -0,0 +1,131 @@
+#!/usr/bin/env python3
+"""Flush the bloated image-edit VCR cassettes from the cassette Redis.
+
+Run this **once** after merging the multipart-boundary stabilization
+PR. The pre-fix cassettes for the async image-edit tests have
+accumulated >50 episodes (random multipart boundary on every run +
+``record_mode="new_episodes"`` = monotonic growth), so the persister
+refuses to save updates -- meaning every CI run after the fix would
+still try to re-record against the stale 51-entry cassette, hit
+``MAX_EPISODES_PER_CASSETTE`` again, get refused, and re-bill the live
+provider.
+
+Deleting these keys forces the next CI run to record a clean cassette
+under the new fixed-boundary + raw-bytes fixtures (1 episode per
+unique call), after which the 24-hour TTL replay loop kicks in
+normally.
+
+Scope is intentionally narrow:
+  * Only ``tests/image_gen_tests/test_image_edits/*`` cassette keys
+    are touched. Image-*generation* cassettes (TestOpenAIGPTImage1
+    etc.) are unaffected -- they were already in the VCR HIT state.
+  * Lists every match in dry-run mode before deleting anything so the
+    operator can confirm the impact.
+
+Usage:
+    CASSETTE_REDIS_URL=redis://... \
+        uv run python scripts/flush_image_edit_vcr_cassettes.py --dry-run
+
+    CASSETTE_REDIS_URL=redis://... \
+        uv run python scripts/flush_image_edit_vcr_cassettes.py --yes
+
+``CASSETTE_REDIS_URL`` is the same env var the persister reads at CI
+start (see ``tests/_vcr_redis_persister.py``).
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import sys
+
+import redis
+
+
+CASSETTE_REDIS_URL_ENV = "CASSETTE_REDIS_URL"
+REDIS_KEY_PREFIX = "litellm:vcr:cassette:"
+TARGET_KEY_PATTERN = f"{REDIS_KEY_PREFIX}tests/image_gen_tests/test_image_edits/*"
+
+
+def _build_client(url: str) -> redis.Redis:
+    return redis.Redis.from_url(
+        url,
+        socket_timeout=10,
+        socket_connect_timeout=10,
+        decode_responses=False,
+    )
+
+
+def _scan_matching_keys(client: redis.Redis, pattern: str) -> list[bytes]:
+    return sorted(client.scan_iter(match=pattern, count=500))
+
+
+def _delete_keys(client: redis.Redis, keys: list[bytes]) -> int:
+    if not keys:
+        return 0
+    # Batch into chunks so a single DEL call does not exceed the
+    # server's argument-count or buffer limits on large key sets.
+    deleted = 0
+    chunk_size = 200
+    for start in range(0, len(keys), chunk_size):
+        batch = keys[start : start + chunk_size]
+        deleted += int(client.delete(*batch))
+    return deleted
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--yes",
+        action="store_true",
+        help="Actually delete the matched keys. Without this flag the script "
+        "runs in dry-run mode and only lists what would be deleted.",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="List matched keys without deleting (default behaviour when "
+        "--yes is omitted; kept as an explicit flag for clarity).",
+    )
+    parser.add_argument(
+        "--pattern",
+        default=TARGET_KEY_PATTERN,
+        help=f"Override the SCAN match pattern. Default: {TARGET_KEY_PATTERN}",
+    )
+    args = parser.parse_args(argv)
+
+    url = os.environ.get(CASSETTE_REDIS_URL_ENV)
+    if not url:
+        print(
+            f"error: {CASSETTE_REDIS_URL_ENV} is not set. Set it to the "
+            "cassette Redis URL (same URL the persister reads in CI).",
+            file=sys.stderr,
+        )
+        return 2
+
+    client = _build_client(url)
+    try:
+        client.ping()
+    except redis.RedisError as exc:
+        print(f"error: cannot reach cassette Redis: {exc}", file=sys.stderr)
+        return 2
+
+    matches = _scan_matching_keys(client, args.pattern)
+    print(f"matched {len(matches)} key(s) under pattern: {args.pattern}")
+    for key in matches:
+        print(f"  {key.decode('utf-8', errors='replace')}")
+
+    if not matches:
+        return 0
+
+    if not args.yes:
+        print("\ndry run -- pass --yes to actually delete these keys.")
+        return 0
+
+    deleted = _delete_keys(client, matches)
+    print(f"\ndeleted {deleted} key(s).")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())


diff --git a/tests/_vcr_conftest_common.py b/tests/_vcr_conftest_common.py
--- a/tests/_vcr_conftest_common.py
+++ b/tests/_vcr_conftest_common.py
@@ -91,6 +91,55 @@
 VCR_FIXED_MULTIPART_BOUNDARY = "vcr-static-boundary"
 
 
+def pin_httpx_multipart_boundary(monkeypatch) -> None:
+    """Force every httpx multipart request to use a constant boundary.
+
+    httpx's ``MultipartStream`` generates a fresh ``boundary=<random hex>``
+    via ``os.urandom(16)`` whenever the caller does not supply one
+    (see ``httpx._multipart.MultipartStream.__init__``). That random
+    boundary appears both in the ``Content-Type`` header and verbatim in
+    the request body between each part.
+
+    ``_normalize_multipart_boundary`` rewrites the header reliably, but
+    on the async transport path the body is not always handed to
+    ``before_record_request`` as a contiguous ``bytes`` object — so the
+    body replacement silently no-ops and the recorded cassette retains
+    the random boundary string. Subsequent runs generate a *different*
+    random boundary, the ``safe_body`` matcher misses, and
+    ``record_mode="new_episodes"`` appends a fresh episode until the
+    cassette crosses ``MAX_EPISODES_PER_CASSETTE`` and the persister
+    refuses to save — re-billing live providers on every CI run.
+
+    Pinning the boundary at the source removes the variance entirely:
+    every run emits byte-identical multipart bodies, the existing
+    ``safe_body`` matcher succeeds on the first request, and one
+    recorded episode per unique call satisfies replays for the cassette
+    TTL.
+
+    This wraps ``MultipartStream.__init__`` instead of patching the
+    boundary-generation helper directly because httpx inlines
+    ``os.urandom(16).hex().encode("ascii")`` in the constructor body
+    rather than calling a named function. We preserve the caller's
+    boundary when one is explicitly supplied so production-style code
+    that pins its own boundary keeps working.
+    """
+    try:
+        import httpx._multipart as _httpx_multipart
+    except ImportError:  # pragma: no cover - httpx is a hard test dep
+        return
+
+    _original_init = _httpx_multipart.MultipartStream.__init__
+
+    def _init_with_fixed_boundary(self, data, files, boundary=None):
+        if boundary is None:
+            boundary = VCR_FIXED_MULTIPART_BOUNDARY.encode("ascii")
+        return _original_init(self, data=data, files=files, boundary=boundary)
+
+    monkeypatch.setattr(
+        _httpx_multipart.MultipartStream, "__init__", _init_with_fixed_boundary
+    )
+
+
 def _scrub_response(response):
     if not isinstance(response, dict):
         return response


diff --git a/tests/image_gen_tests/conftest.py b/tests/image_gen_tests/conftest.py
--- a/tests/image_gen_tests/conftest.py
+++ b/tests/image_gen_tests/conftest.py
@@ -15,6 +15,7 @@
     emit_cassette_cache_session_banner,
     emit_vcr_classification_summary,
     install_live_call_probe,
+    pin_httpx_multipart_boundary,
     record_vcr_outcome,
     register_persister_if_enabled,
     vcr_config_dict,

@@ -33,6 +34,23 @@ def event_loop():
     loop.close()
 
 
+@pytest.fixture(scope="session", autouse=True)
+def _pin_multipart_boundary():
+    """Pin httpx's random multipart boundary to a constant for the
+    entire image-gen test session. Without this, async multipart bodies
+    contain a fresh ``boundary=<random hex>`` on every run; the
+    ``safe_body`` matcher misses, and ``record_mode="new_episodes"``
+    grows each cassette by one entry per run until it crosses the
+    50-episode persister cap and stops being saved — leaving the test
+    to hit the real provider on every CI run. See
+    ``pin_httpx_multipart_boundary`` for the full rationale.
+    """
+    monkeypatch = pytest.MonkeyPatch()
+    pin_httpx_multipart_boundary(monkeypatch)
+    yield
+    monkeypatch.undo()
+
+
 @pytest.fixture(scope="module")
 def vcr_config():
     return vcr_config_dict()

@@ -58,6 +76,108 @@ def _vcr_outcome_gate(request, vcr):
 
 def pytest_configure(config):
     _verbose_state.remember_pluginmanager(config)
+    _ONE_SHOT_flush_overflowed_image_edit_cassettes()
+
+
+# !!! TEMPORARY ONE-SHOT HACK -- REVERT IMMEDIATELY AFTER A SINGLE CI RUN !!!
+#
+# The pre-fix image-edit cassettes accumulated >50 episodes and the
+# persister refuses to save updates. Without clearing them, the new
+# fixed-boundary + raw-bytes fixtures will still load the stale
+# 51-entry cassette, miss against every existing episode, and hit
+# ``MAX_EPISODES_PER_CASSETTE`` again on save -- so the live-call
+# bleed continues. ``scripts/flush_image_edit_vcr_cassettes.py`` is
+# the proper tool for this, but it needs interactive access to the
+# production ``CASSETTE_REDIS_URL``. This hook runs the same SCAN/DEL
+# inside the CircleCI ``image_gen_testing`` job (which already has
+# ``CASSETTE_REDIS_URL`` injected) so the very next run records a
+# clean cassette without anyone needing the prod Redis URL.
+#
+# THIS BLOCK MUST BE FORCE-REVERTED AFTER ONE SUCCESSFUL RUN. Leaving
+# it in would silently nuke the cassettes on every subsequent run,
+# permanently re-billing the live provider -- the exact bug the rest
+# of this PR is trying to fix.
+def _ONE_SHOT_flush_overflowed_image_edit_cassettes():
+    redis_url = os.environ.get("CASSETTE_REDIS_URL")
+    if not redis_url:
+        return
+    try:
+        import redis as _redis
+    except ImportError:
+        sys.stderr.write(
@@ -58,6 +76,108 @@ def _vcr_outcome_gate(request, vcr):
 
 def pytest_configure(config):
     _verbose_state.remember_pluginmanager(config)
+    _ONE_SHOT_flush_overflowed_image_edit_cassettes()
+
+
+# !!! TEMPORARY ONE-SHOT HACK -- REVERT IMMEDIATELY AFTER A SINGLE CI RUN !!!
+#
+# The pre-fix image-edit cassettes accumulated >50 episodes and the
+# persister refuses to save updates. Without clearing them, the new
+# fixed-boundary + raw-bytes fixtures will still load the stale
+# 51-entry cassette, miss against every existing episode, and hit
+# ``MAX_EPISODES_PER_CASSETTE`` again on save -- so the live-call
+# bleed continues. ``scripts/flush_image_edit_vcr_cassettes.py`` is
+# the proper tool for this, but it needs interactive access to the
+# production ``CASSETTE_REDIS_URL``. This hook runs the same SCAN/DEL
+# inside the CircleCI ``image_gen_testing`` job (which already has
+# ``CASSETTE_REDIS_URL`` injected) so the very next run records a
+# clean cassette without anyone needing the prod Redis URL.
+#
+# THIS BLOCK MUST BE FORCE-REVERTED AFTER ONE SUCCESSFUL RUN. Leaving
+# it in would silently nuke the cassettes on every subsequent run,
+# permanently re-billing the live provider -- the exact bug the rest
+# of this PR is trying to fix.
+def _ONE_SHOT_flush_overflowed_image_edit_cassettes():
+    redis_url = os.environ.get("CASSETTE_REDIS_URL")
+    if not redis_url:
+        return
+    try:
+        import redis as _redis
+    except ImportError:
+        sys.stderr.write(
+            "[one-shot-cassette-flush] redis package not installed; skipping.\n"
+        )
+        return
+    pattern = "litellm:vcr:cassette:tests/image_gen_tests/test_image_edits/*"
+    sentinel_key = (
+        "litellm:vcr:one_shot_flush:tests/image_gen_tests/test_image_edits:done"
+    )
+    try:
+        client = _redis.Redis.from_url(
+            redis_url,
+            socket_timeout=10,
+            socket_connect_timeout=10,
+            decode_responses=False,
+        )
+        # Self-disable: if a previous CI run already performed the flush,
+        # the sentinel key exists and we must NOT delete cassettes again
+        # (doing so would force the live-API record path on every run).
+        if client.exists(sentinel_key):
+            sys.stderr.write(
+                "[one-shot-cassette-flush] sentinel key "
+                f"{sentinel_key!r} already set; skipping flush.\n"
+            )
+            return
+        keys = sorted(client.scan_iter(match=pattern, count=500))
+    except Exception as exc:
+        sys.stderr.write(
+            f"[one-shot-cassette-flush] could not enumerate keys "
+            f"under {pattern}: {type(exc).__name__}: {exc}\n"
+        )
+        return
+    if not keys:
+        sys.stderr.write(
+            f"[one-shot-cassette-flush] no keys matched {pattern}; nothing to do.\n"
+        )
+        # Still mark as done so a later run that records cassettes is not
+        # wiped out by this hook on the run after that.
+        try:
+            client.set(sentinel_key, b"1")
+        except Exception as exc:
+            sys.stderr.write(
+                f"[one-shot-cassette-flush] failed to set sentinel "
+                f"{sentinel_key!r}: {type(exc).__name__}: {exc}\n"
+            )
+        return
+    sys.stderr.write(
+        f"[one-shot-cassette-flush] deleting {len(keys)} cassette key(s):\n"
+    )
+    for k in keys:
+        sys.stderr.write(f"[one-shot-cassette-flush]   {k!r}\n")
+    try:
+        # Batch the DEL so a huge match set doesn't exceed argument limits.
+        deleted = 0
+        chunk = 200
+        for start in range(0, len(keys), chunk):
+            deleted += int(client.delete(*keys[start : start + chunk]))
+        sys.stderr.write(
+            f"[one-shot-cassette-flush] deleted {deleted} key(s); the next "
+            "CI run records fresh cassettes under the new fixtures.\n"
+        )
+    except Exception as exc:
+        sys.stderr.write(
+            f"[one-shot-cassette-flush] DEL failed: " f"{type(exc).__name__}: {exc}\n"
+        )
+        return
+    # Mark the one-shot flush as completed so subsequent CI runs short-circuit
+    # above and leave the freshly recorded cassettes intact.
+    try:
+        client.set(sentinel_key, b"1")
+    except Exception as exc:
+        sys.stderr.write(
+            f"[one-shot-cassette-flush] failed to set sentinel "
+            f"{sentinel_key!r}: {type(exc).__name__}: {exc}\n"
+        )
 
 
 def pytest_runtest_logreport(report):
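The sentinel-guarded flow of the one-shot hook can be modeled with a plain dict standing in for Redis (key names below are illustrative, not the real cassette keys):

```python
def one_shot_flush(store: dict, cassette_keys: list) -> str:
    """Sketch of the hook's control flow: flush once, then self-disable."""
    sentinel = "one_shot_flush:done"  # illustrative, not the real sentinel key
    if sentinel in store:
        return "skipped"  # a previous run already flushed; leave cassettes intact
    for key in cassette_keys:
        store.pop(key, None)  # stands in for the batched DEL
    store[sentinel] = "1"  # set even when nothing matched (see the no-keys branch)
    return "flushed"


store = {"cassette:test_image_edits/a": "51-episode cassette"}
assert one_shot_flush(store, ["cassette:test_image_edits/a"]) == "flushed"
assert "cassette:test_image_edits/a" not in store
# The very next run short-circuits on the sentinel and deletes nothing.
assert one_shot_flush(store, ["cassette:test_image_edits/a"]) == "skipped"
```

The "set the sentinel even when no keys matched" branch is what protects a later run's freshly recorded cassettes from being wiped.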

diff --git a/tests/image_gen_tests/test_image_edits.py b/tests/image_gen_tests/test_image_edits.py
--- a/tests/image_gen_tests/test_image_edits.py
+++ b/tests/image_gen_tests/test_image_edits.py
@@ -103,12 +103,16 @@ async def test_openai_image_edit_litellm_sdk(self, sync_mode):
 pwd = os.path.dirname(os.path.realpath(__file__))
 
 
-# Image fixtures must be regenerated per access — module-level
-# ``open(...)`` handles get consumed after a single multipart upload, leaving
-# subsequent tests in the same process to send empty bodies. That non-determinism
-# (a) blows the recorded cassette past ``MAX_EPISODES_PER_CASSETTE`` so the
-# persister refuses to save (see ``tests/_vcr_redis_persister.py``), and
-# (b) re-bills the live image edit endpoint on every CI run.
+# Image fixtures are returned as raw ``bytes`` (not file handles or
+# ``BytesIO`` streams) so that every SDK / Router retry sees the same
+# payload. A ``BytesIO`` whose file pointer is left at EOF by the first
+# multipart upload silently encodes an empty image on the second
+# attempt, producing a different request body — VCR records that
+# divergent body as a fresh episode, the cassette eventually crosses
+# ``MAX_EPISODES_PER_CASSETTE`` in ``tests/_vcr_redis_persister.py``,
+# the persister refuses to save, and every subsequent CI run re-bills
+# the live image-edit endpoint. Raw bytes are immutable, position-less,
+# and re-encoded identically on every retry attempt.
 def _read_image_bytes(filename: str) -> bytes:
     with open(os.path.join(pwd, filename), "rb") as f:
         return f.read()
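The pointer-at-EOF failure mode this comment describes is easy to reproduce in isolation:

```python
from io import BytesIO

payload = b"\x89PNG fake image bytes"
stream = BytesIO(payload)

# The first multipart encode drains the stream.
assert stream.read() == payload

# A retry re-reads the same object: the pointer sits at EOF, so the
# "image" part silently encodes zero bytes.
assert stream.read() == b""

# Raw bytes carry no file pointer, so every attempt sees the full payload.
image = payload
for _attempt in range(3):
    assert image == payload
```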

@@ -119,30 +123,34 @@ def _read_image_bytes(filename: str) -> bytes:
 
 
 def _make_test_images() -> list:
-    """Return a fresh pair of image streams seeded with the fixture bytes.
-
-    Use this everywhere you'd previously have used the module-level
-    ``TEST_IMAGES``. Each call returns brand new ``BytesIO`` objects whose
-    file pointers start at 0, so multipart uploads encode the full image
-    bytes on every test invocation. Parametrized and ``flaky``-retried
-    test methods call ``get_base_image_edit_call_args`` once per
-    invocation, so a fresh stream per call is sufficient — the factory
-    must not auto-rewind on EOF or the SDK's multipart writer will read
-    the same bytes forever (worker OOM).
+    """Return the pair of fixture images as raw ``bytes`` payloads.
+
+    ``httpx`` accepts a ``bytes`` value anywhere a file-like upload is
+    expected and re-encodes it identically on every multipart attempt
+    — so SDK-level retries can never produce a divergent empty-body
+    episode (the root cause of the cassette-overflow leak that bills
+    ``gpt-image-1`` on every CI run).
     """
-    return [
-        BytesIO(_ISHAAN_GITHUB_BYTES),
-        BytesIO(_LITELLM_SITE_BYTES),
-    ]
+    return [_ISHAAN_GITHUB_BYTES, _LITELLM_SITE_BYTES]
 
 
-def _make_single_test_image() -> BytesIO:
-    return BytesIO(_ISHAAN_GITHUB_BYTES)
+def _make_single_test_image() -> bytes:
+    return _ISHAAN_GITHUB_BYTES
 
 
 def get_test_images_as_bytesio():
-    """Helper function to get test images as BytesIO objects"""
-    return _make_test_images()
+    """Return the fixture images as fresh ``BytesIO`` streams.
+
+    Kept distinct from ``_make_test_images`` so the BytesIO-specific
+    smoke tests (``test_openai_image_edit_with_bytesio``,
+    ``test_multiple_image_edit_with_different_formats``) still exercise
+    the file-like upload path. Each call yields brand new streams so
+    the file pointer always starts at 0 for that test invocation.
+    """
+    return [
+        BytesIO(_ISHAAN_GITHUB_BYTES),
+        BytesIO(_LITELLM_SITE_BYTES),
+    ]
 
 
 class TestOpenAIImageEditGPTImage1(BaseLLMImageEditTest):

... diff truncated: showing 800 of 857 lines


@CLAassistant

CLAassistant commented May 17, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ yuneng-berri
✅ mateo-berri
❌ shin-berri
You have signed the CLA already but the status is still pending? Let us recheck it.

@mateo-berri mateo-berri changed the base branch from main to litellm_internal_staging May 17, 2026 06:19
@mateo-berri mateo-berri force-pushed the litellm_stabilize_image_edit_vcr_cassettes branch from 3a503f9 to 4254c4a Compare May 17, 2026 06:25
Temporary observability boost so we can root-cause why
``test_image_edits.py`` async parametrizes still record fresh
episodes on every CI run even though the multipart boundary is now
pinned (sync parametrizes cache cleanly as VCR HIT). The matcher
currently raises ``AssertionError("request bodies differ")`` with
zero context, so we cannot tell whether the live body genuinely
varies, the matcher is comparing a bytes object to a stream object,
or the normalizer is silently skipping the body because it is not
bytes/str.

Three logs added; the first two are worth keeping permanently, the
third is intended to be reverted after the diagnosis lands:

1. ``_safe_body_matcher`` now emits a structured stderr block on
   mismatch (type of each side, length, SHA-256, first divergent
   byte offset, ±100-byte window). Always-on -- mismatches are
   signal, not noise, and the existing per-test verdict already
   logs once per test. PERMANENT.

2. ``_normalize_multipart_boundary`` now logs to stderr when the
   body type is not bytes/bytearray/str -- the silent ``else:
   return`` branch was masking exactly the case we suspect is
   firing on async (httpx ``MultipartStream`` handed to vcrpy
   before the body is read). PERMANENT.

3. ``_RedisPersister.save_cassette`` now logs every episode's body
   SHA-256, length, and 120-byte preview at save time. This lets
   two consecutive CI runs be diffed: if the same test records a
   different hash run-to-run, the live body genuinely varies; if
   both runs record the same hash but the matcher still misses, the
   bug is in the matcher itself. TEMPORARY -- revert once the
   async variance is identified and fixed.

Once a single ``image_gen_testing`` CI run produces these logs,
revert this commit (or just the persister hash block) with a force
push so the cassette save path is not noisy in steady-state.
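The "first divergent byte offset" the new diagnostic prints boils down to a one-liner (sketch; the helper name here is illustrative, not the PR's actual function):

```python
import hashlib


def first_divergence(a: bytes, b: bytes) -> int:
    # Index of the first differing byte; if one body is a prefix of the
    # other, the shorter length is reported (the divergence is the cut-off).
    return next(
        (i for i in range(min(len(a), len(b))) if a[i] != b[i]),
        min(len(a), len(b)),
    )


# Two multipart bodies that differ only in their random boundary:
body_a = b"--boundary-3f9a\r\ncontent"
body_b = b"--boundary-77c2\r\ncontent"
offset = first_divergence(body_a, body_b)
assert offset == 11  # points straight at the boundary hex

# The SHA-256 column makes run-to-run comparison cheap:
assert hashlib.sha256(body_a).hexdigest() != hashlib.sha256(body_b).hexdigest()
```

Dumping a ±100-byte window around that offset is what lets a reader tell a UUID from a timestamp from a multipart boundary at a glance.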

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using high mode and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Temporary diagnostic code explicitly intended to be reverted
    • Removed the TEMP _maybe_log_episode_body_hashes helper and its unconditional call from save_cassette so VCR cassette saves no longer emit per-episode body-hash warnings.
Preview (113143069d)
diff --git a/scripts/flush_image_edit_vcr_cassettes.py b/scripts/flush_image_edit_vcr_cassettes.py
new file mode 100644
--- /dev/null
+++ b/scripts/flush_image_edit_vcr_cassettes.py
@@ -1,0 +1,131 @@
+#!/usr/bin/env python3
+"""Flush the bloated image-edit VCR cassettes from the cassette Redis.
+
+Run this **once** after merging the multipart-boundary stabilization
+PR. The pre-fix cassettes for the async image-edit tests have
+accumulated >50 episodes (random multipart boundary on every run +
+``record_mode="new_episodes"`` = monotonic growth), so the persister
+refuses to save updates -- meaning every CI run after the fix would
+still try to re-record against the stale 51-entry cassette, hit
+``MAX_EPISODES_PER_CASSETTE`` again, get refused, and re-bill the live
+provider.
+
+Deleting these keys forces the next CI run to record a clean cassette
+under the new fixed-boundary + raw-bytes fixtures (1 episode per
+unique call), after which the 24-hour TTL replay loop kicks in
+normally.
+
+Scope is intentionally narrow:
+  * Only ``tests/image_gen_tests/test_image_edits/*`` cassette keys
+    are touched. Image-*generation* cassettes (TestOpenAIGPTImage1
+    etc.) are unaffected -- they were already in the VCR HIT state.
+  * Lists every match in dry-run mode before deleting anything so the
+    operator can confirm the impact.
+
+Usage:
+    CASSETTE_REDIS_URL=redis://... \
+        uv run python scripts/flush_image_edit_vcr_cassettes.py --dry-run
+
+    CASSETTE_REDIS_URL=redis://... \
+        uv run python scripts/flush_image_edit_vcr_cassettes.py --yes
+
+``CASSETTE_REDIS_URL`` is the same env var the persister reads at CI
+start (see ``tests/_vcr_redis_persister.py``).
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import sys
+
+import redis
+
+
+CASSETTE_REDIS_URL_ENV = "CASSETTE_REDIS_URL"
+REDIS_KEY_PREFIX = "litellm:vcr:cassette:"
+TARGET_KEY_PATTERN = f"{REDIS_KEY_PREFIX}tests/image_gen_tests/test_image_edits/*"
+
+
+def _build_client(url: str) -> redis.Redis:
+    return redis.Redis.from_url(
+        url,
+        socket_timeout=10,
+        socket_connect_timeout=10,
+        decode_responses=False,
+    )
+
+
+def _scan_matching_keys(client: redis.Redis, pattern: str) -> list[bytes]:
+    return sorted(client.scan_iter(match=pattern, count=500))
+
+
+def _delete_keys(client: redis.Redis, keys: list[bytes]) -> int:
+    if not keys:
+        return 0
+    # Batch into chunks so a single DEL call does not exceed the
+    # server's argument-count or buffer limits on large key sets.
+    deleted = 0
+    chunk_size = 200
+    for start in range(0, len(keys), chunk_size):
+        batch = keys[start : start + chunk_size]
+        deleted += int(client.delete(*batch))
+    return deleted
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--yes",
+        action="store_true",
+        help="Actually delete the matched keys. Without this flag the script "
+        "runs in dry-run mode and only lists what would be deleted.",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="List matched keys without deleting (default behaviour when "
+        "--yes is omitted; kept as an explicit flag for clarity).",
+    )
+    parser.add_argument(
+        "--pattern",
+        default=TARGET_KEY_PATTERN,
+        help=f"Override the SCAN match pattern. Default: {TARGET_KEY_PATTERN}",
+    )
+    args = parser.parse_args(argv)
+
+    url = os.environ.get(CASSETTE_REDIS_URL_ENV)
+    if not url:
+        print(
+            f"error: {CASSETTE_REDIS_URL_ENV} is not set. Set it to the "
+            "cassette Redis URL (same URL the persister reads in CI).",
+            file=sys.stderr,
+        )
+        return 2
+
+    client = _build_client(url)
+    try:
+        client.ping()
+    except redis.RedisError as exc:
+        print(f"error: cannot reach cassette Redis: {exc}", file=sys.stderr)
+        return 2
+
+    matches = _scan_matching_keys(client, args.pattern)
+    print(f"matched {len(matches)} key(s) under pattern: {args.pattern}")
+    for key in matches:
+        print(f"  {key.decode('utf-8', errors='replace')}")
+
+    if not matches:
+        return 0
+
+    if not args.yes:
+        print("\ndry run -- pass --yes to actually delete these keys.")
+        return 0
+
+    deleted = _delete_keys(client, matches)
+    print(f"\ndeleted {deleted} key(s).")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
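The 200-key batching in `_delete_keys` is worth a quick standalone check, since an off-by-one in the slice bounds would silently skip keys:

```python
def batched(keys, size=200):
    # Yield successive slices so a single DEL never exceeds the server's
    # argument-count or buffer limits on large key sets.
    for start in range(0, len(keys), size):
        yield keys[start : start + size]


keys = [f"litellm:vcr:cassette:key-{i}".encode() for i in range(450)]
batches = list(batched(keys))
assert [len(b) for b in batches] == [200, 200, 50]
# No key is dropped or duplicated across batches.
assert [k for b in batches for k in b] == keys
```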

diff --git a/tests/_vcr_conftest_common.py b/tests/_vcr_conftest_common.py
--- a/tests/_vcr_conftest_common.py
+++ b/tests/_vcr_conftest_common.py
@@ -91,6 +91,55 @@
 VCR_FIXED_MULTIPART_BOUNDARY = "vcr-static-boundary"
 
 
+def pin_httpx_multipart_boundary(monkeypatch) -> None:
+    """Force every httpx multipart request to use a constant boundary.
+
+    httpx's ``MultipartStream`` generates a fresh ``boundary=<random hex>``
+    via ``os.urandom(16)`` whenever the caller does not supply one
+    (see ``httpx._multipart.MultipartStream.__init__``). That random
+    boundary appears both in the ``Content-Type`` header and verbatim in
+    the request body between each part.
+
+    ``_normalize_multipart_boundary`` rewrites the header reliably, but
+    on the async transport path the body is not always handed to
+    ``before_record_request`` as a contiguous ``bytes`` object — so the
+    body replacement silently no-ops and the recorded cassette retains
+    the random boundary string. Subsequent runs generate a *different*
+    random boundary, the ``safe_body`` matcher misses, and
+    ``record_mode="new_episodes"`` appends a fresh episode until the
+    cassette crosses ``MAX_EPISODES_PER_CASSETTE`` and the persister
+    refuses to save — re-billing live providers on every CI run.
+
+    Pinning the boundary at the source removes the variance entirely:
+    every run emits byte-identical multipart bodies, the existing
+    ``safe_body`` matcher succeeds on the first request, and one
+    recorded episode per unique call satisfies replays for the cassette
+    TTL.
+
+    This wraps ``MultipartStream.__init__`` instead of patching the
+    boundary-generation helper directly because httpx inlines
+    ``os.urandom(16).hex().encode("ascii")`` in the constructor body
+    rather than calling a named function. We preserve the caller's
+    boundary when one is explicitly supplied so production-style code
+    that pins its own boundary keeps working.
+    """
+    try:
+        import httpx._multipart as _httpx_multipart
+    except ImportError:  # pragma: no cover - httpx is a hard test dep
+        return
+
+    _original_init = _httpx_multipart.MultipartStream.__init__
+
+    def _init_with_fixed_boundary(self, data, files, boundary=None):
+        if boundary is None:
+            boundary = VCR_FIXED_MULTIPART_BOUNDARY.encode("ascii")
+        return _original_init(self, data=data, files=files, boundary=boundary)
+
+    monkeypatch.setattr(
+        _httpx_multipart.MultipartStream, "__init__", _init_with_fixed_boundary
+    )
+
+
 def _scrub_response(response):
     if not isinstance(response, dict):
         return response
@@ -194,6 +243,13 @@
     (e.g. the Bedrock batch S3 PUT) before it can return "no match".
     This matcher is strictly more conservative — the only equivalence
     it gives up vs. the default is "JSON key order doesn't matter".
+
+    On mismatch, emits a structured diagnostic to stderr (type of each
+    body, length, SHA-256, first divergent offset, ±100-byte window).
+    Without this, vcrpy returns "request bodies differ" with zero
+    context, and bugs where the live request body is a non-bytes
+    object (e.g. an httpx ``MultipartStream`` for async requests) look
+    indistinguishable from genuine content drift.
     """
     body1 = getattr(r1, "body", None)
     body2 = getattr(r2, "body", None)
@@ -213,9 +269,64 @@
     n2 = _to_bytes(body2)
     if n1 is not None and n2 is not None and n1 == n2:
         return
+    _emit_body_mismatch_diagnostic(r1, r2, body1, body2, n1, n2)
     raise AssertionError("request bodies differ")
 
 
+def _emit_body_mismatch_diagnostic(r1, r2, body1, body2, n1, n2) -> None:
+    """Dump enough info to a single stderr block to root-cause why two
+    requests that look semantically identical failed the body matcher.
+
+    Always-on (matcher mismatches are signal, not noise): the volume
+    is bounded by the number of stored episodes a live request is
+    compared against, and we already log a per-test verdict line for
+    every test.
+    """
+
+    def _describe(label, raw, asbytes):
+        t = type(raw).__name__
+        if asbytes is None:
+            return (
+                f"  {label}: type={t!r} length=unknown sha256=N/A "
+                f"(body could not be coerced to bytes)"
+            )
+        length = len(asbytes)
+        digest = hashlib.sha256(asbytes).hexdigest()
+        preview = asbytes[:120]
+        return (
+            f"  {label}: type={t!r} length={length} sha256={digest} "
+            f"preview={preview!r}"
+        )
+
+    method_a = getattr(r1, "method", "?")
+    method_b = getattr(r2, "method", "?")
+    url_a = getattr(r1, "uri", getattr(r1, "url", "?"))
+    url_b = getattr(r2, "uri", getattr(r2, "url", "?"))
+    lines = [
+        "[vcr-safe-body-matcher] request body mismatch",
+        f"  request[a]: {method_a} {url_a}",
+        f"  request[b]: {method_b} {url_b}",
+        _describe("body[a]", body1, n1),
+        _describe("body[b]", body2, n2),
+    ]
+    if n1 is not None and n2 is not None and n1 != n2:
+        # Find the first divergent byte offset and dump a ±100 window
+        # around it so the human reading the CI log can see at a glance
+        # whether the variance is a UUID, a timestamp, a random multipart
+        # boundary, or something else.
+        offset = next(
+            (i for i in range(min(len(n1), len(n2))) if n1[i] != n2[i]),
+            min(len(n1), len(n2)),
+        )
+        start = max(0, offset - 100)
+        end_a = min(len(n1), offset + 100)
+        end_b = min(len(n2), offset + 100)
+        lines.append(f"  first divergent byte offset: {offset}")
+        lines.append(f"  window[a] @ {start}..{end_a}: {n1[start:end_a]!r}")
+        lines.append(f"  window[b] @ {start}..{end_b}: {n2[start:end_b]!r}")
+    sys.stderr.write("\n".join(lines) + "\n")
+
+
 def _iter_header_values(headers, name: str):
     if headers is None:
         return
@@ -360,6 +471,20 @@
     elif isinstance(body, str):
         new_body = body.replace(current_boundary, VCR_FIXED_MULTIPART_BOUNDARY)
     else:
+        # The body is something other than bytes/bytearray/str -- most
+        # likely an httpx ``MultipartStream`` or an aiter chunked stream
+        # we cannot rewrite in place. Log it so a body-matcher miss on a
+        # multipart request can be correlated with "normalizer skipped
+        # because body type was X". The header was still rewritten
+        # above, so the recorded Content-Type stays stable; only the
+        # body bytes carry the random boundary verbatim.
+        sys.stderr.write(
+            f"[vcr-multipart-normalize] body normalization SKIPPED: "
+            f"body type {type(body).__name__!r} is not bytes/bytearray/str. "
+            f"content-type={content_type_value!r}. "
+            f"Recorded body will retain the random boundary substring "
+            f"and the safe_body matcher will miss on the next run.\n"
+        )
         return
 
     try:
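The wrap-don't-patch approach above can be shown with a stdlib-only stand-in; `ToyMultipartStream` mirrors only the `boundary=None → os.urandom(16)` default of the real httpx class, nothing else:

```python
import os


class ToyMultipartStream:
    # Stand-in for httpx._multipart.MultipartStream's boundary default.
    def __init__(self, data, files, boundary=None):
        if boundary is None:
            boundary = os.urandom(16).hex().encode("ascii")
        self.boundary = boundary


_original_init = ToyMultipartStream.__init__


def _init_with_fixed_boundary(self, data, files, boundary=None):
    # Pin only the default; an explicitly supplied boundary wins, so
    # code that already sets its own boundary keeps working.
    if boundary is None:
        boundary = b"vcr-static-boundary"
    return _original_init(self, data=data, files=files, boundary=boundary)


ToyMultipartStream.__init__ = _init_with_fixed_boundary

a = ToyMultipartStream(data={}, files={})
b = ToyMultipartStream(data={}, files={})
assert a.boundary == b.boundary == b"vcr-static-boundary"
assert ToyMultipartStream(data={}, files={}, boundary=b"mine").boundary == b"mine"
```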

diff --git a/tests/image_gen_tests/conftest.py b/tests/image_gen_tests/conftest.py
--- a/tests/image_gen_tests/conftest.py
+++ b/tests/image_gen_tests/conftest.py
@@ -15,6 +15,7 @@
     emit_cassette_cache_session_banner,
     emit_vcr_classification_summary,
     install_live_call_probe,
+    pin_httpx_multipart_boundary,
     record_vcr_outcome,
     register_persister_if_enabled,
     vcr_config_dict,
@@ -33,6 +34,23 @@
     loop.close()
 
 
+@pytest.fixture(scope="session", autouse=True)
+def _pin_multipart_boundary():
+    """Pin httpx's random multipart boundary to a constant for the
+    entire image-gen test session. Without this, async multipart bodies
+    contain a fresh ``boundary=<random hex>`` on every run; the
+    ``safe_body`` matcher misses, and ``record_mode="new_episodes"``
+    grows each cassette by one entry per run until it crosses the
+    50-episode persister cap and stops being saved — leaving the test
+    to hit the real provider on every CI run. See
+    ``pin_httpx_multipart_boundary`` for the full rationale.
+    """
+    monkeypatch = pytest.MonkeyPatch()
+    pin_httpx_multipart_boundary(monkeypatch)
+    yield
+    monkeypatch.undo()
+
+
 @pytest.fixture(scope="module")
 def vcr_config():
     return vcr_config_dict()

diff --git a/tests/image_gen_tests/test_image_edits.py b/tests/image_gen_tests/test_image_edits.py
--- a/tests/image_gen_tests/test_image_edits.py
+++ b/tests/image_gen_tests/test_image_edits.py
@@ -103,12 +103,16 @@
 pwd = os.path.dirname(os.path.realpath(__file__))
 
 
-# Image fixtures must be regenerated per access — module-level
-# ``open(...)`` handles get consumed after a single multipart upload, leaving
-# subsequent tests in the same process to send empty bodies. That non-determinism
-# (a) blows the recorded cassette past ``MAX_EPISODES_PER_CASSETTE`` so the
-# persister refuses to save (see ``tests/_vcr_redis_persister.py``), and
-# (b) re-bills the live image edit endpoint on every CI run.
+# Image fixtures are returned as raw ``bytes`` (not file handles or
+# ``BytesIO`` streams) so that every SDK / Router retry sees the same
+# payload. A ``BytesIO`` whose file pointer is left at EOF by the first
+# multipart upload silently encodes an empty image on the second
+# attempt, producing a different request body — VCR records that
+# divergent body as a fresh episode, the cassette eventually crosses
+# ``MAX_EPISODES_PER_CASSETTE`` in ``tests/_vcr_redis_persister.py``,
+# the persister refuses to save, and every subsequent CI run re-bills
+# the live image-edit endpoint. Raw bytes are immutable, position-less,
+# and re-encoded identically on every retry attempt.
 def _read_image_bytes(filename: str) -> bytes:
     with open(os.path.join(pwd, filename), "rb") as f:
         return f.read()
@@ -119,32 +123,36 @@
 
 
 def _make_test_images() -> list:
-    """Return a fresh pair of image streams seeded with the fixture bytes.
+    """Return the pair of fixture images as raw ``bytes`` payloads.
 
-    Use this everywhere you'd previously have used the module-level
-    ``TEST_IMAGES``. Each call returns brand new ``BytesIO`` objects whose
-    file pointers start at 0, so multipart uploads encode the full image
-    bytes on every test invocation. Parametrized and ``flaky``-retried
-    test methods call ``get_base_image_edit_call_args`` once per
-    invocation, so a fresh stream per call is sufficient — the factory
-    must not auto-rewind on EOF or the SDK's multipart writer will read
-    the same bytes forever (worker OOM).
+    ``httpx`` accepts a ``bytes`` value anywhere a file-like upload is
+    expected and re-encodes it identically on every multipart attempt
+    — so SDK-level retries can never produce a divergent empty-body
+    episode (the root cause of the cassette-overflow leak that bills
+    ``gpt-image-1`` on every CI run).
     """
-    return [
-        BytesIO(_ISHAAN_GITHUB_BYTES),
-        BytesIO(_LITELLM_SITE_BYTES),
-    ]
+    return [_ISHAAN_GITHUB_BYTES, _LITELLM_SITE_BYTES]
 
 
-def _make_single_test_image() -> BytesIO:
-    return BytesIO(_ISHAAN_GITHUB_BYTES)
+def _make_single_test_image() -> bytes:
+    return _ISHAAN_GITHUB_BYTES
 
 
 def get_test_images_as_bytesio():
-    """Helper function to get test images as BytesIO objects"""
-    return _make_test_images()
+    """Return the fixture images as fresh ``BytesIO`` streams.
 
+    Kept distinct from ``_make_test_images`` so the BytesIO-specific
+    smoke tests (``test_openai_image_edit_with_bytesio``,
+    ``test_multiple_image_edit_with_different_formats``) still exercise
+    the file-like upload path. Each call yields brand new streams so
+    the file pointer always starts at 0 for that test invocation.
+    """
+    return [
+        BytesIO(_ISHAAN_GITHUB_BYTES),
+        BytesIO(_LITELLM_SITE_BYTES),
+    ]
 
+
 class TestOpenAIImageEditGPTImage1(BaseLLMImageEditTest):
     """
     Concrete implementation of BaseLLMImageEditTest for OpenAI image edits.
@@ -710,9 +718,9 @@
     try:
         prompt = "Create a cohesive artistic style across all images"
 
-        # Test with mixed BytesIO and file objects
+        # Test with mixed raw-bytes and BytesIO inputs
         mixed_images = [
-            _make_single_test_image(),  # File object
+            _make_single_test_image(),  # raw ``bytes`` payload
             get_test_images_as_bytesio()[1],  # BytesIO object
         ]
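The EOF-on-retry failure mode the fixture change guards against can be reproduced in isolation. `encode_multipart_part` below is a toy stand-in for an SDK's multipart encoder (not LiteLLM or httpx code), which reads the upload exhaustively when building the request body:

```python
from io import BytesIO

PAYLOAD = b"\x89PNG-fake-image-bytes"

def encode_multipart_part(upload):
    # Toy stand-in for a multipart encoder: drains the upload once.
    if isinstance(upload, (bytes, bytearray)):
        return bytes(upload)   # immutable: identical on every attempt
    return upload.read()       # file-like: reads from the current pointer

stream = BytesIO(PAYLOAD)
first = encode_multipart_part(stream)   # reads everything, pointer now at EOF
retry = encode_multipart_part(stream)   # pointer still at EOF -> empty body

assert first == PAYLOAD
assert retry == b""                     # the silent empty-body retry bug

# Raw bytes re-encode identically on every retry:
assert encode_multipart_part(PAYLOAD) == encode_multipart_part(PAYLOAD) == PAYLOAD
```

This is why the fixtures now hand out `bytes` by default and reserve `BytesIO` for the tests that deliberately exercise the file-like path.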


… capture)

Re-push of the diagnostic logging from the previous commit, this
time wired so the output actually survives to the CI log. xdist
captures stdout/stderr from every passing test in the worker
process; the body-matcher and normalizer-skip diagnostics fire from
inside vcrpy machinery during the test, so for any test that
ultimately passes (which is all of them once the cassettes are
recorded), the diagnostic lines are silently swallowed.

Fix: write each diagnostic line to a per-PID file under
``test-results/vcr-diagnostics/<pid>.log`` instead of writing to
stderr. The controller's ``pytest_terminal_summary`` aggregates
those files and writes them through ``terminalreporter.write_line``,
which is not subject to per-test capture. As a bonus,
``test-results/`` is already collected by the ``store_test_results``
step in CircleCI, so the raw per-worker logs survive as build
artifacts even after the test session ends.

Three call sites updated:

1. ``_emit_body_mismatch_diagnostic`` (matcher) -- writes the
   structured type/length/sha/window block via ``vcr_diag_write_line``.
2. ``_normalize_multipart_boundary`` -- logs the silent-skip path
   (body not bytes/bytearray/str) the same way.
3. ``_maybe_log_episode_body_hashes`` (persister) -- replaces the
   ``_log.warning`` calls (which the root-logger config also
   swallows in CI) with ``vcr_diag_write_line``.

Image-gen conftest is the only suite wired to dump the aggregated
log at session end. Other suites can opt in by adding
``emit_vcr_diagnostic_log(terminalreporter)`` to their own
``pytest_terminal_summary``. The diagnostic dir is cleared at the
start of each session (controller-only) so a local rerun does not
mix output from prior runs.

Same revert plan as the previous diagnostic commit: keep the
matcher + normalizer skip diagnostics permanently (they only fire
on signal events), revert the persister body-hash dump once the
async variance is identified.
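A minimal sketch of the per-PID plumbing this commit describes. The real `vcr_diag_write_line` / `emit_vcr_diagnostic_log` helpers live in `tests/_vcr_conftest_common.py` and write through `terminalreporter`; this version only models the capture-bypass mechanism (per-process files aggregated later):

```python
import os
from pathlib import Path

DIAG_DIR = Path("test-results/vcr-diagnostics")

def vcr_diag_write_line(line: str) -> None:
    # Append to this process's own log file: per-PID files sidestep
    # xdist's per-test stdout/stderr capture in worker processes.
    DIAG_DIR.mkdir(parents=True, exist_ok=True)
    with open(DIAG_DIR / f"{os.getpid()}.log", "a") as fh:
        fh.write(line.rstrip("\n") + "\n")

def aggregate_diagnostics() -> str:
    # Controller-side: concatenate every worker's log for the
    # terminal summary (and CI artifact collection).
    if not DIAG_DIR.exists():
        return ""
    return "".join(p.read_text() for p in sorted(DIAG_DIR.glob("*.log")))

vcr_diag_write_line("[vcr-safe-body-matcher] request body mismatch")
assert "[vcr-safe-body-matcher]" in aggregate_diagnostics()
```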
@mateo-berri mateo-berri force-pushed the litellm_stabilize_image_edit_vcr_cassettes branch from 1131430 to 85430bc Compare May 17, 2026 07:09
Root cause of the residual async image-edit cassette leak. The
diagnostic run for ``ba3915d9`` printed:

  [vcr-safe-body-matcher] request body mismatch
    body[a]: type='list_iterator' length=unknown sha256=N/A
    body[b]: type='list_iterator' length=unknown sha256=N/A

httpx's async transport hands vcrpy a ``request.body`` that is a
``list_iterator`` over multipart chunks rather than a contiguous
``bytes`` blob. Two consequences:

1. ``_safe_body_matcher`` compares the two iterator objects with
   ``==``, which is identity comparison for arbitrary iterators --
   semantically identical multipart bodies never compare equal, and
   ``record_mode="new_episodes"`` appends a new episode on every CI
   run until the cassette crosses ``MAX_EPISODES_PER_CASSETTE`` and
   the persister refuses to save (this is exactly what the OVERFLOW
   warning has been catching).
2. ``_normalize_multipart_boundary`` short-circuits its
   ``else: return`` branch because the body is neither bytes nor
   str, so any residual random boundary characters in the body bytes
   are never rewritten.

Sync requests do not hit this code path: httpx's sync transport
hands vcrpy a single ``bytes`` body, so ``==`` works and the
boundary normalizer runs as intended. That is why
``test_openai_image_edit_litellm_sdk[True]`` records to ``entries=1``
and replays cleanly while ``[False]`` (async) kept growing by one
episode per run.

Fix: add ``_materialize_iterable_body`` which coalesces an iterable
``request.body`` into ``bytes`` in-place. Call it from two places:

* The top of ``_before_record_request``, so the boundary normalizer
  and the cassette serializer both see bytes from then on.
* The top of ``_safe_body_matcher``, as defense in depth in case a
  future vcrpy code path invokes the matcher without first going
  through ``_before_record_request``.

The vcrpy ``Request`` is a wrapper used for matching and recording;
the underlying httpx transport sends its own request body
separately, so replacing the iterator on the vcrpy wrapper does
not starve the live HTTP send.

After this lands the async parametrizes should flip from
``[VCR MISS:RECORDED] entries=N+1`` to ``[VCR HIT] entries=N`` on
the next CI run, matching the sync side and dropping the residual
~$3/day to $0.
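The identity-comparison pitfall in consequence 1 is easy to demonstrate standalone, with no vcrpy involved:

```python
body = b"--boundary\r\ncontent\r\n--boundary--\r\n"

# Two semantically identical multipart bodies, as the async transport
# presents them: single-use iterators over chunks, not contiguous bytes.
a = iter([body])
b = iter([body])

# Iterators define no __eq__, so '==' falls back to object identity:
# identical content never compares equal.
assert a != b

# Materializing both sides to bytes restores value semantics:
assert b"".join(iter([body])) == b"".join(iter([body])) == body
```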
Follow-up to 8e08272. The previous attempt at coalescing iterable
request bodies bailed out (``return`` without writing
``request.body``) whenever it could not classify the chunk type.
That was the wrong failure mode for one critical case: vcrpy
sometimes presents the body as ``iter(some_bytes)``, whose Python
type is ``bytes_iterator`` and which yields ``int`` byte values
(0-255), not byte chunks. The old code saw an ``int`` chunk, hit
the ``else: return`` branch, and left ``request.body`` pointing at
the now-exhausted iterator.

The post-fix diagnostic run made this loud:

  [vcr-safe-body-matcher] request body mismatch
    body[a]: type='bytes_iterator' length=unknown sha256=N/A
    body[b]: type='bytes_iterator' length=unknown sha256=N/A

Every async image-edit test then ballooned from entries=2 to
entries=10 in that single CI run -- the exhausted iterator meant
the live multipart upload went out as an empty body, OpenAI
returned 400, the SDK + flaky retries fired, each retry got a
fresh iterator that my hook exhausted again, and ``new_episodes``
recorded each failed attempt as a new cassette episode.

This patch:

* Recognizes ``bytes_iterator`` (chunks are ``int``) and
  reconstructs the buffer via ``bytes(chunks)``.
* Keeps the existing ``list_iterator``-over-bytes-chunks handling
  via ``b"".join(...)``.
* **Always writes a bytes value back to ``request.body`` after
  consuming the iterator.** If the chunk shape is unrecognized,
  ``request.body`` is set to ``b""`` rather than left as an
  exhausted iterator. That is wrong in the sense of "we lost the
  body" but right in the sense of "the failure mode is now visible
  (live API call sends empty body and fails fast) instead of
  invisible (corrupt cassette grows silently)". Combined with the
  matcher diagnostic, any future regression in this code path will
  surface in the CI log immediately.

Local verification covers ``bytes_iterator``, ``list_iterator``
over bytes chunks, generator over bytes chunks, empty iterator,
already-bytes (idempotent), identical-content iterator equality
in the matcher (now matches), and differing-content iterator
inequality (still raises).
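A simplified model of the coalescing logic this commit describes. `materialize_body` here is an illustrative stand-in: the real `_materialize_iterable_body` mutates `request.body` in place rather than returning a value, but the chunk-shape handling is the same:

```python
def materialize_body(body):
    # Drain a single-use iterator into contiguous bytes,
    # ALWAYS producing bytes so no exhausted iterator survives.
    if isinstance(body, (bytes, bytearray)):
        return bytes(body)                 # already contiguous: idempotent
    chunks = list(body)                    # consumes the iterator
    if not chunks:
        return b""
    if isinstance(chunks[0], int):         # iter(b"..."): yields 0-255 ints
        return bytes(chunks)
    if isinstance(chunks[0], (bytes, bytearray)):
        return b"".join(chunks)            # list/generator of byte chunks
    return b""                             # unrecognized shape: loud empty body

assert materialize_body(iter(b"abc")) == b"abc"          # bytes_iterator
assert materialize_body(iter([b"ab", b"c"])) == b"abc"   # list_iterator of chunks
assert materialize_body(b"abc") == b"abc"                # idempotent on bytes
assert materialize_body(iter([])) == b""                 # empty iterator
```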
@mateo-berri mateo-berri force-pushed the litellm_stabilize_image_edit_vcr_cassettes branch from b5ff351 to 9e2e5b6 Compare May 17, 2026 07:35
…s stay bytes

Actual root cause of the async image-edit cassette leak. The
previous diagnostic run produced this dead giveaway:

  [vcr-episode-body-hash] ... episode[0]: body type='bytes_iterator'
    is not bytes/bytearray/str -- cannot hash
  [vcr-safe-body-matcher] request body mismatch
    body[a]: type='bytes_iterator' length=unknown sha256=N/A
    body[b]: type='bytes_iterator' length=unknown sha256=N/A

Both sides of the matcher were ``bytes_iterator`` **after** the
materializer had supposedly converted them to bytes. That made no
sense until I read vcrpy's ``Request`` class.

vcrpy's ``Request`` keeps two private flags that are set in
``__init__`` from the original body's type and **never cleared by
the setter**:

  def __init__(self, method, uri, body, headers):
      self._was_file = hasattr(body, "read")
      self._was_iter = _is_nonsequence_iterator(body)
      ...

  @property
  def body(self):
      if self._was_file: return BytesIO(self._body)
      if self._was_iter: return iter(self._body)
      return self._body

  @body.setter
  def body(self, value):
      if isinstance(value, str): value = value.encode("utf-8")
      self._body = value   # <-- does NOT touch _was_iter / _was_file

So when httpx's async transport hands vcrpy an iterator body,
``_was_iter`` becomes ``True`` and stays there forever. Even after
``_materialize_iterable_body`` writes plain bytes via
``request.body = out``, the next read of ``.body`` re-wraps the
stored bytes in ``iter()`` -- producing a fresh ``bytes_iterator``
that compares unequal to any other ``bytes_iterator`` via object
identity. The matcher missed every time, the cassette grew by one
episode per run, and the persister saw the same iterator type when
trying to hash the body for the diagnostic log.

Fix: after writing the materialized bytes, also force
``_was_iter`` and ``_was_file`` to ``False``. vcrpy exposes no
public API for this, so we touch the private flags directly --
acknowledged as a pragmatic test-only hack with a clear unit
boundary (the only call site is ``_materialize_iterable_body``).

Local repro reproduces the exact production setup:
``Request('POST', url, iter(b'multipart-content'), {})`` on two
sides, runs the matcher, asserts HIT. Verified the matcher hits on
identical content and still raises on differing content.

Should be the last fix needed. Existing cassettes that contain
oddly-shaped bodies (lists of int chunks, etc. from the previous
``_was_iter=True`` save path) still match because the materializer
canonicalises both sides to bytes before comparison -- no fourth
re-flush required.
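The sticky-flag behaviour can be modelled without vcrpy; `StickyRequest` below mirrors only the `__init__`/getter/setter interplay quoted above, not the full `Request` class:

```python
class StickyRequest:
    # Minimal model of vcrpy's Request body handling: _was_iter is set
    # once in __init__ and never cleared by the setter.
    def __init__(self, body):
        self._was_iter = hasattr(body, "__next__")
        self._body = bytes(body) if self._was_iter else body

    @property
    def body(self):
        return iter(self._body) if self._was_iter else self._body

    @body.setter
    def body(self, value):
        self._body = value          # does NOT touch _was_iter

req = StickyRequest(iter(b"multipart-content"))
req.body = b"multipart-content"       # "materialize" to plain bytes...
assert hasattr(req.body, "__next__")  # ...yet the getter re-wraps in iter()

req._was_iter = False                 # the fix: clear the sticky private flag
assert req.body == b"multipart-content"
```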
Removed now that 1c51ad1 has confirmed the root cause (vcrpy's
sticky ``_was_iter`` flag making the body getter re-wrap stored
bytes in ``iter()`` on every access). The hash dump did its job --
the post-1c51ad13 image_gen_testing run shows all five async
image-edit tests as ``[VCR HIT]`` with stable entry counts and
zero billing errors -- and is too noisy to keep on by default
(over 100 lines per session at steady state).

Kept permanently:

* ``_safe_body_matcher`` mismatch diagnostic in
  ``_vcr_conftest_common.py``. Only fires on a body mismatch,
  which is signal worth surfacing whenever it happens.
* ``_normalize_multipart_boundary`` "skipped" log line. Same
  rationale -- only fires when the body shape is something the
  normalizer cannot rewrite in place.
* The ``test-results/vcr-diagnostics/<pid>.log`` per-PID file
  plumbing (``vcr_diag_write_line`` /
  ``emit_vcr_diagnostic_log``). Useful for any future diagnostic
  that needs to bypass xdist stdout/stderr capture; cheap to keep.
@mateo-berri mateo-berri marked this pull request as ready for review May 17, 2026 07:49
@greptile-apps
Contributor

greptile-apps Bot commented May 17, 2026

Greptile Summary

This PR fixes non-deterministic multipart bodies in VCR cassettes for gpt-image-1 image-edit tests, preventing ~$150/day in live OpenAI spend on every CI run. It addresses three root causes — async httpx multipart boundary randomness, BytesIO EOF-on-retry, and vcrpy's iterator-vs-iterator matcher bug — and adds structured per-PID diagnostic logging across all 13 VCR-using test suites.

  • Body materialization + boundary pinning: _materialize_iterable_body drains list_iterator/bytes_iterator bodies to contiguous bytes before matching or recording, and clears vcrpy's sticky _was_iter/_was_file flags so the body getter no longer re-wraps the stored bytes; pin_httpx_multipart_boundary monkeypatches httpx._multipart.MultipartStream.__init__ to use a static boundary, making multipart payloads byte-identical across sync and async runs.
  • BytesIO → bytes in fixtures: _make_test_images and _make_single_test_image now return raw bytes instead of BytesIO, eliminating the silent empty-body encode on SDK-level retries; a dedicated get_test_images_as_bytesio() factory preserves BytesIO-specific coverage.
  • Diagnostic log infrastructure: Per-PID log files under test-results/vcr-diagnostics/ capture body-mismatch details (SHA-256, lengths, first divergent offset, ±100-byte windows) and are emitted at session end via emit_vcr_diagnostic_log wired into all 13 VCR conftests.
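The boundary-pinning wrapper summarized in the first bullet can be sketched against a stand-in class. `MultipartStreamModel` is hypothetical and models only the boundary-defaulting behaviour; the real patch targets `httpx._multipart.MultipartStream.__init__` and is undone at session teardown:

```python
import functools
import os

class MultipartStreamModel:
    # Hypothetical stand-in: a fresh random boundary per request,
    # as httpx generates when no boundary is supplied.
    def __init__(self, data, files, boundary=None):
        if boundary is None:
            boundary = os.urandom(16).hex().encode()
        self.boundary = boundary

def pin_boundary(cls, static=b"vcr-static-boundary"):
    # Wrap __init__ so a missing boundary defaults to a static value,
    # forwarding *args/**kwargs so future optional params pass through.
    real_init = cls.__init__

    @functools.wraps(real_init)
    def patched(self, *args, boundary=None, **kwargs):
        real_init(self, *args, boundary=boundary or static, **kwargs)

    cls.__init__ = patched
    return real_init   # keep the original so teardown can restore it

pin_boundary(MultipartStreamModel)
a = MultipartStreamModel({}, [])
b = MultipartStreamModel({}, [])
assert a.boundary == b.boundary == b"vcr-static-boundary"   # byte-stable
```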

Confidence Score: 5/5

Safe to merge — all changes are confined to test infrastructure with no production code touched.

Every changed file is a test fixture, conftest, or VCR helper. The monkeypatch to httpx._multipart.MultipartStream.__init__ is session-scoped and undone after the session, the vcrpy internal-flag clearing (_was_iter/_was_file) is wrapped in try/except, and the BytesIO to bytes switch is a fidelity improvement. The diagnostic log system writes to per-PID files under test-results/ and is entirely passive. No cassette matching logic is weakened — the body comparator is strictly more correct than before.

No files require special attention.

Important Files Changed

Filename Overview
tests/_vcr_conftest_common.py Core VCR infrastructure: adds _materialize_iterable_body to coalesce async iterator bodies before matching/recording, pin_httpx_multipart_boundary to monkeypatch httpx, and a structured per-PID diagnostic log system; logic is sound and edge cases are guarded.
tests/_vcr_redis_persister.py Adds _log_episode_body_hashes called before the episode-count guard to emit per-episode SHA-256s to the diagnostic log; straightforward addition with correct type handling.
tests/image_gen_tests/conftest.py Adds a session-scoped autouse fixture that correctly uses pytest.MonkeyPatch() (not the function-scoped fixture) to pin the multipart boundary for the entire session; wires in diagnostic helpers.
tests/image_gen_tests/test_image_edits.py Replaces BytesIO returns from _make_test_images/_make_single_test_image with raw bytes to prevent EOF-on-retry; preserves a separate get_test_images_as_bytesio() for BytesIO-specific coverage.
tests/audio_tests/conftest.py Uniform 4-line addition: imports and wires reset_vcr_diag_dir / emit_vcr_diagnostic_log into the existing configure/terminal-summary hooks; matches the pattern applied to all 13 VCR-using conftests.

Reviews (5): Last reviewed commit: "fix(tests): gate body materialization on..."

…verywhere

* Remove ``scripts/flush_image_edit_vcr_cassettes.py``. It was a
  one-shot helper for the initial cassette flush; the iterator and
  ``_was_iter`` fixes mean no future flush should be required, and
  the script was never run anywhere (the actual flushes happened
  inside the CI conftest via the temp hacks that have since been
  reverted).

* The matcher mismatch + normalizer skip diagnostics already write
  per-PID files for every suite that imports the shared VCR
  plumbing, but ``emit_vcr_diagnostic_log`` -- the controller-side
  dump that surfaces those files into the CI log at session end --
  was only wired into ``image_gen_tests``. Add the one-line call to
  the 12 sibling conftests that already use VCR so the diagnostics
  surface in any suite's terminal output if a body matcher ever
  misses. No new output in steady state -- the dump is a no-op when
  no diagnostics were recorded that session.
Strips docstrings, inline comments, and block comments that this PR
introduced where the code itself was already self-evident. Keeps the
few lines that document non-obvious behaviour (raw-bytes-not-BytesIO
rationale on the image fixtures, the per-PID-files-bypass-xdist note
on the diagnostic directory). Touches only comments this PR added --
no pre-existing comment is removed.

Net: -161 lines of comment/docstring across 3 files, no code
behaviour change.
Defensive against future httpx MultipartStream.__init__ adding new
optional kwargs. Without the forward, the wrapper would silently drop
them. No behaviour change today.
@mateo-berri
Collaborator Author

@greptileai

…branches

Bundles the "follow-up cleanup PR" into this one so it does not get
lost. Four small changes:

1. Introduce ``_canonical_body(req) -> (bytes, pre_type)`` and route
   ``_safe_body_matcher`` through it. The matcher now operates on
   bytes by construction; the "compare two iterator objects via
   ``==`` and silently get object-identity semantics" failure mode
   (which cost us this entire PR to diagnose) is structurally
   impossible to reintroduce. ``pre_type`` is the body type *before*
   canonicalization, surfaced by the mismatch diagnostic so a future
   regression involving a new body shape is still visible.

2. Add a structured diagnostic to ``_key_fingerprint_matcher``. It
   was previously raising a bare ``AssertionError("API key
   fingerprints differ")`` with zero context -- exactly the
   anti-pattern the body matcher had before this PR.

3. Surface "shouldn't-happen" branches via ``vcr_diag_write_line``:

   * ``_strip_image_b64_payloads`` -- logs when ``response``,
     ``response['body']``, or ``response['body']['string']`` arrives
     in an unexpected shape (vcrpy contract violation).
   * ``_compute_key_fingerprint`` -- logs the ``"no-key"`` fallback
     with the request method/URL so a stripped-auth-header bug is
     visible instead of masked.
   * ``_canonical_body`` -- logs its own empty-bytes fallback when a
     body has a shape ``_materialize_iterable_body`` did not handle.

4. Re-introduce per-episode body-hash logging in
   ``_RedisPersister.save_cassette`` (was reverted in 927c554 as
   "noisy"). Quantified cost: ~25 KB of CI log per session at peak,
   ~ms-scale CPU, zero output in steady state (no save = no log).
   Trade-off favours keeping it: lets two consecutive CI runs be
   diffed by body hash, which is how we will spot the next regression
   in the same class.

All call sites still work: local repro confirms iter==iter HIT,
iter!=iter raises, plain-bytes HIT, body-hash log emits via the same
per-PID file plumbing as the matcher diagnostics.
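A sketch of the `_canonical_body`-based matcher shape described in points 1-2. The `(bytes, pre_type)` return shape follows the commit text, but the function bodies here are illustrative; the real implementation also routes mismatches through `vcr_diag_write_line`:

```python
def canonical_body(body):
    # Return (bytes, pre_type): the matcher compares values, while the
    # pre-canonicalization type is kept for the mismatch diagnostic.
    pre_type = type(body).__name__
    if body is None:
        return b"", pre_type
    if isinstance(body, str):
        return body.encode("utf-8"), pre_type
    if isinstance(body, (bytes, bytearray)):
        return bytes(body), pre_type
    if hasattr(body, "__next__"):              # single-use iterator
        chunks = list(body)
        if chunks and isinstance(chunks[0], int):
            return bytes(chunks), pre_type     # bytes_iterator
        if all(isinstance(c, (bytes, bytearray)) for c in chunks):
            return b"".join(chunks), pre_type  # chunked iterator
    return b"", pre_type                       # unhandled shape: loud fallback

def safe_body_matcher(a, b):
    # Bytes-by-construction: iterator-identity semantics cannot recur.
    ba, ta = canonical_body(a)
    bb, tb = canonical_body(b)
    assert ba == bb, f"body mismatch (pre-types {ta}/{tb})"

safe_body_matcher(iter(b"same"), iter([b"sa", b"me"]))   # value equality holds
```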
@mateo-berri
Collaborator Author

@greptileai

…test

``image_gen_tests/conftest.py`` was the only suite that cleared
``test-results/vcr-diagnostics/*.log`` at session start. The other 12
VCR-using conftests inherited any stale per-PID logs from a previous
local run and would dump them in the terminal summary -- harmless in
CI (fresh container) but confusing locally when running multiple
suites in sequence.

Extracts the cleanup into a ``reset_vcr_diag_dir`` helper in
``tests/_vcr_conftest_common.py`` and calls it from every VCR-using
conftest's ``pytest_configure``. Same single source of truth, no
inline duplication.
@mateo-berri
Collaborator Author

@greptileai

aiohttp/vcrpy stores the json kwarg as a dict; _materialize_iterable_body
was iterating it via __iter__ and joining the keys, replacing the request
body with concatenated key names ("textlanguageentities"). Gate on
__next__ so containers (dict/list/tuple) are left alone — only single-use
iterators like httpx's bytes_iterator / list_iterator are materialized.
Log diagnostic line when chunk type is unrecognized.
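The `__next__` gate, and the dict-keys failure it prevents, can be shown in isolation:

```python
def is_single_use_iterator(body):
    # Containers (dict/list/tuple) define __iter__ but not __next__;
    # only true one-shot iterators must be drained and replaced.
    return hasattr(body, "__next__")

assert is_single_use_iterator(iter([b"chunk"]))      # list_iterator: drain it
assert is_single_use_iterator(iter(b"raw"))          # bytes_iterator: drain it
assert not is_single_use_iterator({"text": "x"})     # json dict: leave alone
assert not is_single_use_iterator([b"a", b"b"])      # list body: leave alone

# The bug this commit fixes: iterating a dict yields its keys, so a
# naive join replaced the JSON body with concatenated key names.
bug = "".join({"text": 1, "language": 2, "entities": 3})
assert bug == "textlanguageentities"
```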
@mateo-berri
Collaborator Author

@greptileai


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes using high mode and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 8021e60.

flush=True,
)
except Exception as exc:
print(f"[flush-hack] failed: {exc}", flush=True)


Temporary Redis flush hack left in production code

High Severity

The _flush_corrupted_presidio_cassettes function, a temporary "flush-hack," was left in pytest_configure. It unconditionally deletes Presidio PII test cassettes from Redis on every guardrails test run. This forces those tests to hit live APIs and re-record, undermining the goal of reducing API spend with VCR.




4 participants