Commit 4254c4a

fix(tests): stabilize image-edit VCR cassettes to stop live gpt-image-1 spend
The image-edit cassettes for ``gpt-image-1`` were accumulating >50 episodes and being refused by the persister (``tests/_vcr_redis_persister.py``), so every CI run was hitting the real OpenAI endpoint. The async parametrize was the clearest tell: ``test_openai_image_edit_litellm_sdk[True]`` cached to 1 entry, but the ``[False]`` (async) sibling grew to 51 entries and never replayed.

Two non-deterministic sources were fueling the growth; both are fixed here (items 1 and 2), and item 3 adds a one-shot cleanup for the cassettes that are already bloated. After this patch, the cassettes settle at one episode per unique call and replay for the 24-hour TTL like every other suite.

1. Pin httpx's multipart boundary at the source. The existing ``_normalize_multipart_boundary`` rewrites the boundary in the ``Content-Type`` header reliably, but on the async transport path the body is not always a contiguous ``bytes`` object when ``before_record_request`` runs, so the body-side replacement silently no-ops and the recorded cassette retains the random ``boundary=<hex>`` string. The next CI run gets a fresh random boundary, the ``safe_body`` matcher misses, and ``record_mode="new_episodes"`` appends another episode. Wrapping ``httpx._multipart.MultipartStream.__init__`` so it always uses ``vcr-static-boundary`` when no boundary is supplied eliminates the variance for both sync and async paths and leaves the normalizer in place as a backstop. Exposed as ``pin_httpx_multipart_boundary`` so other multipart-heavy suites (audio, ocr, batches) can adopt the same fixture later.

2. Pass raw ``bytes`` (not ``BytesIO`` streams) through the image-edit fixtures. A ``BytesIO`` whose file pointer is at EOF after the first multipart upload silently encodes an empty image on the next SDK / Router retry, which is yet another divergent body that VCR records as a new episode. ``bytes`` are immutable and position-less, so retries re-encode an identical payload every time. The same hazard exists outside the tests: a customer passing ``BytesIO`` today would hit the same empty-body retry bug. The BytesIO-specific smoke test (``test_openai_image_edit_with_bytesio``) is preserved by giving ``get_test_images_as_bytesio`` its own factory instead of aliasing the bytes one.

3. Add ``scripts/flush_image_edit_vcr_cassettes.py``, a one-shot Redis SCAN/DEL helper that clears the bloated pre-fix cassettes under ``litellm:vcr:cassette:tests/image_gen_tests/test_image_edits/*``. Without this, the next CI run still loads the existing 51-entry cassette, the new fixed-boundary body still doesn't match any of the stale entries, the persister still refuses to save, and the bleed continues. Run once with the production ``CASSETTE_REDIS_URL`` after merge (dry-run by default).
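The growth mechanism described in the commit message can be reproduced with a small stand-alone simulation. This is only a sketch, not the VCR or persister implementation: `build_body`, `run_ci`, and the list-backed cassette are hypothetical stand-ins, and the matcher is reduced to exact body equality.

```python
import os
from typing import List, Optional

FIXED_BOUNDARY = b"vcr-static-boundary"


def build_body(payload: bytes, boundary: Optional[bytes] = None) -> bytes:
    """Hypothetical multipart encoder: random boundary unless one is supplied."""
    if boundary is None:
        boundary = os.urandom(16).hex().encode("ascii")
    return b"--" + boundary + b"\r\n" + payload + b"\r\n--" + boundary + b"--\r\n"


def run_ci(cassette: List[bytes], body: bytes) -> str:
    """Model record_mode="new_episodes": replay on a match, otherwise record."""
    if body in cassette:
        return "replay"
    cassette.append(body)
    return "live call"


# 51 simulated CI runs with a fresh random boundary each time.
random_cassette: List[bytes] = []
random_outcomes = [run_ci(random_cassette, build_body(b"image")) for _ in range(51)]

# 51 simulated CI runs with the boundary pinned to a constant.
pinned_cassette: List[bytes] = []
pinned_outcomes = [
    run_ci(pinned_cassette, build_body(b"image", FIXED_BOUNDARY)) for _ in range(51)
]

print(len(random_cassette), random_outcomes.count("live call"))
print(len(pinned_cassette), pinned_outcomes.count("live call"))
```

With random boundaries every simulated run records a new episode, so the cassette grows by one entry per run. With the pinned boundary only the first run records and the remaining runs replay.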
1 parent a72414a commit 4254c4a

4 files changed

Lines changed: 232 additions & 26 deletions

scripts/flush_image_edit_vcr_cassettes.py

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+#!/usr/bin/env python3
+"""Flush the bloated image-edit VCR cassettes from the cassette Redis.
+
+Run this **once** after merging the multipart-boundary stabilization
+PR. The pre-fix cassettes for the async image-edit tests have
+accumulated >50 episodes (random multipart boundary on every run +
+``record_mode="new_episodes"`` = monotonic growth), so the persister
+refuses to save updates -- meaning every CI run after the fix would
+still try to re-record against the stale 51-entry cassette, hit
+``MAX_EPISODES_PER_CASSETTE`` again, get refused, and re-bill the live
+provider.
+
+Deleting these keys forces the next CI run to record a clean cassette
+under the new fixed-boundary + raw-bytes fixtures (1 episode per
+unique call), after which the 24-hour TTL replay loop kicks in
+normally.
+
+Scope is intentionally narrow:
+  * Only ``tests/image_gen_tests/test_image_edits/*`` cassette keys
+    are touched. Image-*generation* cassettes (TestOpenAIGPTImage1
+    etc.) are unaffected -- they were already in the VCR HIT state.
+  * Lists every match in dry-run mode before deleting anything so the
+    operator can confirm the impact.
+
+Usage:
+    CASSETTE_REDIS_URL=redis://... \
+        uv run python scripts/flush_image_edit_vcr_cassettes.py --dry-run
+
+    CASSETTE_REDIS_URL=redis://... \
+        uv run python scripts/flush_image_edit_vcr_cassettes.py --yes
+
+``CASSETTE_REDIS_URL`` is the same env var the persister reads at CI
+start (see ``tests/_vcr_redis_persister.py``).
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import sys
+
+import redis
+
+
+CASSETTE_REDIS_URL_ENV = "CASSETTE_REDIS_URL"
+REDIS_KEY_PREFIX = "litellm:vcr:cassette:"
+TARGET_KEY_PATTERN = f"{REDIS_KEY_PREFIX}tests/image_gen_tests/test_image_edits/*"
+
+
+def _build_client(url: str) -> redis.Redis:
+    return redis.Redis.from_url(
+        url,
+        socket_timeout=10,
+        socket_connect_timeout=10,
+        decode_responses=False,
+    )
+
+
+def _scan_matching_keys(client: redis.Redis, pattern: str) -> list[bytes]:
+    return sorted(client.scan_iter(match=pattern, count=500))
+
+
+def _delete_keys(client: redis.Redis, keys: list[bytes]) -> int:
+    if not keys:
+        return 0
+    # Batch into chunks so a single DEL call does not exceed the
+    # server's argument-count or buffer limits on large key sets.
+    deleted = 0
+    chunk_size = 200
+    for start in range(0, len(keys), chunk_size):
+        batch = keys[start : start + chunk_size]
+        deleted += int(client.delete(*batch))
+    return deleted
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--yes",
+        action="store_true",
+        help="Actually delete the matched keys. Without this flag the script "
+        "runs in dry-run mode and only lists what would be deleted.",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="List matched keys without deleting (default behaviour when "
+        "--yes is omitted; kept as an explicit flag for clarity).",
+    )
+    parser.add_argument(
+        "--pattern",
+        default=TARGET_KEY_PATTERN,
+        help=f"Override the SCAN match pattern. Default: {TARGET_KEY_PATTERN}",
+    )
+    args = parser.parse_args(argv)
+
+    url = os.environ.get(CASSETTE_REDIS_URL_ENV)
+    if not url:
+        print(
+            f"error: {CASSETTE_REDIS_URL_ENV} is not set. Set it to the "
+            "cassette Redis URL (same URL the persister reads in CI).",
+            file=sys.stderr,
+        )
+        return 2
+
+    client = _build_client(url)
+    try:
+        client.ping()
+    except redis.RedisError as exc:
+        print(f"error: cannot reach cassette Redis: {exc}", file=sys.stderr)
+        return 2
+
+    matches = _scan_matching_keys(client, args.pattern)
+    print(f"matched {len(matches)} key(s) under pattern: {args.pattern}")
+    for key in matches:
+        print(f"  {key.decode('utf-8', errors='replace')}")
+
+    if not matches:
+        return 0
+
+    if args.dry_run or not args.yes:
+        print("\ndry run -- pass --yes to actually delete these keys.")
+        return 0
+
+    deleted = _delete_keys(client, matches)
+    print(f"\ndeleted {deleted} key(s).")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
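The chunked deletion in `_delete_keys` can be exercised without a Redis server. The sketch below is illustrative only: `RecordingClient` is a hypothetical stub that mimics just the `delete(*keys) -> int` shape the script relies on, and `delete_in_chunks` restates the batching loop with the chunk size exposed as a parameter.

```python
from typing import Iterable, List


class RecordingClient:
    """Hypothetical stub: supports only delete(*keys) and records batch sizes."""

    def __init__(self, existing: Iterable[bytes]) -> None:
        self.existing = set(existing)
        self.batch_sizes: List[int] = []

    def delete(self, *keys: bytes) -> int:
        # Like Redis DEL, report how many of the given keys were removed.
        self.batch_sizes.append(len(keys))
        removed = [key for key in keys if key in self.existing]
        self.existing.difference_update(removed)
        return len(removed)


def delete_in_chunks(client, keys: List[bytes], chunk_size: int = 200) -> int:
    """Delete keys in fixed-size batches and return the total removed."""
    deleted = 0
    for start in range(0, len(keys), chunk_size):
        batch = keys[start : start + chunk_size]
        deleted += int(client.delete(*batch))
    return deleted


keys = [f"litellm:vcr:cassette:demo/{i}".encode() for i in range(450)]
client = RecordingClient(keys)
total = delete_in_chunks(client, keys)
print(total, client.batch_sizes)
```

For 450 keys and a chunk size of 200, the stub sees three calls of 200, 200, and 50 keys, and the summed return values equal the number of keys that existed.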

tests/_vcr_conftest_common.py

Lines changed: 49 additions & 0 deletions
@@ -91,6 +91,55 @@
 VCR_FIXED_MULTIPART_BOUNDARY = "vcr-static-boundary"
 
 
+def pin_httpx_multipart_boundary(monkeypatch) -> None:
+    """Force every httpx multipart request to use a constant boundary.
+
+    httpx's ``MultipartStream`` generates a fresh ``boundary=<random hex>``
+    via ``os.urandom(16)`` whenever the caller does not supply one
+    (see ``httpx._multipart.MultipartStream.__init__``). That random
+    boundary appears both in the ``Content-Type`` header and verbatim in
+    the request body between each part.
+
+    ``_normalize_multipart_boundary`` rewrites the header reliably, but
+    on the async transport path the body is not always handed to
+    ``before_record_request`` as a contiguous ``bytes`` object — so the
+    body replacement silently no-ops and the recorded cassette retains
+    the random boundary string. Subsequent runs generate a *different*
+    random boundary, the ``safe_body`` matcher misses, and
+    ``record_mode="new_episodes"`` appends a fresh episode until the
+    cassette crosses ``MAX_EPISODES_PER_CASSETTE`` and the persister
+    refuses to save — re-billing live providers on every CI run.
+
+    Pinning the boundary at the source removes the variance entirely:
+    every run emits byte-identical multipart bodies, the existing
+    ``safe_body`` matcher succeeds on the first request, and one
+    recorded episode per unique call satisfies replays for the cassette
+    TTL.
+
+    This wraps ``MultipartStream.__init__`` instead of patching the
+    boundary-generation helper directly because httpx inlines
+    ``os.urandom(16).hex().encode("ascii")`` in the constructor body
+    rather than calling a named function. We preserve the caller's
+    boundary when one is explicitly supplied so production-style code
+    that pins its own boundary keeps working.
+    """
+    try:
+        import httpx._multipart as _httpx_multipart
+    except ImportError:  # pragma: no cover - httpx is a hard test dep
+        return
+
+    _original_init = _httpx_multipart.MultipartStream.__init__
+
+    def _init_with_fixed_boundary(self, data, files, boundary=None):
+        if boundary is None:
+            boundary = VCR_FIXED_MULTIPART_BOUNDARY.encode("ascii")
+        return _original_init(self, data=data, files=files, boundary=boundary)
+
+    monkeypatch.setattr(
+        _httpx_multipart.MultipartStream, "__init__", _init_with_fixed_boundary
+    )
+
+
 def _scrub_response(response):
     if not isinstance(response, dict):
         return response
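The constructor-wrapping technique used by `pin_httpx_multipart_boundary` can be tried in isolation. The sketch below deliberately avoids httpx: `FakeMultipartStream` is a hypothetical stand-in that only imitates the "random boundary when none is supplied" behaviour, and `pin_boundary` returns a manual undo callable in place of pytest's `monkeypatch`.

```python
import os
from typing import Callable, Optional

FIXED = b"vcr-static-boundary"


class FakeMultipartStream:
    """Hypothetical stand-in, NOT httpx's class: random boundary by default."""

    def __init__(self, data: dict, files: dict, boundary: Optional[bytes] = None) -> None:
        if boundary is None:
            boundary = os.urandom(16).hex().encode("ascii")
        self.boundary = boundary


def pin_boundary(cls) -> Callable[[], None]:
    """Wrap cls.__init__ so a missing boundary defaults to FIXED; return an undo."""
    original_init = cls.__init__

    def init_with_fixed_boundary(self, data, files, boundary=None):
        # Only fill in the default; an explicit caller boundary is preserved.
        if boundary is None:
            boundary = FIXED
        return original_init(self, data=data, files=files, boundary=boundary)

    cls.__init__ = init_with_fixed_boundary

    def undo() -> None:
        cls.__init__ = original_init

    return undo


undo = pin_boundary(FakeMultipartStream)
pinned = FakeMultipartStream({}, {}).boundary
explicit = FakeMultipartStream({}, {}, boundary=b"caller-chosen").boundary
undo()
restored = FakeMultipartStream({}, {}).boundary
print(pinned, explicit, restored != FIXED)
```

While the wrapper is installed, instances created without a boundary get the fixed value and an explicitly supplied boundary passes through unchanged. After `undo()` the class generates random boundaries again.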

tests/image_gen_tests/conftest.py

Lines changed: 18 additions & 0 deletions
@@ -15,6 +15,7 @@
     emit_cassette_cache_session_banner,
     emit_vcr_classification_summary,
     install_live_call_probe,
+    pin_httpx_multipart_boundary,
     record_vcr_outcome,
     register_persister_if_enabled,
     vcr_config_dict,
@@ -33,6 +34,23 @@ def event_loop():
     loop.close()
 
 
+@pytest.fixture(scope="session", autouse=True)
+def _pin_multipart_boundary():
+    """Pin httpx's random multipart boundary to a constant for the
+    entire image-gen test session. Without this, async multipart bodies
+    contain a fresh ``boundary=<random hex>`` on every run; the
+    ``safe_body`` matcher misses, and ``record_mode="new_episodes"``
+    grows each cassette by one entry per run until it crosses the
+    50-episode persister cap and stops being saved — leaving the test
+    to hit the real provider on every CI run. See
+    ``pin_httpx_multipart_boundary`` for the full rationale.
+    """
+    monkeypatch = pytest.MonkeyPatch()
+    pin_httpx_multipart_boundary(monkeypatch)
+    yield
+    monkeypatch.undo()
+
+
 @pytest.fixture(scope="module")
 def vcr_config():
     return vcr_config_dict()

tests/image_gen_tests/test_image_edits.py

Lines changed: 34 additions & 26 deletions
@@ -103,12 +103,16 @@ async def test_openai_image_edit_litellm_sdk(self, sync_mode):
 pwd = os.path.dirname(os.path.realpath(__file__))
 
 
-# Image fixtures must be regenerated per access — module-level
-# ``open(...)`` handles get consumed after a single multipart upload, leaving
-# subsequent tests in the same process to send empty bodies. That non-determinism
-# (a) blows the recorded cassette past ``MAX_EPISODES_PER_CASSETTE`` so the
-# persister refuses to save (see ``tests/_vcr_redis_persister.py``), and
-# (b) re-bills the live image edit endpoint on every CI run.
+# Image fixtures are returned as raw ``bytes`` (not file handles or
+# ``BytesIO`` streams) so that every SDK / Router retry sees the same
+# payload. A ``BytesIO`` whose file pointer is left at EOF by the first
+# multipart upload silently encodes an empty image on the second
+# attempt, producing a different request body — VCR records that
+# divergent body as a fresh episode, the cassette eventually crosses
+# ``MAX_EPISODES_PER_CASSETTE`` in ``tests/_vcr_redis_persister.py``,
+# the persister refuses to save, and every subsequent CI run re-bills
+# the live image-edit endpoint. Raw bytes are immutable, position-less,
+# and re-encoded identically on every retry attempt.
 def _read_image_bytes(filename: str) -> bytes:
     with open(os.path.join(pwd, filename), "rb") as f:
         return f.read()
@@ -119,30 +123,34 @@ def _read_image_bytes(filename: str) -> bytes:
 
 
 def _make_test_images() -> list:
-    """Return a fresh pair of image streams seeded with the fixture bytes.
-
-    Use this everywhere you'd previously have used the module-level
-    ``TEST_IMAGES``. Each call returns brand new ``BytesIO`` objects whose
-    file pointers start at 0, so multipart uploads encode the full image
-    bytes on every test invocation. Parametrized and ``flaky``-retried
-    test methods call ``get_base_image_edit_call_args`` once per
-    invocation, so a fresh stream per call is sufficient — the factory
-    must not auto-rewind on EOF or the SDK's multipart writer will read
-    the same bytes forever (worker OOM).
+    """Return the pair of fixture images as raw ``bytes`` payloads.
+
+    ``httpx`` accepts a ``bytes`` value anywhere a file-like upload is
+    expected and re-encodes it identically on every multipart attempt
+    — so SDK-level retries can never produce a divergent empty-body
+    episode (the root cause of the cassette-overflow leak that bills
+    ``gpt-image-1`` on every CI run).
     """
-    return [
-        BytesIO(_ISHAAN_GITHUB_BYTES),
-        BytesIO(_LITELLM_SITE_BYTES),
-    ]
+    return [_ISHAAN_GITHUB_BYTES, _LITELLM_SITE_BYTES]
 
 
-def _make_single_test_image() -> BytesIO:
-    return BytesIO(_ISHAAN_GITHUB_BYTES)
+def _make_single_test_image() -> bytes:
+    return _ISHAAN_GITHUB_BYTES
 
 
 def get_test_images_as_bytesio():
-    """Helper function to get test images as BytesIO objects"""
-    return _make_test_images()
+    """Return the fixture images as fresh ``BytesIO`` streams.
+
+    Kept distinct from ``_make_test_images`` so the BytesIO-specific
+    smoke tests (``test_openai_image_edit_with_bytesio``,
+    ``test_multiple_image_edit_with_different_formats``) still exercise
+    the file-like upload path. Each call yields brand new streams so
+    the file pointer always starts at 0 for that test invocation.
+    """
+    return [
+        BytesIO(_ISHAAN_GITHUB_BYTES),
+        BytesIO(_LITELLM_SITE_BYTES),
+    ]
 
 
 class TestOpenAIImageEditGPTImage1(BaseLLMImageEditTest):
@@ -710,9 +718,9 @@ async def test_multiple_image_edit_with_different_formats():
     try:
         prompt = "Create a cohesive artistic style across all images"
 
-        # Test with mixed BytesIO and file objects
+        # Test with mixed raw-bytes and BytesIO inputs
         mixed_images = [
-            _make_single_test_image(),  # File object
+            _make_single_test_image(),  # raw ``bytes`` payload
             get_test_images_as_bytesio()[1],  # BytesIO object
         ]
 
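The reason the fixtures moved from `BytesIO` to raw `bytes` can be seen with the standard library alone. In this sketch `encode_upload` is a hypothetical encoder, not the SDK's or httpx's code path; it assumes a retry re-reads the same object without rewinding it, which is the failure mode described in the comments above.

```python
from io import BytesIO

IMAGE = b"\x89PNG fake image bytes"  # placeholder payload, not a real image


def encode_upload(image) -> bytes:
    """Hypothetical encoder: pass raw bytes through, read() file-like objects."""
    if isinstance(image, (bytes, bytearray)):
        return bytes(image)
    return image.read()


# A shared stream: the first attempt consumes it, the retry reads from EOF.
stream = BytesIO(IMAGE)
first_stream_attempt = encode_upload(stream)
retry_stream_attempt = encode_upload(stream)

# Raw bytes have no file pointer, so both attempts encode the same payload.
first_bytes_attempt = encode_upload(IMAGE)
retry_bytes_attempt = encode_upload(IMAGE)


def fresh_streams() -> list:
    """Factory in the spirit of get_test_images_as_bytesio: new streams per call."""
    return [BytesIO(IMAGE)]


print(len(first_stream_attempt), len(retry_stream_attempt))
print(first_bytes_attempt == retry_bytes_attempt)
print(fresh_streams()[0].tell())
```

The reused stream yields the full payload once and an empty body on the second read, while raw bytes produce the same payload on both attempts. A factory that builds new streams per call always starts at position 0, which is why the BytesIO smoke tests keep their own factory.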
