fix: improve responses compaction chunking #119
Conversation
Pull request overview
This PR improves the proxy’s /v1/responses/compact fallback behavior by making chunked compaction more efficient under load (deduping identical in-flight requests), enabling parallel sibling-chunk compaction with a configurable concurrency limit, and making the shared compaction-attempt budget safe to use across parallel chunk fanout.
Changes:
- Added in-flight deduplication for identical `/v1/responses/compact` requests to avoid redundant upstream compaction work.
- Introduced configurable sibling-chunk concurrency for 413 fallback chunk fanout, plus budget locking/snapshots to keep attempt accounting thread-safe.
- Enabled proactive chunking when a smaller compact chunk size is explicitly configured, and wired the new concurrency setting through server options and CLI flags.
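The dedupe described in the first bullet resembles a hand-rolled `golang.org/x/sync/singleflight`: the first caller for a key becomes the leader and performs the upstream compaction, while identical in-flight requests wait for its published result. A minimal sketch with illustrative internals (the PR's actual `beginCompactInflight`/`finishCompactInflight`/`waitCompactInflight` differ in detail, e.g. they also carry a cloned `*http.Response`):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sync"
)

// inflightCall carries the result that duplicate callers wait for.
type inflightCall struct {
	done    chan struct{}
	summary string
	err     error
}

type inflightGroup struct {
	mu    sync.Mutex
	calls map[string]*inflightCall
}

// begin returns the call for key and whether this caller is the leader,
// i.e. the one that must do the upstream work and publish the result.
func (g *inflightGroup) begin(key string) (*inflightCall, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.calls[key]; ok {
		return c, false
	}
	c := &inflightCall{done: make(chan struct{})}
	g.calls[key] = c
	return c, true
}

// finish publishes the result and clears the key so later identical
// requests start a fresh compaction.
func (g *inflightGroup) finish(key string, c *inflightCall, summary string, err error) {
	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	c.summary, c.err = summary, err
	close(c.done)
}

// inflightKey hashes the canonical JSON of the request fields, mirroring
// the sha256-of-marshalled-payload key in the diff.
func inflightKey(fields map[string]string) (string, bool) {
	raw, err := json.Marshal(fields)
	if err != nil {
		return "", false
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), true
}

func main() {
	g := &inflightGroup{calls: map[string]*inflightCall{}}
	k, _ := inflightKey(map[string]string{"model": "gpt-x"})
	c, leader := g.begin(k) // first caller leads
	_, dup := g.begin(k)    // identical request joins the in-flight call
	g.finish(k, c, "compacted summary", nil)
	<-c.done
	fmt.Println(leader, dup, c.summary) // prints true false compacted summary
}
```

Clearing the key in `finish` matters: compaction results are request-shaped, so only truly concurrent duplicates should share a result, not later retries.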
Summary per file:

| File | Description |
|---|---|
| `server/server.go` | Exposes a new server option to configure compact chunk concurrency via proxy options. |
| `proxy/responses_handler.go` | Implements in-flight request dedupe, parallel chunk fanout with cancellation, and makes the compaction budget safe for concurrent use. |
| `proxy/handler.go` | Adds handler configuration fields and effective defaulting for proactive chunking and chunk concurrency. |
| `main.go` | Adds CLI/env configuration for compact chunk concurrency and plumbs it into server startup. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 4
```go
result := compactInflightResult{summary: summary, err: err}
if resp != nil {
	respBody, readErr := io.ReadAll(resp.Body)
	_ = resp.Body.Close()
	if readErr != nil {
		if err == nil {
```
```go
raw, err := json.Marshal(payload)
if err != nil {
	return "", false
}
sum := sha256.Sum256(raw)
return hex.EncodeToString(sum[:]), true
```
```go
func (h *ProxyHandler) compactResponsesRequest(ctx context.Context, requestFields map[string]json.RawMessage, extraHeaders http.Header) (string, *http.Response, error) {
	key, ok := compactInflightKey(requestFields, extraHeaders)
	if !ok {
		budget := newCompactBudget(h.effectiveCompactMaxAttempts())
		return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	}

	call, leader := h.beginCompactInflight(key)
	if !leader {
		return waitCompactInflight(ctx, call)
	}
```
```go
if len(chunks) > 1 {
	concurrency := h.effectiveCompactChunkConcurrency()
	remaining := len(chunks) - 1
	if concurrency > remaining {
		concurrency = remaining
	}
	if concurrency < 1 {
		concurrency = 1
	}

	fanoutCtx, cancelFanout := context.WithCancel(ctx)
	defer cancelFanout()

	jobs := make(chan int)
	errCh := make(chan error, 1)
	var wg sync.WaitGroup
	for worker := 0; worker < concurrency; worker++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee7aa7e09e
```go
if fanoutCtx.Err() != nil {
	return
}
summary, chunkErr := processChunk(fanoutCtx, i)
```
Re-check learned target before each fanout chunk
When the first chunk fits but a later sibling is the first to receive a 413, that worker records a smaller learned target, yet the rest of the fanout keeps calling processChunk with the original precomputed chunk layout. For mixed-size histories, every remaining over-cap sibling then pays its own discovery 413 (and recursive sub-fanout) instead of re-splitting after the first failure, as the old sequential loop did. This can exhaust compact-upstream-max-attempts and return the original 413 for large but otherwise compactable requests. Consider checking budget.learnedTargetValue() before dispatching/processing each sibling, or stopping the fanout to re-split the unsent tail once any worker learns a smaller target.
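A sketch of the suggested fix. `budget`, `learnedTargetValue`, and `splitTail` are illustrative stand-ins for the PR's internals, and the loop is sequential for clarity; the real fanout would need the same check before each dispatch:

```go
package main

import "fmt"

// budget mimics the shared compact budget's learned-target snapshot.
type budget struct{ learnedTarget int }

func (b *budget) learnedTargetValue() (int, bool) {
	return b.learnedTarget, b.learnedTarget > 0
}

// splitTail re-chunks the unsent chunks against a smaller target, so only
// the first over-cap sibling pays a discovery 413.
func splitTail(chunks [][]byte, target int) [][]byte {
	var out [][]byte
	for _, c := range chunks {
		for len(c) > target {
			out = append(out, c[:target])
			c = c[target:]
		}
		if len(c) > 0 {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	b := &budget{}
	chunks := [][]byte{make([]byte, 10), make([]byte, 10), make([]byte, 10)}
	for i := 0; i < len(chunks); i++ {
		// Re-check the learned target before each sibling, as suggested.
		if t, ok := b.learnedTargetValue(); ok && len(chunks[i]) > t {
			chunks = append(chunks[:i], splitTail(chunks[i:], t)...)
		}
		// Simulate the first sibling learning a smaller cap from a 413.
		if i == 0 {
			b.learnedTarget = 4
		}
	}
	fmt.Println(len(chunks)) // prints 7: the tail was re-split once
}
```

Re-splitting byte-wise ignores the item/tool-call boundary alignment the proxy actually performs; the point is only where the learned-target check goes, not how the split is computed.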
```go
if resp != nil {
	respBody, readErr := io.ReadAll(resp.Body)
	_ = resp.Body.Close()
	if readErr != nil {
```
| Flag | Environment variable | Default | Description |
|---|---|---|---|
| `--responses-ws-auto-compact-max-items` | `RESPONSES_WS_AUTO_COMPACT_MAX_ITEMS` | `8` | Auto-compact when history item count exceeds this threshold |
| `--responses-ws-auto-compact-max-bytes` | `RESPONSES_WS_AUTO_COMPACT_MAX_BYTES` | `32768` | Auto-compact when raw history byte size exceeds this threshold |
| `--responses-ws-auto-compact-keep-tail` | `RESPONSES_WS_AUTO_COMPACT_KEEP_TAIL` | `4` | Keep this many most recent items verbatim after websocket auto-compaction and as the starting tail for replay-like `POST /v1/responses` 413 compaction; the replay fallback aligns tool/call boundaries and may halve this per request if compacted retries still return 413 |
| `--compact-upstream-chunk-bytes` | `COMPACT_UPSTREAM_CHUNK_BYTES` | `4194304` | Initial target body size for chunked compaction retries after an upstream `413` (`/v1/responses/compact` and `POST /v1/responses` replay compaction). Default sized below the empirically observed Copilot `/responses` payload cap of `5 MiB` (`5,242,880` bytes); `4 MiB` leaves headroom for proxy-owned overhead. Halved on each recursive `413` (and eagerly halved when a sub-target body still 413s) down to a `64 KiB` floor. Learned smaller targets are cached in memory for 30 minutes per provider/model/endpoint, so later compaction fallbacks skip known-doomed larger chunks; tune up for providers like Azure that accept larger bodies, or down if your upstream's payload cap is below `4 MiB` |
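As a sanity check on the halving schedule in the last row, the distance from the `4 MiB` default to the `64 KiB` floor is six halvings, bounding the depth of recursive 413 retries:

```go
package main

import "fmt"

func main() {
	const floor = 64 * 1024 // 64 KiB floor
	steps := 0
	for target := 4 * 1024 * 1024; target > floor; target /= 2 {
		steps++ // 4 MiB -> 2 MiB -> 1 MiB -> 512 KiB -> 256 KiB -> 128 KiB -> 64 KiB
	}
	fmt.Println(steps) // prints 6
}
```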
```go
func (h *ProxyHandler) compactResponsesRequest(ctx context.Context, requestFields map[string]json.RawMessage, extraHeaders http.Header) (string, *http.Response, error) {
	key, ok := compactInflightKey(requestFields, extraHeaders)
	if !ok {
		budget := newCompactBudget(h.effectiveCompactMaxAttempts())
		return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	}

	call, leader := h.beginCompactInflight(key)
	if !leader {
		return waitCompactInflight(ctx, call)
	}

	budget := newCompactBudget(h.effectiveCompactMaxAttempts())
	summary, resp, err := h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	result := compactInflightResult{summary: summary, err: err}
	if resp != nil {
		respBody, readErr := io.ReadAll(resp.Body)
		_ = resp.Body.Close()
		if readErr != nil {
			if err == nil {
				err = readErr
			}
			result.err = err
			resp = nil
		} else {
			result.resp = cloneHTTPResponseWithBody(resp, respBody)
			result.respBody = respBody
			resp = cloneHTTPResponseWithBody(resp, respBody)
		}
	}
	h.finishCompactInflight(key, call, result)
	return summary, resp, err
}
```
Summary
Tests