
fix: improve responses compaction chunking #119

Open
sozercan wants to merge 3 commits into main from fix-from-logs

Conversation

@sozercan
Owner

Summary

  • dedupe identical in-flight compact requests
  • add configurable sibling chunk concurrency for compact fallback
  • make compact attempt budget safe for parallel chunk fanout
  • proactively chunk when a smaller compact chunk size is explicitly configured

Tests

  • go test ./... -count=1

Copilot AI review requested due to automatic review settings May 16, 2026 16:33

Copilot AI left a comment


Pull request overview

This PR improves the proxy’s /v1/responses/compact fallback behavior by making chunked compaction more efficient under load (deduping identical in-flight requests), enabling parallel sibling-chunk compaction with a configurable concurrency limit, and making the shared compaction-attempt budget safe to use across parallel chunk fanout.

Changes:

  • Added in-flight deduplication for identical /v1/responses/compact requests to avoid redundant upstream compaction work.
  • Introduced configurable sibling-chunk concurrency for 413 fallback chunk fanout, plus budget locking/snapshots to keep attempt accounting thread-safe.
  • Enabled proactive chunking when a smaller compact chunk size is explicitly configured, and wired the new concurrency setting through server options and CLI flags.
Summary per file:

| File | Description |
| --- | --- |
| server/server.go | Exposes a new server option to configure compact chunk concurrency via proxy options. |
| proxy/responses_handler.go | Implements in-flight request dedupe, parallel chunk fanout with cancellation, and makes the compaction budget safe for concurrent use. |
| proxy/handler.go | Adds handler configuration fields and effective defaulting for proactive chunking and chunk concurrency. |
| main.go | Adds CLI/env configuration for compact chunk concurrency and plumbs it into server startup. |

Copilot's findings


  • Files reviewed: 4/4 changed files
  • Comments generated: 4

Comment on lines +678 to +683

```go
result := compactInflightResult{summary: summary, err: err}
if resp != nil {
	respBody, readErr := io.ReadAll(resp.Body)
	_ = resp.Body.Close()
	if readErr != nil {
		if err == nil {
```
Comment on lines +609 to +614

```go
raw, err := json.Marshal(payload)
if err != nil {
	return "", false
}
sum := sha256.Sum256(raw)
return hex.EncodeToString(sum[:]), true
```
Comment on lines 664 to +674

```go
func (h *ProxyHandler) compactResponsesRequest(ctx context.Context, requestFields map[string]json.RawMessage, extraHeaders http.Header) (string, *http.Response, error) {
	key, ok := compactInflightKey(requestFields, extraHeaders)
	if !ok {
		budget := newCompactBudget(h.effectiveCompactMaxAttempts())
		return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	}

	call, leader := h.beginCompactInflight(key)
	if !leader {
		return waitCompactInflight(ctx, call)
	}
```
Comment on lines +1056 to +1075

```go
if len(chunks) > 1 {
	concurrency := h.effectiveCompactChunkConcurrency()
	remaining := len(chunks) - 1
	if concurrency > remaining {
		concurrency = remaining
	}
	if concurrency < 1 {
		concurrency = 1
	}

	fanoutCtx, cancelFanout := context.WithCancel(ctx)
	defer cancelFanout()

	jobs := make(chan int)
	errCh := make(chan error, 1)
	var wg sync.WaitGroup
	for worker := 0; worker < concurrency; worker++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
```

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee7aa7e09e


```go
if fanoutCtx.Err() != nil {
	return
}
summary, chunkErr := processChunk(fanoutCtx, i)
```

P2: Re-check learned target before each fanout chunk

When the first chunk fits but a later sibling is the first to receive a 413, that worker records a smaller learned target, yet the rest of the fanout keeps calling processChunk with the original precomputed chunk layout. For mixed-size histories, every remaining over-cap sibling then pays its own discovery 413 (and recursive sub-fanout) instead of re-splitting after the first failure, as the old sequential loop did; this can exhaust compact-upstream-max-attempts and return the original 413 for large but otherwise compactable requests. Consider checking budget.learnedTargetValue() before dispatching/processing each sibling, or stopping the fanout to re-split the unsent tail once any worker learns a smaller target.


Copilot AI review requested due to automatic review settings May 17, 2026 04:23

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 3

Comment on lines +681 to +684

```go
if resp != nil {
	respBody, readErr := io.ReadAll(resp.Body)
	_ = resp.Body.Close()
	if readErr != nil {
```
Comment thread: docs/configuration.md

| Flag | Env var | Default | Description |
| --- | --- | --- | --- |
| `--responses-ws-auto-compact-max-items` | `RESPONSES_WS_AUTO_COMPACT_MAX_ITEMS` | `8` | Auto-compact when history item count exceeds this threshold |
| `--responses-ws-auto-compact-max-bytes` | `RESPONSES_WS_AUTO_COMPACT_MAX_BYTES` | `32768` | Auto-compact when raw history byte size exceeds this threshold |
| `--responses-ws-auto-compact-keep-tail` | `RESPONSES_WS_AUTO_COMPACT_KEEP_TAIL` | `4` | Keep this many most recent items verbatim after websocket auto-compaction and as the starting tail for replay-like `POST /v1/responses` 413 compaction; the replay fallback aligns tool/call boundaries and may halve this per request if compacted retries still return 413 |
| `--compact-upstream-chunk-bytes` | `COMPACT_UPSTREAM_CHUNK_BYTES` | `4194304` | Initial target body size for chunked compaction retries after an upstream `413` (`/v1/responses/compact` and `POST /v1/responses` replay compaction). Default sized below the empirically observed Copilot `/responses` payload cap of `5 MiB` (`5,242,880` bytes); `4 MiB` leaves headroom for proxy-owned overhead. Halved on each recursive `413` (and eagerly halved when a sub-target body still 413s) down to a `64 KiB` floor. Learned smaller targets are cached in memory for 30 minutes per provider/model/endpoint so later compaction fallbacks skip known-doomed larger chunks; tune up for providers like Azure that accept larger bodies, or down if your upstream's payload cap is below `4 MiB` |
Comment on lines 666 to +697

```go
func (h *ProxyHandler) compactResponsesRequest(ctx context.Context, requestFields map[string]json.RawMessage, extraHeaders http.Header) (string, *http.Response, error) {
	key, ok := compactInflightKey(requestFields, extraHeaders)
	if !ok {
		budget := newCompactBudget(h.effectiveCompactMaxAttempts())
		return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	}

	call, leader := h.beginCompactInflight(key)
	if !leader {
		return waitCompactInflight(ctx, call)
	}

	budget := newCompactBudget(h.effectiveCompactMaxAttempts())
	return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget) // removed by this PR
	summary, resp, err := h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	result := compactInflightResult{summary: summary, err: err}
	if resp != nil {
		respBody, readErr := io.ReadAll(resp.Body)
		_ = resp.Body.Close()
		if readErr != nil {
			if err == nil {
				err = readErr
			}
			result.err = err
			resp = nil
		} else {
			result.resp = cloneHTTPResponseWithBody(resp, respBody)
			result.respBody = respBody
			resp = cloneHTTPResponseWithBody(resp, respBody)
		}
	}
	h.finishCompactInflight(key, call, result)
	return summary, resp, err
```
