
fix: improve responses compaction chunking #119

Open
sozercan wants to merge 3 commits into main from fix-from-logs

Conversation

@sozercan
Owner

Summary

  • dedupe identical in-flight compact requests
  • add configurable sibling chunk concurrency for compact fallback
  • make compact attempt budget safe for parallel chunk fanout
  • proactively chunk when a smaller compact chunk size is explicitly configured

Tests

  • go test ./... -count=1

Copilot AI review requested due to automatic review settings May 16, 2026 16:33

Copilot AI left a comment


Pull request overview

This PR improves the proxy’s /v1/responses/compact fallback behavior by making chunked compaction more efficient under load (deduping identical in-flight requests), enabling parallel sibling-chunk compaction with a configurable concurrency limit, and making the shared compaction-attempt budget safe to use across parallel chunk fanout.

Changes:

  • Added in-flight deduplication for identical /v1/responses/compact requests to avoid redundant upstream compaction work.
  • Introduced configurable sibling-chunk concurrency for 413 fallback chunk fanout, plus budget locking/snapshots to keep attempt accounting thread-safe.
  • Enabled proactive chunking when a smaller compact chunk size is explicitly configured, and wired the new concurrency setting through server options and CLI flags.
Summary per file:

| File | Description |
| --- | --- |
| server/server.go | Exposes a new server option to configure compact chunk concurrency via proxy options. |
| proxy/responses_handler.go | Implements in-flight request dedupe, parallel chunk fanout with cancellation, and makes the compaction budget safe for concurrent use. |
| proxy/handler.go | Adds handler configuration fields and effective defaulting for proactive chunking and chunk concurrency. |
| main.go | Adds CLI/env configuration for compact chunk concurrency and plumbs it into server startup. |

Copilot's findings


  • Files reviewed: 4/4 changed files
  • Comments generated: 4

Comment on lines +678 to +683

```go
result := compactInflightResult{summary: summary, err: err}
if resp != nil {
	respBody, readErr := io.ReadAll(resp.Body)
	_ = resp.Body.Close()
	if readErr != nil {
		if err == nil {
```
Comment on lines +609 to +614

```go
raw, err := json.Marshal(payload)
if err != nil {
	return "", false
}
sum := sha256.Sum256(raw)
return hex.EncodeToString(sum[:]), true
```
Comment on lines 664 to +674

```go
func (h *ProxyHandler) compactResponsesRequest(ctx context.Context, requestFields map[string]json.RawMessage, extraHeaders http.Header) (string, *http.Response, error) {
	key, ok := compactInflightKey(requestFields, extraHeaders)
	if !ok {
		budget := newCompactBudget(h.effectiveCompactMaxAttempts())
		return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	}

	call, leader := h.beginCompactInflight(key)
	if !leader {
		return waitCompactInflight(ctx, call)
	}
```
Comment on lines +1056 to +1075

```go
if len(chunks) > 1 {
	concurrency := h.effectiveCompactChunkConcurrency()
	remaining := len(chunks) - 1
	if concurrency > remaining {
		concurrency = remaining
	}
	if concurrency < 1 {
		concurrency = 1
	}

	fanoutCtx, cancelFanout := context.WithCancel(ctx)
	defer cancelFanout()

	jobs := make(chan int)
	errCh := make(chan error, 1)
	var wg sync.WaitGroup
	for worker := 0; worker < concurrency; worker++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
```

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee7aa7e09e


```go
if fanoutCtx.Err() != nil {
	return
}
summary, chunkErr := processChunk(fanoutCtx, i)
```

P2: Re-check learned target before each fanout chunk

When the first chunk fits but a later sibling is the first to receive a 413, that worker records a smaller learned target, yet the rest of the fanout keeps calling processChunk with the original precomputed chunk layout. For mixed-size histories, every remaining over-cap sibling then pays its own discovery 413 (and recursive sub-fanout) instead of re-splitting after the first failure, as the old sequential loop did; this can exhaust compact-upstream-max-attempts and return the original 413 for large but otherwise compactable requests. Consider checking budget.learnedTargetValue() before dispatching/processing each sibling, or stopping the fanout to re-split the unsent tail once any worker learns a smaller target.


Copilot AI review requested due to automatic review settings May 17, 2026 04:23

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 3

Comment on lines +681 to +684

```go
if resp != nil {
	respBody, readErr := io.ReadAll(resp.Body)
	_ = resp.Body.Close()
	if readErr != nil {
```
Comment thread: docs/configuration.md

| Flag | Env var | Default | Description |
| --- | --- | --- | --- |
| `--responses-ws-auto-compact-max-items` | `RESPONSES_WS_AUTO_COMPACT_MAX_ITEMS` | `8` | Auto-compact when history item count exceeds this threshold |
| `--responses-ws-auto-compact-max-bytes` | `RESPONSES_WS_AUTO_COMPACT_MAX_BYTES` | `32768` | Auto-compact when raw history byte size exceeds this threshold |
| `--responses-ws-auto-compact-keep-tail` | `RESPONSES_WS_AUTO_COMPACT_KEEP_TAIL` | `4` | Keep this many most recent items verbatim after websocket auto-compaction and as the starting tail for replay-like `POST /v1/responses` 413 compaction; the replay fallback aligns tool/call boundaries and may halve this per request if compacted retries still return 413 |
| `--compact-upstream-chunk-bytes` | `COMPACT_UPSTREAM_CHUNK_BYTES` | `4194304` | Initial target body size for chunked compaction retries after an upstream `413` (`/v1/responses/compact` and `POST /v1/responses` replay compaction). Default sized below the empirically observed Copilot `/responses` payload cap of `5 MiB` (`5,242,880` bytes); `4 MiB` leaves headroom for proxy-owned overhead. Halved on each recursive `413` (and eagerly halved when a sub-target body still 413s) down to a `64 KiB` floor. Learned smaller targets are cached in memory for 30 minutes per provider/model/endpoint so later compaction fallbacks skip known-doomed larger chunks; tune up for providers like Azure that accept larger bodies, or down if your upstream's payload cap is below `4 MiB` |
Comment on lines 666 to +697

```go
func (h *ProxyHandler) compactResponsesRequest(ctx context.Context, requestFields map[string]json.RawMessage, extraHeaders http.Header) (string, *http.Response, error) {
	key, ok := compactInflightKey(requestFields, extraHeaders)
	if !ok {
		budget := newCompactBudget(h.effectiveCompactMaxAttempts())
		return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	}

	call, leader := h.beginCompactInflight(key)
	if !leader {
		return waitCompactInflight(ctx, call)
	}

	budget := newCompactBudget(h.effectiveCompactMaxAttempts())
	return h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget) // removed by this PR
	summary, resp, err := h.compactResponsesRequestWithBudget(ctx, requestFields, extraHeaders, budget)
	result := compactInflightResult{summary: summary, err: err}
	if resp != nil {
		respBody, readErr := io.ReadAll(resp.Body)
		_ = resp.Body.Close()
		if readErr != nil {
			if err == nil {
				err = readErr
			}
			result.err = err
			resp = nil
		} else {
			result.resp = cloneHTTPResponseWithBody(resp, respBody)
			result.respBody = respBody
			resp = cloneHTTPResponseWithBody(resp, respBody)
		}
	}
	h.finishCompactInflight(key, call, result)
	return summary, resp, err
```
