Skip to content

feat: make process output blocking-capable#23312

Merged
mafredri merged 1 commit intomainfrom
mafredri/blocking-process-output
Mar 20, 2026
Merged

feat: make process output blocking-capable#23312
mafredri merged 1 commit intomainfrom
mafredri/blocking-process-output

Conversation

@mafredri
Copy link
Copy Markdown
Member

@mafredri mafredri commented Mar 19, 2026

Why

Chat agents waste tokens polling process_output in a loop to check if a background process has finished. The chatd tool layer also polls internally via pollProcess() every 200ms over HTTP for foreground commands. The agent already has a done channel per process but no HTTP surface to block on it.

What changed

The agent's HeadTailBuffer gains a sync.Cond and a closed flag. A new waitForOutput(ctx) method on process blocks until the buffer is closed (process exit) or the context is canceled. The GET /{id}/output endpoint supports ?wait=true to enable blocking mode, capped at 5 minutes server-side.

The chatd tool layer replaces its 200ms polling loop (pollProcess) with a single blocking call (waitForProcess). The process_output tool now blocks by default for 10s waiting for the process to exit, with wait_timeout as an override (0s for an immediate snapshot). On timeout, it fetches a non-blocking snapshot so the LLM always gets output.

Concurrency follows the reconnectingpty pattern: all state in the cond loop predicate lives under a single lock (buf.mu). The context-cancel goroutine acquires the lock before broadcasting to prevent lost wakeups. Only Close() broadcasts, not Write(), so waiters only wake on process exit.

Related: #23316 (chat ID isolation fix, independent)

🤖 This PR was created with the help of Coder Agents, and has been reviewed by a human. 🏂🏻

@mafredri mafredri force-pushed the mafredri/blocking-process-output branch from 4636d3e to 3e5e909 Compare March 19, 2026 18:21
@mafredri mafredri changed the title feat(agent/agentproc): make process output blocking-capable feat: make process output blocking-capable Mar 19, 2026
@mafredri mafredri force-pushed the mafredri/blocking-process-output branch from 3e5e909 to 0794a80 Compare March 19, 2026 18:46
@mafredri mafredri closed this Mar 19, 2026
@mafredri mafredri force-pushed the mafredri/blocking-process-output branch from 0794a80 to bb7c5f9 Compare March 19, 2026 19:08
@github-actions github-actions bot locked and limited conversation to collaborators Mar 19, 2026
@mafredri mafredri reopened this Mar 19, 2026
@mafredri mafredri force-pushed the mafredri/blocking-process-output branch 8 times, most recently from c3373df to c9dae1b Compare March 20, 2026 11:59
@mafredri mafredri marked this pull request as ready for review March 20, 2026 12:01
@mafredri mafredri force-pushed the mafredri/blocking-process-output branch 2 times, most recently from 144ada3 to acbeba8 Compare March 20, 2026 12:03
@mafredri mafredri requested a review from johnstcn March 20, 2026 12:06
@mafredri
Copy link
Copy Markdown
Member Author

@codex review

@coder coder unlocked this conversation Mar 20, 2026
// WriteTimeout does not kill the connection while
// we block.
rc := http.NewResponseController(rw)
_ = rc.SetWriteDeadline(time.Now().Add(maxWaitDuration))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail silently and may confuse the client waiting for the command. Suggest either surfacing the error or at least logging it at error.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 mafredri asked me to clean up after myself.

Fixed. Now logs at error level instead of discarding. Good catch, the 20s WriteTimeout fallback would have been invisible.

🤖 Addressed in 7a1baed.

b.tailPos = 0
b.tailFull = false
b.headFull = false
b.closed = false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do anything to the cond?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 mafredri pointed me at this and I saw the gap.

Good question. Added b.cond.Broadcast() at the end of Reset(). If a waiter is blocked on !closed and someone resets the buffer (setting closed = false), the waiter needs to re-evaluate. Reset() is only used in tests today, but correctness should not depend on that.

🤖 Fixed in 7a1baed.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: acbeba8895

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +290 to +297
if resp.Running {
output := truncateOutput(resp.Output)
return ExecuteResult{
Success: false,
Output: output,
ExitCode: -1,
Error: "process is still running after server-side wait expired",
Truncated: resp.Truncated,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Continue waiting after the agent's 5m block cap

When execute.timeout or process_output.wait_timeout is longer than 5 minutes, the blocking ProcessOutput(..., Wait:true) call can return with resp.Running=true because the agent handler hard-caps waits at maxWaitDuration in agent/agentproc/api.go. Treating that snapshot as a terminal failure here means a foreground command that is still within the caller's own timeout now aborts around minute 5 and gets turned into a background process instead of continuing to wait. The previous polling loop honored the full client timeout, so long-running builds/tests regress with this change.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 The robot read the robot and agreed.

Valid find. waitForProcess now retries when the server returns Running: true but the client context still has time left, instead of treating the 5min server cap as terminal. Only reports timeout when ctx.Err() != nil.

🤖 Fixed in 7a1baed.

@mafredri mafredri force-pushed the mafredri/blocking-process-output branch from acbeba8 to e492018 Compare March 20, 2026 12:16
@mafredri mafredri force-pushed the mafredri/blocking-process-output branch from e492018 to 7a1baed Compare March 20, 2026 12:19
@mafredri mafredri merged commit 41e15ae into main Mar 20, 2026
26 checks passed
@mafredri mafredri deleted the mafredri/blocking-process-output branch March 20, 2026 12:33
@github-actions github-actions bot locked and limited conversation to collaborators Mar 20, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants