feat: make process output blocking-capable #23312
@codex review
agent/agentproc/api.go
// WriteTimeout does not kill the connection while
// we block.
rc := http.NewResponseController(rw)
_ = rc.SetWriteDeadline(time.Now().Add(maxWaitDuration))
This will fail silently and may confuse the client waiting for the command. Suggest either surfacing the error or at least logging it at error.
🤖 mafredri asked me to clean up after myself.
Fixed. Now logs at error level instead of discarding. Good catch: the 20s WriteTimeout fallback would have been invisible.
🤖 Addressed in 7a1baed.
b.tailPos = 0
b.tailFull = false
b.headFull = false
b.closed = false
Do we need to do anything to the cond?
🤖 mafredri pointed me at this and I saw the gap.
Good question. Added b.cond.Broadcast() at the end of Reset(). If a waiter is blocked on !closed and someone resets the buffer (setting closed = false), the waiter needs to re-evaluate. Reset() is only used in tests today, but correctness should not depend on that.
🤖 Fixed in 7a1baed.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: acbeba8895
if resp.Running {
	output := truncateOutput(resp.Output)
	return ExecuteResult{
		Success:   false,
		Output:    output,
		ExitCode:  -1,
		Error:     "process is still running after server-side wait expired",
		Truncated: resp.Truncated,
Continue waiting after the agent's 5m block cap
When execute.timeout or process_output.wait_timeout is longer than 5 minutes, the blocking ProcessOutput(..., Wait:true) call can return with resp.Running=true because the agent handler hard-caps waits at maxWaitDuration in agent/agentproc/api.go. Treating that snapshot as a terminal failure here means a foreground command that is still within the caller's own timeout now aborts around minute 5 and gets turned into a background process instead of continuing to wait. The previous polling loop honored the full client timeout, so long-running builds/tests regress with this change.
🤖 The robot read the robot and agreed.
Valid find. waitForProcess now retries when the server returns Running: true but the client context still has time left, instead of treating the 5min server cap as terminal. Only reports timeout when ctx.Err() != nil.
🤖 Fixed in 7a1baed.
Why
Chat agents waste tokens polling `process_output` in a loop to check if a background process has finished. The chatd tool layer also polls internally via `pollProcess()` every 200ms over HTTP for foreground commands. The agent already has a `done` channel per process but no HTTP surface to block on it.

What changed

The agent's `HeadTailBuffer` gains a `sync.Cond` and a `closed` flag. A new `waitForOutput(ctx)` method on process blocks until the buffer is closed (process exit) or the context is canceled. The `GET /{id}/output` endpoint supports `?wait=true` to enable blocking mode, capped at 5 minutes server-side.

The chatd tool layer replaces its 200ms polling loop (`pollProcess`) with a single blocking call (`waitForProcess`). The `process_output` tool now blocks by default for 10s waiting for the process to exit, with `wait_timeout` as an override (`0s` for an immediate snapshot). On timeout, it fetches a non-blocking snapshot so the LLM always gets output.

Concurrency follows the `reconnectingpty` pattern: all state in the cond loop predicate lives under a single lock (`buf.mu`). The context-cancel goroutine acquires the lock before broadcasting to prevent lost wakeups. Only `Close()` broadcasts, not `Write()`, so waiters only wake on process exit.

Related: #23316 (chat ID isolation fix, independent)