feat(agent): add process execution API and rewrite execute tool#22416
Merged
This adds a new agent-side process management HTTP API and rewrites the
chat execute tool to use it instead of SSH sessions.
## Agent-side changes
New `agent/agentproc/` package providing:
- `HeadTailBuffer`: Thread-safe io.Writer with bounded memory (16KB
head + 16KB tail ring buffer). Provides LLM-ready output with
truncation metadata and long-line truncation at 2048 bytes.
- `Manager`: Process lifecycle management using `agentexec.Execer` for
proper OOM/nice scores. Tracks processes in a map, captures
stdout+stderr into HeadTailBuffer, supports background processes.
- HTTP API mounted at `/api/v0/processes` following the `agentfiles`
  pattern:
  - `POST /start` - Start a foreground or background process
  - `GET /list` - List all tracked processes
  - `GET /{id}/output` - Get truncated output with status
  - `POST /{id}/signal` - Send kill/terminate signal
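
The head-plus-tail buffering described for `HeadTailBuffer` can be sketched roughly as follows. This is a minimal illustration, not the code from `agent/agentproc`: the type `headTail`, its fields, and the marker text are hypothetical, it uses a plain slice rather than a ring buffer for the tail, and it omits the 2048-byte long-line truncation. The PR uses 16KB for each half; the demo uses a tiny capacity so the behavior is visible.

```go
package main

import (
	"fmt"
	"sync"
)

// headTail is a hypothetical, simplified stand-in for HeadTailBuffer.
// It keeps the first max bytes and the last max bytes written to it.
type headTail struct {
	mu      sync.Mutex
	max     int    // capacity of the head and of the tail, in bytes
	head    []byte // first max bytes ever written
	tail    []byte // most recent bytes written after the head filled up
	omitted int    // bytes dropped between head and tail
}

func newHeadTail(max int) *headTail { return &headTail{max: max} }

// Write implements io.Writer. It never fails; excess data is dropped.
func (b *headTail) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	n := len(p)
	// Fill the head first.
	if room := b.max - len(b.head); room > 0 {
		take := room
		if take > len(p) {
			take = len(p)
		}
		b.head = append(b.head, p[:take]...)
		p = p[take:]
	}
	// Everything else goes to the tail; keep only the newest max bytes.
	b.tail = append(b.tail, p...)
	if over := len(b.tail) - b.max; over > 0 {
		b.omitted += over
		copy(b.tail, b.tail[over:])
		b.tail = b.tail[:len(b.tail)-over]
	}
	return n, nil
}

// Output returns the captured text and whether anything was dropped.
func (b *headTail) Output() (string, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.omitted == 0 {
		return string(b.head) + string(b.tail), false
	}
	return fmt.Sprintf("%s\n[... %d bytes omitted ...]\n%s", b.head, b.omitted, b.tail), true
}

func main() {
	b := newHeadTail(4)
	fmt.Fprint(b, "abcdefghijkl")
	out, truncated := b.Output()
	fmt.Printf("%q truncated=%v\n", out, truncated)
}
```

Keeping both ends means the LLM sees how a command started and how it finished, which is usually where errors and summaries appear, while memory stays bounded regardless of how much the process prints.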
## SDK changes
Four new methods on the `AgentConn` interface with corresponding
request/response types: `StartProcess`, `ListProcesses`,
`ProcessOutput`, `SignalProcess`.
## Execute tool rewrite
- Switches from SSH sessions to the agent HTTP API
- Adds `workdir` parameter for setting working directory
- Adds `run_in_background` parameter for background processes
- Structured JSON response with success, exit_code, wall_duration_ms,
error, truncated, note, and background_process_id fields
- Sets non-interactive env vars (GIT_EDITOR=true, TERM=dumb, etc.)
- File-dump detection with advisory notes suggesting read_file
- Default timeout lowered from 60s to 10s
- Output capped at 32KB for LLM consumption
- Foreground processes polled every 200ms until exit or timeout
State lives on the agent, surviving coderd failover and instance
changes. Any coderd replica can query any agent's processes via the
HTTP API over tailnet.
Adds three new chat tools that expose background process management to the LLM, completing the lifecycle that starts with execute's run_in_background parameter:
- process_output: retrieve output from a background process by ID
- process_list: list all tracked processes (running and exited)
- process_signal: send terminate (SIGTERM) or kill (SIGKILL) to a process

These map directly to the agent HTTP API endpoints already wired up in agent/agentproc and codersdk/workspacesdk.AgentConn.
- gofmt struct alignment in agent/agent.go
- handle strings.Builder write return values in headtail.go (revive)
- move defer out of loop in pollProcess (revive)
Critical fixes:
- All processes now use cancellable context.Background() so they survive the HTTP request lifecycle. Previously foreground processes used r.Context() and were killed when the response was written.
- Add Close() method to manager that cancels all process contexts and waits for them to exit. Wired into agent shutdown.
- Set cmd.WaitDelay=5s so cmd.Wait() returns promptly even when child processes hold pipes open.

Code quality:
- Unexport all symbols only used within the package: Process, Manager, NewManager, all Handle* methods, all request/response types in api.go.
- Eliminate type duplication: import SDK types from codersdk/workspacesdk instead of defining duplicates.
- Remove dead MaxBufferSize constant.
- Use sentinel errors (errProcessNotFound, errProcessNotRunning) instead of TOCTOU pattern in signal handler.
- Return 409 Conflict for signaling an exited process (was 500).
- UTF-8 safe output truncation via strings.ToValidUTF8.
- Consolidate ProcessOutputOptions/ProcessListOptions/ProcessSignalOptions into single ProcessToolOptions type.
- Use quartz.Clock instead of time.Now() for testability.

Tests (new file agent/agentproc/api_test.go):
- TestStartProcess: foreground, background, empty command, malformed JSON, custom workdir, custom env
- TestListProcesses: empty, mixed running/exited
- TestProcessOutput: exited, running, nonexistent
- TestSignalProcess: kill, terminate, nonexistent, already exited, empty signal, invalid signal
- TestProcessLifecycle: output + exit code, non-zero exit, start-signal-verify, output exceeding buffer, stderr captured
- Skip TestSignalProcess/TerminateRunning on Windows (SIGTERM not supported).
- Fix TestStartProcess/CustomWorkDir to use a marker file instead of comparing pwd output, which differs between POSIX and Windows shells.
## Summary
Adds a new agent-side process management HTTP API and rewrites the chat execute tool to use it instead of SSH sessions.
## What changed
### New `agent/agentproc/` package
### Agent wiring (`agent/agent.go`, `agent/api.go`)
Mounts the process API at `/api/v0/processes`, mirroring how `agentfiles` is mounted.
### SDK (`codersdk/workspacesdk/agentconn.go`)
Four new `AgentConn` interface methods (`StartProcess`, `ListProcesses`, `ProcessOutput`, `SignalProcess`) plus seven request/response types.
### Execute tool rewrite (`coderd/chatd/chattool/execute.go`)
## Architecture
State lives on the agent, surviving coderd failover and instance changes. Any coderd replica can query any agent via HTTP over tailnet.