
feat: chat desktop backend #23005

Merged
hugodutka merged 1 commit into main from hugodutka/agents-desktop on Mar 13, 2026


Conversation

hugodutka (Contributor) commented Mar 12, 2026

Implement the backend for the desktop feature for agents.

  • Adds a new /api/experimental/chats/$id/desktop endpoint to coderd, which exposes a VNC stream from a portabledesktop process running inside the workspace.
  • Adds a new spawn_computer_use_agent tool to chatd, which spawns a subagent with access to the computer tool, letting it interact with the portabledesktop process running inside the workspace.
  • Adds the plumbing that makes both of the above possible.
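As a rough illustration of how a client might reach the new desktop endpoint, here is a sketch that turns a deployment URL into the WebSocket URL for a chat's VNC stream. It assumes the /api/experimental/chats/$id/desktop path given above, a chat ID that is a UUID (so it needs no escaping), and the usual http to ws / https to wss mapping for WebSocket upgrades. The helper name desktopURL is hypothetical, and authentication is deliberately left out.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// desktopURL builds the WebSocket URL for a chat's desktop stream.
// Hypothetical helper: assumes the endpoint path
// /api/experimental/chats/{id}/desktop and a UUID chat ID.
// Authentication is not handled here.
func desktopURL(deployment, chatID string) (string, error) {
	u, err := url.Parse(deployment)
	if err != nil {
		return "", err
	}
	// WebSocket connections use ws/wss instead of http/https.
	switch u.Scheme {
	case "http":
		u.Scheme = "ws"
	case "https":
		u.Scheme = "wss"
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	// Append the endpoint to any existing base path without a double slash.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/api/experimental/chats/" + chatID + "/desktop"
	return u.String(), nil
}

func main() {
	s, err := desktopURL("https://coder.example.com", "3f1c9a2e-0000-4000-8000-000000000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Keeping the base path (for deployments served under a prefix) is a design choice of this sketch; the real client may build the URL differently.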

There's a follow-up frontend PR: #23006

hugodutka marked this pull request as draft March 12, 2026 17:37
hugodutka force-pushed the hugodutka/agents-desktop branch from 08a8db5 to 5660da1 March 12, 2026 17:37
coder-tasks bot (Contributor) commented Mar 12, 2026

Documentation Check

New Documentation Needed

  • docs/ai-coder/computer-use.md (or additions to docs/ai-coder/tasks.md) — Document the new computer use capability: AI agents can now spawn a dedicated spawn_computer_use_agent subagent that takes screenshots, moves the mouse, and types text in the workspace desktop. Include requirements (e.g. VNC/desktop environment in the workspace), how it integrates with Tasks, and the model used (claude-opus-4-6).
  • docs/manifest.json — Navigation entry added for the new Chats API reference page.

Context

This PR adds:

  • A new agentdesktop package providing VNC streaming, screenshots, and computer actions (key, type, click, scroll) via the agent API
  • A new spawn_computer_use_agent tool for chat subagents that delegates to Anthropic's computer use model
  • A new GET /chats/{chat}/desktop WebSocket endpoint for VNC desktop streaming
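The computer actions listed above reach the agent as tool-call inputs from the model. A minimal sketch of decoding and validating such a payload, loosely modelled on the JSON shape of Anthropic's computer-use tool (an action string plus text, coordinate, or scroll fields). The ComputerAction struct, the set of accepted action names, and the per-action validation rules are all assumptions for illustration, not the agentdesktop package's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ComputerAction is a hypothetical representation of one computer-use
// tool call; field names follow the general shape of Anthropic's
// computer-use tool input and should be treated as an assumption.
type ComputerAction struct {
	Action          string `json:"action"`
	Text            string `json:"text,omitempty"`
	Coordinate      []int  `json:"coordinate,omitempty"`
	ScrollDirection string `json:"scroll_direction,omitempty"`
	ScrollAmount    int    `json:"scroll_amount,omitempty"`
}

// parseAction decodes a raw tool-call input and checks that the fields
// this sketch requires for each supported action are present.
func parseAction(raw []byte) (ComputerAction, error) {
	var a ComputerAction
	if err := json.Unmarshal(raw, &a); err != nil {
		return a, fmt.Errorf("decode action: %w", err)
	}
	switch a.Action {
	case "screenshot":
		// No arguments needed.
	case "key", "type":
		if a.Text == "" {
			return a, fmt.Errorf("%s requires text", a.Action)
		}
	case "left_click":
		if len(a.Coordinate) != 2 {
			return a, fmt.Errorf("left_click requires an [x, y] coordinate")
		}
	case "scroll":
		if len(a.Coordinate) != 2 {
			return a, fmt.Errorf("scroll requires an [x, y] coordinate")
		}
		switch a.ScrollDirection {
		case "up", "down", "left", "right":
			// Valid direction.
		default:
			return a, fmt.Errorf("invalid scroll_direction %q", a.ScrollDirection)
		}
	default:
		return a, fmt.Errorf("unsupported action %q", a.Action)
	}
	return a, nil
}

func main() {
	a, err := parseAction([]byte(`{"action":"left_click","coordinate":[640,360]}`))
	fmt.Println(a.Action, a.Coordinate, err)
}
```

Rejecting malformed actions before they reach the desktop process keeps model mistakes from turning into confusing VNC-side failures.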

The feature is marked EXPERIMENTAL in code. Documentation may be intentionally deferred, but the spawn_computer_use_agent tool is user-visible behavior worth capturing under docs/ai-coder/.


Automated review via Coder Tasks

hugodutka force-pushed the hugodutka/agents-desktop branch 5 times, most recently from cb5a27d to eab5e1d March 13, 2026 15:31
slog.Error(err),
)
if attempt < downloadRetries-1 {
time.Sleep(downloadRetryDelay)
Member

Should ensureBinary have a context? It seems weird that it doesn't. I kinda expected this to be piped through everything for a graceful shutdown.

Comment on lines +32 to +46
// platformBinaries maps GOARCH to download URL and expected SHA-256
// digest for each supported platform.
var platformBinaries = map[string]struct {
	URL    string
	SHA256 string
}{
	"amd64": {
		URL:    "https://github.com/coder/portabledesktop/releases/download/" + portableDesktopVersion + "/portabledesktop-linux-x64",
		SHA256: "a04e05e6c7d6f2e6b3acbf1729a7b21271276300b4fee321f4ffee6136538317",
	},
	"arm64": {
		URL:    "https://github.com/coder/portabledesktop/releases/download/" + portableDesktopVersion + "/portabledesktop-linux-arm64",
		SHA256: "b8cb9142dc32d46a608f25229cbe8168ff2a3aadc54253c74ff54cd347e16ca6",
	},
}
Member

I'm somewhat skeptical that we should be downloading this at all. Since customers will likely have to download a browser too, maybe it's best for this change to just require portabledesktop on $PATH and have customers adjust their workspace images?

Contributor Author

I'd rather pin a specific version of portabledesktop and auto-download it for now, in case we need to make breaking changes to the portabledesktop CLI.

Contributor Author

We DM'd about it: I'm going to leave the current approach as is in this PR. Next week I'll do a follow-up PR that only gets portabledesktop from PATH and publishes a template module to set it up.

@@ -0,0 +1,3 @@
CREATE TYPE chat_type AS ENUM ('computer_use');

ALTER TABLE chats ADD COLUMN chat_type chat_type;
Member

Should we call this mode instead? Just because chat.chat_type is janky.
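Whatever the column ends up being called, the migration adds it without NOT NULL or a default, so existing chats read back as NULL. A sketch of how Go code might model that nullable enum with database/sql's Scanner and driver.Valuer interfaces; ChatType and NullChatType are hypothetical, sqlc-style names and are not taken from the PR.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// ChatType mirrors the chat_type Postgres enum (hypothetical name).
type ChatType string

const ChatTypeComputerUse ChatType = "computer_use"

// NullChatType represents a chat_type column that may be NULL.
type NullChatType struct {
	ChatType ChatType
	Valid    bool // Valid is true if ChatType is not NULL.
}

// Scan implements sql.Scanner. Drivers typically return enum values
// as string or []byte; NULL arrives as nil.
func (n *NullChatType) Scan(src any) error {
	var s string
	switch v := src.(type) {
	case nil:
		*n = NullChatType{}
		return nil
	case string:
		s = v
	case []byte:
		s = string(v)
	default:
		return fmt.Errorf("unsupported chat_type source %T", src)
	}
	// Reject values this code does not know about; extend when the
	// enum gains new members.
	if ChatType(s) != ChatTypeComputerUse {
		return fmt.Errorf("unknown chat_type %q", s)
	}
	*n = NullChatType{ChatType: ChatType(s), Valid: true}
	return nil
}

// Value implements driver.Valuer so NULL round-trips on writes.
func (n NullChatType) Value() (driver.Value, error) {
	if !n.Valid {
		return nil, nil
	}
	return string(n.ChatType), nil
}

// Compile-time checks that both interfaces are satisfied.
var (
	_ sql.Scanner   = (*NullChatType)(nil)
	_ driver.Valuer = NullChatType{}
)

func main() {
	var n NullChatType
	_ = n.Scan(nil)
	fmt.Println(n.Valid)
	_ = n.Scan([]byte("computer_use"))
	fmt.Println(n.Valid, n.ChatType)
}
```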

},
),
fantasy.NewAgentTool(
"spawn_computer_use_agent",
Member

It's great that this is a separate tool 👍

Comment on lines +135 to +152
// Check that the Anthropic provider is configured.
// Computer use requires an Anthropic model; verify
// the key is available before creating the child chat.
anthropicAvailable := p.providerAPIKeys.APIKey("anthropic") != ""
if !anthropicAvailable {
	dbProviders, err := p.db.GetEnabledChatProviders(ctx)
	if err == nil {
		for _, prov := range dbProviders {
			if chatprovider.NormalizeProvider(prov.Provider) == "anthropic" && strings.TrimSpace(prov.APIKey) != "" {
				anthropicAvailable = true
				break
			}
		}
	}
}
if !anthropicAvailable {
	return fantasy.NewTextErrorResponse("Computer use requires an Anthropic API key."), nil
}
Member

Would it be better if we just omitted this tool unless Anthropic is configured?

I think the ideal would be that we only inject this if Anthropic is configured and portabledesktop is in the path of the corresponding workspace, but we don't have to do that immediately.
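The suggestion amounts to making the decision when tools are registered rather than when the tool is called. A minimal sketch of that shape with hypothetical Provider and Tool types standing in for chatd's real ones: the Anthropic-key check from the excerpt is factored into a pure function (with strings.EqualFold standing in for chatprovider.NormalizeProvider), and it gates whether spawn_computer_use_agent is offered at all. The portabledesktop-on-PATH condition is a boolean input, since detecting it inside the workspace is out of scope here.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider is a hypothetical stand-in for an enabled chat provider row.
type Provider struct {
	Name   string
	APIKey string
}

// Tool is a hypothetical stand-in for an agent tool.
type Tool struct {
	Name string
}

// anthropicConfigured reports whether an Anthropic key is available,
// either from deployment config or from an enabled provider.
func anthropicConfigured(configKey string, providers []Provider) bool {
	if configKey != "" {
		return true
	}
	for _, p := range providers {
		if strings.EqualFold(strings.TrimSpace(p.Name), "anthropic") && strings.TrimSpace(p.APIKey) != "" {
			return true
		}
	}
	return false
}

// buildTools returns a copy of base plus spawn_computer_use_agent only
// when its prerequisites are met, so the model never sees a tool it
// cannot use.
func buildTools(base []Tool, anthropicOK, desktopOK bool) []Tool {
	tools := append([]Tool(nil), base...)
	if anthropicOK && desktopOK {
		tools = append(tools, Tool{Name: "spawn_computer_use_agent"})
	}
	return tools
}

func main() {
	base := []Tool{{Name: "example_tool"}}
	ok := anthropicConfigured("", []Provider{{Name: "Anthropic", APIKey: "sk-test"}})
	fmt.Println(len(buildTools(base, ok, true)))
}
```

Gating at registration also removes a wasted model turn: the model cannot call the tool only to receive the "requires an Anthropic API key" error.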

Comment on lines +869 to +875
// ToolDefiner is an optional interface that AgentTools can
// implement to control how they appear in Call.Tools. When
// present, buildToolDefinitions uses the returned Tool instead
// of constructing a FunctionTool from Info().
type ToolDefiner interface {
	ToolDefinition() fantasy.Tool
}
Member

Why is this necessary? Isn't it just a normal tool definition that we return anyway?

Contributor Author

It's a provider tool like web_search. ToolDefiner was old code that remained from before you added support for provider tools in chatd. I got rid of it now.
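ToolDefiner, as described in the excerpt, is an instance of Go's optional-interface pattern: the builder type-asserts each tool and uses the override only when it is implemented. Although the author removed it in favour of chatd's provider-tool support, a self-contained sketch of the mechanism may still help readers of the thread. AgentTool, ToolInfo, Tool, FunctionTool, and ProviderTool here are simplified hypothetical types, not fantasy's real API.

```go
package main

import "fmt"

// Tool is a simplified, hypothetical tool definition sent to the model.
type Tool interface {
	ToolName() string
}

// FunctionTool is a tool described by name and description.
type FunctionTool struct{ Name, Description string }

func (t FunctionTool) ToolName() string { return t.Name }

// ProviderTool is a tool whose schema is defined by the model provider.
type ProviderTool struct{ Name string }

func (t ProviderTool) ToolName() string { return t.Name }

// ToolInfo is a hypothetical stand-in for what Info() returns.
type ToolInfo struct{ Name, Description string }

// AgentTool is the interface every tool must implement.
type AgentTool interface {
	Info() ToolInfo
}

// ToolDefiner is the optional override described in the excerpt.
type ToolDefiner interface {
	ToolDefinition() Tool
}

// buildToolDefinitions prefers ToolDefinition() when a tool implements
// ToolDefiner and otherwise derives a FunctionTool from Info().
func buildToolDefinitions(tools []AgentTool) []Tool {
	defs := make([]Tool, 0, len(tools))
	for _, t := range tools {
		if d, ok := t.(ToolDefiner); ok {
			defs = append(defs, d.ToolDefinition())
			continue
		}
		info := t.Info()
		defs = append(defs, FunctionTool{Name: info.Name, Description: info.Description})
	}
	return defs
}

type plainTool struct{}

func (plainTool) Info() ToolInfo { return ToolInfo{Name: "read_file", Description: "Read a file."} }

type computerTool struct{}

func (computerTool) Info() ToolInfo { return ToolInfo{Name: "computer"} }

func (computerTool) ToolDefinition() Tool { return ProviderTool{Name: "computer"} }

func main() {
	for _, d := range buildToolDefinitions([]AgentTool{plainTool{}, computerTool{}}) {
		fmt.Printf("%T %s\n", d, d.ToolName())
	}
}
```

Once the tool layer models provider tools directly, as the reply says chatd now does, this extra interface becomes redundant.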

hugodutka force-pushed the hugodutka/agents-desktop branch 9 times, most recently from 564f60c to ea9f7e2 March 13, 2026 17:58
hugodutka marked this pull request as ready for review March 13, 2026 18:10
hugodutka requested a review from mafredri March 13, 2026 18:10
hugodutka force-pushed the hugodutka/agents-desktop branch from ea9f7e2 to 0b00ce3 March 13, 2026 18:21
hugodutka merged commit 8452739 into main on Mar 13, 2026
24 checks passed
hugodutka deleted the hugodutka/agents-desktop branch March 13, 2026 18:49
github-actions bot locked and limited conversation to collaborators Mar 13, 2026