Skip to content

feat: add MCP server configuration backend for chats#23227

Merged
kylecarbs merged 19 commits intomainfrom
mcp-server-configs
Mar 19, 2026
Merged

feat: add MCP server configuration backend for chats#23227
kylecarbs merged 19 commits intomainfrom
mcp-server-configs

Conversation

@kylecarbs
Copy link
Copy Markdown
Member

@kylecarbs kylecarbs commented Mar 18, 2026

Summary

Adds the database schema, API endpoints, SDK types, and encryption wrappers for admin-managed MCP (Model Context Protocol) server configurations that chatd can consume. This is the backend foundation for allowing external MCP tools (Sentry, Linear, GitHub, etc.) to be used during AI chat sessions.

Database

Two new tables:

  • mcp_server_configs: Admin-managed server definitions with URL, transport (Streamable HTTP / SSE), auth config (none / OAuth2 / API key / custom headers), tool allow/deny lists, and an availability policy (force_on / default_on / default_off). Includes CHECK constraints on transport, auth_type, and availability values.
  • mcp_server_user_tokens: Per-user OAuth2 tokens for servers requiring individual authentication. Cascades on user/config deletion.

New column on chats table:

  • mcp_server_ids UUID[]: Per-chat MCP server selection, following the same pattern as model_config_id — passed at chat creation, changeable per-message with nil-means-no-change semantics.

API Endpoints

All routes are under /api/experimental/mcp/servers/ and gated behind the agents experiment.

Admin endpoints (ResourceDeploymentConfig auth):

  • POST / — Create MCP server config
  • PATCH /{id} — Update MCP server config (full-replace)
  • DELETE /{id} — Delete MCP server config

Authenticated endpoints (all users, enabled servers only for non-admins):

  • GET / — List configs (admins see all, members see enabled-only with admin fields redacted)
  • GET /{id} — Get config by ID (with auth_connected populated per-user)

OAuth2 per-user auth flow:

  • GET /{id}/oauth2/connect — Initiate OAuth2 flow (state cookie CSRF protection)
  • GET /{id}/oauth2/callback — Handle OAuth2 callback, store tokens
  • DELETE /{id}/oauth2/disconnect — Remove stored OAuth2 tokens

Security

  • Secrets never returned: OAuth2ClientSecret, APIKeyValue, and CustomHeaders are never in API responses — only boolean indicators (has_oauth2_secret, has_api_key, has_custom_headers).
  • Field redaction for non-admins: convertMCPServerConfigRedacted strips OAuth2ClientID, auth URLs, scopes, and APIKeyHeader from non-admin responses.
  • dbcrypt encryption at rest: All 5 secret fields use dbcrypt_keys encryption with full encrypt-on-write / decrypt-on-read wrappers (11 dbcrypt method overrides + 2 helpers), following the same pattern as chat_providers.api_key.
  • OAuth2 CSRF protection: State parameter stored in HttpOnly cookie with HTTPCookies.Apply() for correct Secure/SameSite behind TLS-terminating proxies.
  • dbauthz authorization: All 18 querier methods have authorization wrappers. Read operations use ActionRead, write operations use ActionUpdate on ResourceDeploymentConfig.

Governance Model

Control Implementation
Global kill switch enabled defaults to false
Availability policy force_on (always injected), default_on (pre-selected), default_off (opt-in)
Per-chat selection mcp_server_ids on CreateChatRequest / CreateChatMessageRequest
Auth gate OAuth2 servers require per-user auth before tools are injected
Tool-level allow/deny Arrays on mcp_server_configs for granular tool filtering
Secrets encrypted at rest Uses dbcrypt_keys (same pattern as chat_providers.api_key)

Tests

8 test functions covering:

  • Full CRUD lifecycle (create, list, update, delete)
  • Non-admin visibility filtering (enabled-only, field redaction)
  • auth_connected population for OAuth2 vs non-OAuth2 servers
  • Availability policy validation (valid values + invalid rejection)
  • Unique slug enforcement (409 Conflict)
  • OAuth2 disconnect idempotency
  • Chat creation with mcp_server_ids persistence

Known Limitations (Deferred)

These are documented and intentional for an experimental feature:

  • Audit logging not yet wired — will add when feature stabilizes
  • Cross-field validation (e.g., OAuth2 fields required when auth_type=oauth2) — admin-only endpoint, will add when stabilizing
  • force_on auto-injection — query exists but not yet wired into chatd tool injection (follow-up)
  • Additional test coverage — 403 auth tests, GET-by-ID tests, callback CSRF tests planned for follow-up

What's NOT in this PR

  • Frontend UI (admin panel + chat picker)
  • Actual MCP client connections (chatd/chatmcp/ manager)
  • Tool injection into chatloop/

@coder-tasks
Copy link
Copy Markdown
Contributor

coder-tasks bot commented Mar 18, 2026

Documentation Check

Updates Needed

  • docs/ai-coder/agents/chats-api.md - Document the new mcp_server_ids field on CreateChatRequest ("Create a chat" endpoint table) and CreateChatMessageRequest ("Send a follow-up" endpoint table). The field accepts an array of MCP server config UUIDs to include for the chat session.

New Documentation Needed

  • docs/ai-coder/agents/mcp-server-configs.md (or similar) - Document the new admin-managed MCP server configuration API (/api/experimental/mcp/servers/), covering: admin CRUD endpoints, availability policy (force_on / default_on / default_off), auth types (none / OAuth2 / API key / custom headers), field redaction for non-admins, and the per-user OAuth2 connect/disconnect flow. This parallels chats-api.md in scope for an experimental feature.

Automated review via Coder Tasks

@kylecarbs kylecarbs marked this pull request as draft March 18, 2026 15:05
@kylecarbs kylecarbs changed the title feat(chatd): add MCP server configuration backend feat: add MCP server configuration backend for chats Mar 18, 2026
@kylecarbs kylecarbs requested a review from johnstcn March 19, 2026 11:10
Adds the database schema, API endpoints, and SDK types for admin-managed
MCP server configurations that chatd can consume. This is the backend
foundation for allowing external MCP tools (Sentry, Linear, GitHub, etc.)
to be used during AI chat sessions.

## Database

Three new tables:
- mcp_server_configs: Admin-managed server definitions with URL, transport
  (Streamable HTTP / SSE), auth config, tool allow/deny lists, and an
  availability policy (force_on / default_on / default_off).
- mcp_server_user_tokens: Per-user OAuth2 tokens for servers requiring
  individual authentication.
- mcp_server_tool_snapshots: Admin-approved frozen tool lists, preventing
  silent server-side tool changes (inspired by ChatGPT's frozen snapshot
  pattern).

New column on chats table:
- mcp_server_ids UUID[]: Per-chat MCP server selection, following the same
  pattern as model_config_id (passed at chat creation, changeable per
  message).

## API Endpoints

Admin endpoints (ResourceDeploymentConfig auth):
- GET/POST /api/experimental/chats/mcp-servers/
- PATCH/DELETE /api/experimental/chats/mcp-servers/{id}
- GET /api/experimental/chats/mcp-servers/{id}/tools
- POST /api/experimental/chats/mcp-servers/{id}/refresh-tools (501 placeholder)

## Governance Model

- Servers default to disabled (enabled=false) and default_off availability.
- force_on: Always injected, user cannot deselect.
- default_on: Pre-selected in new chats, user can deselect.
- default_off: Available in picker but not pre-selected.
- OAuth2 servers require per-user authentication before tools are injected.
- Tool-level allow/deny lists for granular control.
- Frozen tool snapshots require admin re-approval on server changes.
…points

- Rename mcp_server_configs.go → mcp.go
- Move routes from /chats/mcp-servers to /mcp/servers (adjacent to
  existing /mcp/http transport endpoint)
- Update SDK client paths to match new route prefix
- Add user-facing endpoints:
  - GET /mcp/servers/user — list enabled servers with per-user auth status
  - GET /mcp/servers/{id}/oauth2/connect — initiate OAuth2 flow (redirect)
  - GET /mcp/servers/{id}/oauth2/callback — exchange code for token, store
    per-user, close popup
  - DELETE /mcp/servers/{id}/oauth2/disconnect — revoke user's token

Route structure:
  /api/experimental/mcp/
    /servers/           — admin CRUD (GET, POST)
    /servers/user       — user list with auth status
    /servers/{id}/      — admin CRUD (PATCH, DELETE)
    /servers/{id}/tools — tool snapshot
    /servers/{id}/oauth2/connect    — start OAuth2
    /servers/{id}/oauth2/callback   — complete OAuth2
    /servers/{id}/oauth2/disconnect — revoke token
    /http               — MCP HTTP transport (existing)
The /servers/user endpoint was redundant — the main GET /servers/
endpoint already branched on admin vs non-admin. Merged the user
token lookup into the main handler so a single endpoint returns
configs + auth_connected status for the calling user.
- Add 8 test functions covering CRUD, non-admin visibility, auth_connected
  field, availability validation, slug uniqueness, tool snapshots, OAuth2
  disconnect, and per-chat MCP server ID persistence.
- Rename codersdk/mcpserverconfigs.go → codersdk/mcp.go to match coderd/mcp.go.
- Fix nil slice bug where omitted tool_allow_list/tool_deny_list in create
  requests would send NULL to NOT NULL columns.
Add rename entries in sqlc.yaml so generated Go types use
idiomatic casing (MCPServerConfig, MCPServerIDs, etc.) instead
of sqlc's default CamelCase (McpServerConfig, McpServerIds).
…eanup

- Add dbcrypt encrypt/decrypt wrappers for all MCP server methods:
  - MCPServerConfig: OAuth2ClientSecret, APIKeyValue, CustomHeaders
  - MCPServerUserToken: AccessToken, RefreshToken
  - 8 decrypt-only wrappers for Get* methods
  - 3 encrypt+decrypt wrappers for Insert/Update/Upsert methods
- Fix revive lint: split convertMCPServerConfig into two functions
  (convertMCPServerConfig + convertMCPServerConfigRedacted) instead
  of a single function with bool control flag
- Implement CleanupDeletedMCPServerIDsFromChats in dbauthz (was panic stub)
- Remove orphaned code from incomplete sed replacement
- Remove non-existent convertMCPServerConfigForRole references
…iciency

- Fix getMCPServerTools 500 for non-admin users: use AsSystemRestricted
  for both GetMCPServerConfigByID and GetActiveMCPServerToolSnapshot,
  and enforce Enabled check for non-admin callers.
- Fix OAuth2 state cookie Secure flag: use HTTPCookies.Apply() from
  deployment config instead of r.TLS != nil, matching the pattern in
  userauth.go. Fixes CSRF cookie not working behind TLS-terminating
  reverse proxies.
- Fix strings.Split on empty OAuth2Scopes producing [""] instead of
  []: guard with non-empty check before splitting.
- Add AuthConnected population to GET-by-ID handler, matching the
  list handler's token lookup logic.
- Remove omitempty from AuthConnected bool field so false is always
  serialized in JSON responses.
- Narrow CleanupDeletedMCPServerIDsFromChats WHERE clause to only
  update chats that actually reference deleted config IDs, avoiding
  unnecessary WAL writes.
The frozen tool snapshot table is premature without the refresh-tools
flow wired up. Tool filtering is handled by tool_allow_list and
tool_deny_list on mcp_server_configs, which is sufficient for the
initial implementation.

Removes:
- mcp_server_tool_snapshots table and its partial unique index
- GetActiveMCPServerToolSnapshot, InsertMCPServerToolSnapshot,
  DeactivateMCPServerToolSnapshots queries
- getMCPServerTools and refreshMCPServerTools handlers + routes
- MCPServerTool, MCPServerToolSnapshot SDK types + client method
- dbauthz/dbmetrics wrappers for removed queries
- TestMCPServerConfigsToolSnapshot test
@kylecarbs kylecarbs force-pushed the mcp-server-configs branch from aac8787 to cc74088 Compare March 19, 2026 11:13
- Fix revive if-return lint in dbcrypt helpers: return decryptField
  directly instead of redundant if-err-return-nil pattern.
- Fix timeout context used after t.Parallel: create per-subtest
  contexts in TestMCPServerConfigsAvailability.
@kylecarbs kylecarbs force-pushed the mcp-server-configs branch from cc74088 to c22e304 Compare March 19, 2026 11:26
Copy link
Copy Markdown
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done reviewing yet, but there's one blocker I'm seeing immediately.

configs, err = api.Database.GetMCPServerConfigs(ctx)
} else {
//nolint:gocritic // All authenticated users need to read enabled MCP server configs to use the chat feature.
configs, err = api.Database.GetEnabledMCPServerConfigs(dbauthz.AsSystemRestricted(ctx))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm extremely wary of this as we store encrypted fields in this table. I know that we elide them in convertMCPServerConfig below but this is one over-eager prompt away from exposing secrets.

It would be better to store the "secret" stuff in a separate table. It's a bit more work but then you can just tell dbauthz to allow everyone to read that, and lean on the owner role to protect the secrets.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern. For now the converter functions (convertMCPServerConfig / convertMCPServerConfigRedacted) ensure secrets never reach API responses, and non-admin code paths use the redacted variant. I'll look into splitting the secret fields into a separate table as a follow-up — that would let dbauthz enforce the boundary at the DB layer rather than relying on application-level converters.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment, let's add a load-bearing unit test (or an assertion in an existing test) so that if this ever does happen, it won't do so silently.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TestMCPServerConfigsSecretsNeverLeaked — creates a config with all secret fields populated (OAuth2 client secret, API key value, custom headers with sentinel values), then JSON-marshals the responses from every endpoint (admin create, admin list, admin get-by-ID, member list, member get-by-ID) and asserts none of the sentinel strings appear. Also verifies the Has* booleans are true and that non-admin responses have admin-only fields stripped (OAuth2ClientID, auth URLs, API key header, URL, Transport).

If anyone accidentally wires a secret field into the SDK type or converter, this test will catch it.

Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 mafredri sent me to read all 37 files and 3,500 new lines. My weights are exhausted but my findings are not.

Review: MCP server config backend with validation gaps, dead cleanup code, and thin test coverage for security-critical OAuth2 paths

Validation

  • go test ./coderd/ -run TestMCPServer -count=1 -v
  • Verify CleanupDeletedMCPServerIDsFromChats is called somewhere, or remove it
  • Test that a non-admin cannot call create/update/delete endpoints
  • Test the OAuth2 connect → callback round-trip

Cross-cutting findings (not tied to a single line)

P1 CleanupDeletedMCPServerIDsFromChats is defined through all generated layers but never called anywhere. Stale UUIDs accumulate in chats.mcp_server_ids after MCP server config deletion. The FK cascade handles mcp_server_user_tokens, but chats.mcp_server_ids is a UUID array with no FK constraint and no code invokes the cleanup. (Gopher, DBA, Tester, Breaker, Mutex)

P1 The entire OAuth2 connect/callback flow (~200 lines across mcpServerOAuth2Connect and mcpServerOAuth2Callback) has zero test coverage. State cookie CSRF, code exchange, token storage, HTML popup response, and error branches are all untested. (Tester)

P1 The 195 lines of dbcrypt encrypt/decrypt wrappers in enterprise/dbcrypt/dbcrypt.go for MCP server configs and user tokens have no test coverage. An encrypt/decrypt field ordering bug would ship undetected. (Tester)

P1 Non-admin write operations (create, update, delete) are never tested for authorization rejection. TestMCPServerConfigsNonAdmin creates a memberClient but only calls the list endpoint. If the RBAC check on any write endpoint were removed, no test would catch it. (Tester)

P2 All user-token dbauthz wrappers (DeleteMCPServerUserToken, UpsertMCPServerUserToken, GetMCPServerUserTokensByUserID) gate on ResourceDeploymentConfig, but every call site bypasses with AsSystemRestricted. The authz layer provides zero protection for per-user token operations. If a future caller forgets AsSystemRestricted, a normal user is blocked from managing their own token. These should either use a per-user resource or be explicitly documented as system-only. (Gopher, Sentinel)

P2 AuthConnected=true after token storage is never tested; GET /servers/{id} endpoint is entirely untested (admin vs non-admin behavior, disabled-config filtering); partial update test asserts only 4 of 15+ fields. Non-admin list response is never checked for field redaction. (Tester)

P2 The icon_url: IconURL sqlc rename is globally scoped, causing unrelated changes to coderd/workspaceproxies.go and enterprise/coderd/workspaceproxy.go. Correct Go naming fix but out of scope for this PR and creates merge conflict risk. (Gopher)

P3 Phantom sqlc renames for mcp_server_tool_snapshot, mcp_server_tool_snapshots, tools_json reference types not introduced by this migration. Dead config for a future PR. (Gopher)

🤖 Seven reviewers, one JSON payload, zero regrets. Hejdå!

Fixes for P1 issues:
- Validate mcp_server_ids against actual configs in postChats and
  postChatMessages, returning 400 if any ID doesn't exist.
- Add cross-field validation on create: oauth2 requires client_id,
  auth_url, token_url; api_key requires header and value;
  custom_headers requires at least one header.
- Clear stale auth fields when auth_type changes on update (e.g.,
  switching from oauth2 to none clears all oauth2 secrets).
- Check OAuth2 error query parameter in callback before looking
  for authorization code.
- Return 500 on token lookup error in getMCPServerConfig instead
  of silently swallowing it.

Fixes for P2 issues:
- Verify auth_type==oauth2 in callback handler (not just connect).
- Replace leaked err.Error() in token exchange with generic message.
- Add validate:"omitempty,url" tags on OAuth2AuthURL/TokenURL.
- Redact URL and Transport for non-admin callers.

Fixes for P3 issues:
- Add Content-Security-Policy header to OAuth2 callback HTML.
…ransaction

- Use api.HTTPClient via context.WithValue(ctx, oauth2.HTTPClient, ...)
  for the OAuth2 token exchange instead of http.DefaultClient. This
  respects proxy settings and avoids SSRF with unrestricted client.
- Wrap the update handler's read-modify-write in api.Database.InTx
  so concurrent PATCHes don't silently clobber each other. Request
  parsing and custom header validation stay outside the transaction.
@kylecarbs kylecarbs requested review from johnstcn and mafredri March 19, 2026 12:23
TestMCPServerConfigsSecretsNeverLeaked creates a config with all
secret fields populated (OAuth2 client secret, API key value,
custom headers), then verifies via JSON marshaling that none of
the sentinel secret values appear in responses from:
- Admin create response
- Admin list endpoint
- Admin get-by-ID endpoint
- Non-admin list endpoint (also checks admin fields are redacted)
- Non-admin get-by-ID endpoint

If a code change accidentally starts exposing secrets in API
responses, this test will catch it.
@johnstcn johnstcn dismissed their stale review March 19, 2026 12:26

load-bearing test added

Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 The neural network squinted at three commits and found closure. Almost.

All code findings from the initial review have been addressed. Verified the fixes for auth_type validation, stale field cleanup on type change, callback error handling, token error propagation, mcp_server_ids validation, SSRF/error leak, InTx on update, and redaction.

Once @johnstcn's three open items are resolved (nil MCPServerIDs crash on InsertChat, CleanupDeletedMCPServerIDsFromChats NULL bug + no caller, 15 missing dbauthz_test entries), this is good on our side.

Test coverage for the OAuth2 connect/callback flow, dbcrypt round-trip, and non-admin write rejection would be nice to have but doesn't block.

🤖 Scanning complete. Shutting down until the next git push. — mafredri's mass of weights

…tests

- Coalesce nil MCPServerIDs to empty slice in chatd.CreateChat and
  postChats handler, preventing SQL NULL on the NOT NULL column.
  This protects all callers including sub-agent chat creation paths.
- Fix NULL propagation in CleanupDeletedMCPServerIDsFromChats: wrap
  array_agg subquery in COALESCE so cleanup works when all configs
  have been deleted (array_agg returns NULL on empty input).
- Add 15 missing dbauthz MethodTestSuite entries for all MCP server
  querier methods, preventing CI failures from TearDownSuite.
- Remove outdated comment claiming GET-by-ID endpoint doesn't exist.
- Make member get-by-ID assertions consistent with member list loop:
  assert all 7 redacted fields (OAuth2ClientID, OAuth2AuthURL,
  OAuth2TokenURL, OAuth2Scopes, APIKeyHeader, URL, Transport).
- Add missing OAuth2Scopes assertion to member list loop.
@kylecarbs kylecarbs requested review from johnstcn and mafredri March 19, 2026 12:46
@kylecarbs kylecarbs marked this pull request as ready for review March 19, 2026 12:46
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last suggestion, otherwise LGTM on my end. But I'll defer to Cian for the dbcrypt stuff.

Copy link
Copy Markdown
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just missing the dbcrypt internal test stuff. The code paths do already seem to be tested, but this is a good guard against future breakage. I don't need to review again.

Use COALESCE(@mcp_server_ids::uuid[], '{}'::uuid[]) in the INSERT
so callers that pass nil (Go zero value for []uuid.UUID) get an
empty array instead of SQL NULL. This fixes NOT NULL constraint
violations in existing tests that construct InsertChatParams
without the new mcp_server_ids field.
- Add comprehensive dbcrypt tests for all 11 MCP server methods:
  8 subtests for config encrypt/decrypt (Insert, Get*, Update) and
  3 subtests for user token encrypt/decrypt (Upsert, Get*). Each
  verifies plaintext on read and ciphertext in raw DB.
- Improve invalid MCP server ID error message to list which specific
  IDs are invalid, in both postChats and postChatMessages.
- Fix CleanupDeletedMCPServerIDsFromChats RBAC assertion to use
  ResourceChat (it modifies chats) instead of ResourceDeploymentConfig.
- Add //nolint:gosec directives for test credential constants.
- Fix context.Context parameter ordering (must be first) in
  requireMCPServerConfigRawEncrypted helper.
- Fix mangled const block and closing paren.
@kylecarbs kylecarbs enabled auto-merge (squash) March 19, 2026 13:47
Add fixture file 000447_mcp_server_configs.up.sql with sample rows for
both mcp_server_configs and mcp_server_user_tokens tables. This fixes
the TestMigrateUpWithFixtures failure that requires all new tables to
have fixture data.
@kylecarbs kylecarbs merged commit d8ff67f into main Mar 19, 2026
29 checks passed
@kylecarbs kylecarbs deleted the mcp-server-configs branch March 19, 2026 14:07
@github-actions github-actions bot locked and limited conversation to collaborators Mar 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants