feat: add MCP server configuration backend for chats#23227
Conversation
Documentation CheckUpdates Needed
New Documentation Needed
Automated review via Coder Tasks |
Adds the database schema, API endpoints, and SDK types for admin-managed
MCP server configurations that chatd can consume. This is the backend
foundation for allowing external MCP tools (Sentry, Linear, GitHub, etc.)
to be used during AI chat sessions.
## Database
Three new tables:
- mcp_server_configs: Admin-managed server definitions with URL, transport
(Streamable HTTP / SSE), auth config, tool allow/deny lists, and an
availability policy (force_on / default_on / default_off).
- mcp_server_user_tokens: Per-user OAuth2 tokens for servers requiring
individual authentication.
- mcp_server_tool_snapshots: Admin-approved frozen tool lists, preventing
silent server-side tool changes (inspired by ChatGPT's frozen snapshot
pattern).
New column on chats table:
- mcp_server_ids UUID[]: Per-chat MCP server selection, following the same
pattern as model_config_id (passed at chat creation, changeable per
message).
## API Endpoints
Admin endpoints (ResourceDeploymentConfig auth):
- GET/POST /api/experimental/chats/mcp-servers/
- PATCH/DELETE /api/experimental/chats/mcp-servers/{id}
- GET /api/experimental/chats/mcp-servers/{id}/tools
- POST /api/experimental/chats/mcp-servers/{id}/refresh-tools (501 placeholder)
## Governance Model
- Servers default to disabled (enabled=false) and default_off availability.
- force_on: Always injected, user cannot deselect.
- default_on: Pre-selected in new chats, user can deselect.
- default_off: Available in picker but not pre-selected.
- OAuth2 servers require per-user authentication before tools are injected.
- Tool-level allow/deny lists for granular control.
- Frozen tool snapshots require admin re-approval on server changes.
…points
- Rename mcp_server_configs.go → mcp.go
- Move routes from /chats/mcp-servers to /mcp/servers (adjacent to
existing /mcp/http transport endpoint)
- Update SDK client paths to match new route prefix
- Add user-facing endpoints:
- GET /mcp/servers/user — list enabled servers with per-user auth status
- GET /mcp/servers/{id}/oauth2/connect — initiate OAuth2 flow (redirect)
- GET /mcp/servers/{id}/oauth2/callback — exchange code for token, store
per-user, close popup
- DELETE /mcp/servers/{id}/oauth2/disconnect — revoke user's token
Route structure:
/api/experimental/mcp/
/servers/ — admin CRUD (GET, POST)
/servers/user — user list with auth status
/servers/{id}/ — admin CRUD (PATCH, DELETE)
/servers/{id}/tools — tool snapshot
/servers/{id}/oauth2/connect — start OAuth2
/servers/{id}/oauth2/callback — complete OAuth2
/servers/{id}/oauth2/disconnect — revoke token
/http — MCP HTTP transport (existing)
The /servers/user endpoint was redundant — the main GET /servers/ endpoint already branched on admin vs non-admin. Merged the user token lookup into the main handler so a single endpoint returns configs + auth_connected status for the calling user.
- Add 8 test functions covering CRUD, non-admin visibility, auth_connected field, availability validation, slug uniqueness, tool snapshots, OAuth2 disconnect, and per-chat MCP server ID persistence. - Rename codersdk/mcpserverconfigs.go → codersdk/mcp.go to match coderd/mcp.go. - Fix nil slice bug where omitted tool_allow_list/tool_deny_list in create requests would send NULL to NOT NULL columns.
Add rename entries in sqlc.yaml so generated Go types use idiomatic casing (MCPServerConfig, MCPServerIDs, etc.) instead of sqlc's default CamelCase (McpServerConfig, McpServerIds).
…eanup - Add dbcrypt encrypt/decrypt wrappers for all MCP server methods: - MCPServerConfig: OAuth2ClientSecret, APIKeyValue, CustomHeaders - MCPServerUserToken: AccessToken, RefreshToken - 8 decrypt-only wrappers for Get* methods - 3 encrypt+decrypt wrappers for Insert/Update/Upsert methods - Fix revive lint: split convertMCPServerConfig into two functions (convertMCPServerConfig + convertMCPServerConfigRedacted) instead of a single function with bool control flag - Implement CleanupDeletedMCPServerIDsFromChats in dbauthz (was panic stub) - Remove orphaned code from incomplete sed replacement - Remove non-existent convertMCPServerConfigForRole references
…iciency - Fix getMCPServerTools 500 for non-admin users: use AsSystemRestricted for both GetMCPServerConfigByID and GetActiveMCPServerToolSnapshot, and enforce Enabled check for non-admin callers. - Fix OAuth2 state cookie Secure flag: use HTTPCookies.Apply() from deployment config instead of r.TLS != nil, matching the pattern in userauth.go. Fixes CSRF cookie not working behind TLS-terminating reverse proxies. - Fix strings.Split on empty OAuth2Scopes producing [""] instead of []: guard with non-empty check before splitting. - Add AuthConnected population to GET-by-ID handler, matching the list handler's token lookup logic. - Remove omitempty from AuthConnected bool field so false is always serialized in JSON responses. - Narrow CleanupDeletedMCPServerIDsFromChats WHERE clause to only update chats that actually reference deleted config IDs, avoiding unnecessary WAL writes.
The frozen tool snapshot table is premature without the refresh-tools flow wired up. Tool filtering is handled by tool_allow_list and tool_deny_list on mcp_server_configs, which is sufficient for the initial implementation. Removes: - mcp_server_tool_snapshots table and its partial unique index - GetActiveMCPServerToolSnapshot, InsertMCPServerToolSnapshot, DeactivateMCPServerToolSnapshots queries - getMCPServerTools and refreshMCPServerTools handlers + routes - MCPServerTool, MCPServerToolSnapshot SDK types + client method - dbauthz/dbmetrics wrappers for removed queries - TestMCPServerConfigsToolSnapshot test
aac8787 to
cc74088
Compare
- Fix revive if-return lint in dbcrypt helpers: return decryptField directly instead of redundant if-err-return-nil pattern. - Fix timeout context used after t.Parallel: create per-subtest contexts in TestMCPServerConfigsAvailability.
cc74088 to
c22e304
Compare
johnstcn
left a comment
There was a problem hiding this comment.
Not done reviewing yet, but there's one blocker I'm seeing immediately.
| configs, err = api.Database.GetMCPServerConfigs(ctx) | ||
| } else { | ||
| //nolint:gocritic // All authenticated users need to read enabled MCP server configs to use the chat feature. | ||
| configs, err = api.Database.GetEnabledMCPServerConfigs(dbauthz.AsSystemRestricted(ctx)) |
There was a problem hiding this comment.
I'm extremely wary of this as we store encrypted fields in this table. I know that we elide them in convertMCPServerConfig below but this is one over-eager prompt away from exposing secrets.
It would be better to store the "secret" stuff in a separate table. It's a bit more work but then you can just tell dbauthz to allow everyone to read that, and lean on the owner role to protect the secrets.
There was a problem hiding this comment.
Valid concern. For now the converter functions (convertMCPServerConfig / convertMCPServerConfigRedacted) ensure secrets never reach API responses, and non-admin code paths use the redacted variant. I'll look into splitting the secret fields into a separate table as a follow-up — that would let dbauthz enforce the boundary at the DB layer rather than relying on application-level converters.
There was a problem hiding this comment.
For the moment, let's add a load-bearing unit test (or an assertion in an existing test) so that if this ever does happen, it won't do so silently.
There was a problem hiding this comment.
Added TestMCPServerConfigsSecretsNeverLeaked — creates a config with all secret fields populated (OAuth2 client secret, API key value, custom headers with sentinel values), then JSON-marshals the responses from every endpoint (admin create, admin list, admin get-by-ID, member list, member get-by-ID) and asserts none of the sentinel strings appear. Also verifies the Has* booleans are true and that non-admin responses have admin-only fields stripped (OAuth2ClientID, auth URLs, API key header, URL, Transport).
If anyone accidentally wires a secret field into the SDK type or converter, this test will catch it.
There was a problem hiding this comment.
🤖 mafredri sent me to read all 37 files and 3,500 new lines. My weights are exhausted but my findings are not.
Review: MCP server config backend with validation gaps, dead cleanup code, and thin test coverage for security-critical OAuth2 paths
Validation
go test ./coderd/ -run TestMCPServer -count=1 -v- Verify
CleanupDeletedMCPServerIDsFromChatsis called somewhere, or remove it - Test that a non-admin cannot call create/update/delete endpoints
- Test the OAuth2 connect → callback round-trip
Cross-cutting findings (not tied to a single line)
P1 CleanupDeletedMCPServerIDsFromChats is defined through all generated layers but never called anywhere. Stale UUIDs accumulate in chats.mcp_server_ids after MCP server config deletion. The FK cascade handles mcp_server_user_tokens, but chats.mcp_server_ids is a UUID array with no FK constraint and no code invokes the cleanup. (Gopher, DBA, Tester, Breaker, Mutex)
P1 The entire OAuth2 connect/callback flow (~200 lines across mcpServerOAuth2Connect and mcpServerOAuth2Callback) has zero test coverage. State cookie CSRF, code exchange, token storage, HTML popup response, and error branches are all untested. (Tester)
P1 The 195 lines of dbcrypt encrypt/decrypt wrappers in enterprise/dbcrypt/dbcrypt.go for MCP server configs and user tokens have no test coverage. An encrypt/decrypt field ordering bug would ship undetected. (Tester)
P1 Non-admin write operations (create, update, delete) are never tested for authorization rejection. TestMCPServerConfigsNonAdmin creates a memberClient but only calls the list endpoint. If the RBAC check on any write endpoint were removed, no test would catch it. (Tester)
P2 All user-token dbauthz wrappers (DeleteMCPServerUserToken, UpsertMCPServerUserToken, GetMCPServerUserTokensByUserID) gate on ResourceDeploymentConfig, but every call site bypasses with AsSystemRestricted. The authz layer provides zero protection for per-user token operations. If a future caller forgets AsSystemRestricted, a normal user is blocked from managing their own token. These should either use a per-user resource or be explicitly documented as system-only. (Gopher, Sentinel)
P2 AuthConnected=true after token storage is never tested; GET /servers/{id} endpoint is entirely untested (admin vs non-admin behavior, disabled-config filtering); partial update test asserts only 4 of 15+ fields. Non-admin list response is never checked for field redaction. (Tester)
P2 The icon_url: IconURL sqlc rename is globally scoped, causing unrelated changes to coderd/workspaceproxies.go and enterprise/coderd/workspaceproxy.go. Correct Go naming fix but out of scope for this PR and creates merge conflict risk. (Gopher)
P3 Phantom sqlc renames for mcp_server_tool_snapshot, mcp_server_tool_snapshots, tools_json reference types not introduced by this migration. Dead config for a future PR. (Gopher)
🤖 Seven reviewers, one JSON payload, zero regrets. Hejdå!
Fixes for P1 issues: - Validate mcp_server_ids against actual configs in postChats and postChatMessages, returning 400 if any ID doesn't exist. - Add cross-field validation on create: oauth2 requires client_id, auth_url, token_url; api_key requires header and value; custom_headers requires at least one header. - Clear stale auth fields when auth_type changes on update (e.g., switching from oauth2 to none clears all oauth2 secrets). - Check OAuth2 error query parameter in callback before looking for authorization code. - Return 500 on token lookup error in getMCPServerConfig instead of silently swallowing it. Fixes for P2 issues: - Verify auth_type==oauth2 in callback handler (not just connect). - Replace leaked err.Error() in token exchange with generic message. - Add validate:"omitempty,url" tags on OAuth2AuthURL/TokenURL. - Redact URL and Transport for non-admin callers. Fixes for P3 issues: - Add Content-Security-Policy header to OAuth2 callback HTML.
…ransaction - Use api.HTTPClient via context.WithValue(ctx, oauth2.HTTPClient, ...) for the OAuth2 token exchange instead of http.DefaultClient. This respects proxy settings and avoids SSRF with unrestricted client. - Wrap the update handler's read-modify-write in api.Database.InTx so concurrent PATCHes don't silently clobber each other. Request parsing and custom header validation stay outside the transaction.
TestMCPServerConfigsSecretsNeverLeaked creates a config with all secret fields populated (OAuth2 client secret, API key value, custom headers), then verifies via JSON marshaling that none of the sentinel secret values appear in responses from: - Admin create response - Admin list endpoint - Admin get-by-ID endpoint - Non-admin list endpoint (also checks admin fields are redacted) - Non-admin get-by-ID endpoint If a code change accidentally starts exposing secrets in API responses, this test will catch it.
mafredri
left a comment
There was a problem hiding this comment.
🤖 The neural network squinted at three commits and found closure. Almost.
All code findings from the initial review have been addressed. Verified the fixes for auth_type validation, stale field cleanup on type change, callback error handling, token error propagation, mcp_server_ids validation, SSRF/error leak, InTx on update, and redaction.
Once @johnstcn's three open items are resolved (nil MCPServerIDs crash on InsertChat, CleanupDeletedMCPServerIDsFromChats NULL bug + no caller, 15 missing dbauthz_test entries), this is good on our side.
Test coverage for the OAuth2 connect/callback flow, dbcrypt round-trip, and non-admin write rejection would be nice to have but doesn't block.
🤖 Scanning complete. Shutting down until the next
git push. — mafredri's mass of weights
…tests - Coalesce nil MCPServerIDs to empty slice in chatd.CreateChat and postChats handler, preventing SQL NULL on the NOT NULL column. This protects all callers including sub-agent chat creation paths. - Fix NULL propagation in CleanupDeletedMCPServerIDsFromChats: wrap array_agg subquery in COALESCE so cleanup works when all configs have been deleted (array_agg returns NULL on empty input). - Add 15 missing dbauthz MethodTestSuite entries for all MCP server querier methods, preventing CI failures from TearDownSuite.
- Remove outdated comment claiming GET-by-ID endpoint doesn't exist. - Make member get-by-ID assertions consistent with member list loop: assert all 7 redacted fields (OAuth2ClientID, OAuth2AuthURL, OAuth2TokenURL, OAuth2Scopes, APIKeyHeader, URL, Transport). - Add missing OAuth2Scopes assertion to member list loop.
mafredri
left a comment
There was a problem hiding this comment.
One last suggestion, otherwise LGTM on my end. But I'll defer to Cian for the dbcrypt stuff.
johnstcn
left a comment
There was a problem hiding this comment.
Just missing the dbcrypt internal test stuff. The code paths do already seem to be tested, but this is a good guard against future breakage. I don't need to review again.
Use COALESCE(@mcp_server_ids::uuid[], '{}'::uuid[]) in the INSERT
so callers that pass nil (Go zero value for []uuid.UUID) get an
empty array instead of SQL NULL. This fixes NOT NULL constraint
violations in existing tests that construct InsertChatParams
without the new mcp_server_ids field.
- Add comprehensive dbcrypt tests for all 11 MCP server methods: 8 subtests for config encrypt/decrypt (Insert, Get*, Update) and 3 subtests for user token encrypt/decrypt (Upsert, Get*). Each verifies plaintext on read and ciphertext in raw DB. - Improve invalid MCP server ID error message to list which specific IDs are invalid, in both postChats and postChatMessages. - Fix CleanupDeletedMCPServerIDsFromChats RBAC assertion to use ResourceChat (it modifies chats) instead of ResourceDeploymentConfig.
- Add //nolint:gosec directives for test credential constants. - Fix context.Context parameter ordering (must be first) in requireMCPServerConfigRawEncrypted helper. - Fix mangled const block and closing paren.
Add fixture file 000447_mcp_server_configs.up.sql with sample rows for both mcp_server_configs and mcp_server_user_tokens tables. This fixes the TestMigrateUpWithFixtures failure that requires all new tables to have fixture data.
Summary
Adds the database schema, API endpoints, SDK types, and encryption wrappers for admin-managed MCP (Model Context Protocol) server configurations that chatd can consume. This is the backend foundation for allowing external MCP tools (Sentry, Linear, GitHub, etc.) to be used during AI chat sessions.
Database
Two new tables:
mcp_server_configs: Admin-managed server definitions with URL, transport (Streamable HTTP / SSE), auth config (none / OAuth2 / API key / custom headers), tool allow/deny lists, and an availability policy (force_on/default_on/default_off). Includes CHECK constraints on transport, auth_type, and availability values.mcp_server_user_tokens: Per-user OAuth2 tokens for servers requiring individual authentication. Cascades on user/config deletion.New column on
chatstable:mcp_server_ids UUID[]: Per-chat MCP server selection, following the same pattern asmodel_config_id— passed at chat creation, changeable per-message with nil-means-no-change semantics.API Endpoints
All routes are under
/api/experimental/mcp/servers/and gated behind theagentsexperiment.Admin endpoints (
ResourceDeploymentConfigauth):POST /— Create MCP server configPATCH /{id}— Update MCP server config (full-replace)DELETE /{id}— Delete MCP server configAuthenticated endpoints (all users, enabled servers only for non-admins):
GET /— List configs (admins see all, members see enabled-only with admin fields redacted)GET /{id}— Get config by ID (withauth_connectedpopulated per-user)OAuth2 per-user auth flow:
GET /{id}/oauth2/connect— Initiate OAuth2 flow (state cookie CSRF protection)GET /{id}/oauth2/callback— Handle OAuth2 callback, store tokensDELETE /{id}/oauth2/disconnect— Remove stored OAuth2 tokensSecurity
OAuth2ClientSecret,APIKeyValue, andCustomHeadersare never in API responses — only boolean indicators (has_oauth2_secret,has_api_key,has_custom_headers).convertMCPServerConfigRedactedstripsOAuth2ClientID, auth URLs, scopes, andAPIKeyHeaderfrom non-admin responses.dbcrypt_keysencryption with full encrypt-on-write / decrypt-on-read wrappers (11 dbcrypt method overrides + 2 helpers), following the same pattern aschat_providers.api_key.HttpOnlycookie withHTTPCookies.Apply()for correctSecure/SameSitebehind TLS-terminating proxies.ActionRead, write operations useActionUpdateonResourceDeploymentConfig.Governance Model
enableddefaults tofalseforce_on(always injected),default_on(pre-selected),default_off(opt-in)mcp_server_idsonCreateChatRequest/CreateChatMessageRequestmcp_server_configsfor granular tool filteringdbcrypt_keys(same pattern aschat_providers.api_key)Tests
8 test functions covering:
auth_connectedpopulation for OAuth2 vs non-OAuth2 serversmcp_server_idspersistenceKnown Limitations (Deferred)
These are documented and intentional for an experimental feature:
auth_type=oauth2) — admin-only endpoint, will add when stabilizingforce_onauto-injection — query exists but not yet wired into chatd tool injection (follow-up)What's NOT in this PR
chatd/chatmcp/manager)chatloop/