feat: add file/image attachment support to chat input #22604
    case string(codersdk.ChatInputPartTypeFile):
        if part.MediaType == "" {
            return nil, "", &codersdk.Response{
                Message: "Invalid input part.",
                Detail:  fmt.Sprintf("%s[%d].media_type is required for file parts.", fieldName, i),
            }
        }
        if len(part.Data) == 0 {
            return nil, "", &codersdk.Response{
                Message: "Invalid input part.",
                Detail:  fmt.Sprintf("%s[%d].data is required for file parts.", fieldName, i),
            }
        }
        // Limit file size to 10 MB.
        const maxFileSize = 10 << 20
        if len(part.Data) > maxFileSize {
            return nil, "", &codersdk.Response{
                Message: "Invalid input part.",
                Detail:  fmt.Sprintf("%s[%d].data exceeds maximum size of 10 MB.", fieldName, i),
            }
        }
        content = append(content, fantasy.FileContent{
            Data:      part.Data,
            MediaType: part.MediaType,
        })
I was thinking that for the codersdk part we would store these in our files table instead, and pull the content when we do the conversion.
A few reasons:
- Chats with lots of media will still load fast in the DB.
- If file uploads get really huge long-term, we might want to allow the agent to explore files using tools rather than embed them for models (e.g. grep a 100MB log file or something).
- If file uploads become a problem in our DB we could always make an adapter so they go into a bucket.
Should we re-purpose files or create a new table? I'd favor the latter. Honestly the agent was supposed to create a table but opted for the easy path here, hah.
So should we bump the limit to 100MB for now or is 10MB fine for the time being?
I think we should start with a lower size and see if it's a problem. I'm chill with a new table or re-purposing, whatever you think is best here.
## Changes
- Removed the Coder Agents entry from the middle of the children array in `docs/manifest.json`.
- Added the Coder Agents entry back at the end of the children array to improve the organization of the documentation structure.
coderd/chats.go
Outdated
    // @Tags Chats
    // @Param chat path string true "Chat ID" format(uuid)
    // @Success 201 {object} codersdk.UploadChatFileResponse
    // @Router /chats/{chat}/files [post]
This won't work because users will want to create chats with files.
Maybe we allow a Files payload on the create chat endpoint? Not sure what the best way to handle this is :/
Good point, so we want files that are initially detached from a chat and later associated with one. For good UX we want to start the upload immediately IMO so they don't have to wait after posting/creating the chat.
bb4bd38 to be982f1
Documentation Check: New Documentation Needed
Automated review via Coder Tasks
    // detectChatFileType detects the MIME type of the given data.
    // It extends http.DetectContentType with support for WebP, which
    // Go's standard sniffer does not recognize.
    func detectChatFileType(data []byte) string {
This is a good candidate for coderd/util and perhaps some dedicated tests.
Good idea, although it's tiny enough and only used here, so I decided to keep it contained here for now. Definitely if we find more use for it though 👍🏻
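For readers following this thread, here is a minimal sketch of what a WebP-aware sniffer along these lines could look like. The body of `detectChatFileType` is not shown in the diff, so `sniffImageType` below is a hypothetical stand-in, and the RIFF/WEBP signature check is an assumption about how such an extension might be written, not the PR's actual code.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffImageType is a hypothetical stand-in for detectChatFileType.
// It checks the RIFF/WEBP container signature first and otherwise
// defers to the standard library sniffer.
func sniffImageType(data []byte) string {
	// A WebP file starts with "RIFF", a 4-byte little-endian size,
	// and then the form type "WEBP".
	if len(data) >= 12 && string(data[0:4]) == "RIFF" && string(data[8:12]) == "WEBP" {
		return "image/webp"
	}
	return http.DetectContentType(data)
}

func main() {
	webp := []byte("RIFF\x24\x00\x00\x00WEBPVP8L")
	png := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(sniffImageType(webp))
	fmt.Println(sniffImageType(png))
}
```

A function this small is easy to cover with a table-driven test, which is roughly what the suggestion to move it into `coderd/util` would enable.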
johnstcn left a comment
Orphaned file cleanup. Files uploaded but never referenced (user navigates away, upload error, etc.) persist until the user is deleted (ON DELETE CASCADE). A TTL-based cleanup job or reference tracking would address this.
I'm OK with leaving this out of the current implementation, but we should address it sooner rather than later. dbpurge can probably handle this.
3705adb to 25d1bd0
coderd/database/dbauthz/dbauthz.go
Outdated
    if err != nil {
        return database.ChatFile{}, err
    }
    if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceChat.WithOwner(file.OwnerID.String()).InOrg(file.OrganizationID).WithID(file.ID)); err != nil {
You can just make database.ChatFile implement rbac.Objecter
Yep, I agree that's the best approach 👍🏻
    SELECT organization_members.organization_id
    FROM organization_members
    WHERE organization_members.user_id = chat_files.owner_id
    LIMIT 1
What's the ordering situation here with organization memberships?
It doesn't really matter in this case since the data won't exist. I'm just using separate migrations to avoid nuking my test DB.
    var allowedChatFileMIMETypes = map[string]bool{
        "image/png":     true,
        "image/jpeg":    true,
        "image/gif":     true,
        "image/webp":    true,
        "image/svg+xml": false, // SVG can contain scripts.
    }
Presumably each model provider allows a different set of MIME types, right? I'm not sure exactly how we should handle this. I wonder if any Charm libraries have a mapping of provider/model -> file types. It'd be super slick if later on we could show an exclamation mark on the attachments in the chat when a user switches to a model that does not support them.
Instead of erroring, we can just clearly indicate in the UI that we are not sending it because it is not supported.
I'll file a follow-up ticket for this 👍🏻
coderd/chats.go
Outdated
    // TODO(multi-org): This takes the first org, which is correct for
    // single-org deployments but arbitrary for multi-org. The client
    // should specify the org, or we should use a default org policy.
Should we not just require an organization ID to be sent now, so we don't have to change the API contract later? I know the same could be said for chats themselves, but we've got to start somewhere.
    chatFile, err := db.GetChatFileByID(ctx, part.FileID)
    if err != nil {
        if httpapi.Is404Error(err) {
            return nil, "", &codersdk.Response{
                Message: "Invalid input part.",
                Detail:  fmt.Sprintf("%s[%d].file_id references a file that does not exist.", fieldName, i),
            }
        }
        return nil, "", &codersdk.Response{
            Message: "Internal error.",
            Detail:  fmt.Sprintf("Failed to retrieve file for %s[%d].", fieldName, i),
        }
    }
    content = append(content, fantasy.FileContent{
        Data:      chatFile.Data,
        MediaType: chatFile.Mimetype,
    })
I wonder how slow this will make chats with lots of attachments. Maybe it's not a biggie, because we do have summarization which trims the greater context.
It would be interesting to track somehow, but not sure how, or how we'd avoid it here.
One option to reduce context size would be to compress or resize the image. I think this needs some user control though, so I'm not sure how we'd do it. We can document it as a future enhancement?
…on send Previously, files that failed to upload were silently dropped when sending a message. Users had no indication their attachments were skipped. Now both send handlers count error-state files and show a toast warning with the number of attachments that could not be sent.
Verifies that editing a chat message preserves the file_id through both the edit response and a subsequent GET. This covers the write path (MarshalContent with injected file IDs) and the read path (fileStoredID extraction and chatMessageParts population).
…action Previously there was no test coverage for file_id extraction from file blocks in parseMessageContent. This adds two cases: one verifying fileId is populated when file_id is present, and one verifying backward compatibility when file_id is absent.
Previously every chat message API response included the full base64
file data inline in ChatMessagePart.Data, even though clients can
fetch file content via GET /chats/files/{id} with caching.
Nil out part.Data when part.FileID is set so the JSON response omits
the data field (via omitempty). MediaType is preserved for client-side
display hints.
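
As a rough illustration of the change described in this commit message, the sketch below clears the inline bytes on any part that carries a file reference before encoding. `messagePart` is a hypothetical, simplified stand-in for `ChatMessagePart` (it uses a plain string for the file ID), so treat the field set as an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// messagePart is a hypothetical, simplified stand-in for ChatMessagePart.
type messagePart struct {
	Type      string `json:"type"`
	FileID    string `json:"file_id,omitempty"`
	MediaType string `json:"media_type,omitempty"`
	Data      []byte `json:"data,omitempty"`
}

// stripInlineData clears Data on parts that reference a stored file,
// so omitempty drops the "data" key from the JSON response.
func stripInlineData(parts []messagePart) {
	for i := range parts {
		if parts[i].FileID != "" {
			parts[i].Data = nil
		}
	}
}

// encodeParts strips inline data and returns the JSON encoding.
func encodeParts(parts []messagePart) string {
	stripInlineData(parts)
	out, err := json.Marshal(parts)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	parts := []messagePart{
		{Type: "file", FileID: "abc", MediaType: "image/png", Data: []byte("raw bytes")},
	}
	fmt.Println(encodeParts(parts))
}
```

Parts without a file reference keep their inline data, which matches the backward-compatibility behavior the later commits describe for older messages.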
Previously FileID was uuid.UUID which serialized the zero value as "00000000-0000-0000-0000-000000000000" on non-file parts because omitempty does not work for fixed-size arrays. NullUUID serializes as null when invalid and is properly omitted with omitempty for pointer variants.
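
The `omitempty` behavior mentioned here can be reproduced with the standard library alone. The sketch below uses a hypothetical 16-byte array type `id` in place of `uuid.UUID` (the real type encodes as a string rather than a number array, but the omission rule is the same): a fixed-size array is never considered empty, while a nil pointer is.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// id is a hypothetical 16-byte identifier standing in for uuid.UUID.
type id [16]byte

// withArray embeds the array by value; omitempty has no effect because
// encoding/json only treats zero-length arrays as empty.
type withArray struct {
	FileID id `json:"file_id,omitempty"`
}

// withPointer uses a pointer; a nil pointer is empty and is omitted.
type withPointer struct {
	FileID *id `json:"file_id,omitempty"`
}

// marshal returns the JSON encoding of v as a string.
func marshal(v any) string {
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(marshal(withArray{}))   // zero value is still emitted
	fmt.Println(marshal(withPointer{})) // field is omitted
}
```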
…pload After a successful upload, fetch the file's GET endpoint in the background. The server responds with Cache-Control: private, immutable, so when the timeline later renders <img src=...> for the same file_id, the browser serves it from its HTTP cache with no network request.
Sniff the first 512 bytes to verify the content type before reading the entire body. This avoids buffering up to 10MB of data for files that will be rejected by the MIME type check.
MarshalContent takes a plain map parameter instead of varargs. fileStoredID returns uuid.NullUUID with parsing handled internally. Part type checks in chatMessageParts use a switch.
49855d7 to
530e0c6
Compare
Add organization_id index to chat_files migration so that org deletes do not require a sequential scan. Return 413 instead of 400 when MaxBytesReader triggers on file upload by detecting http.MaxBytesError. Remove DB error detail from the InsertChatFile 500 response to avoid leaking internal information. Fix UTF-8 filename truncation to avoid splitting multi-byte characters by iterating runes. Clear image data bytes before persisting the content block JSON in injectFileID since only the file_id reference is needed.
…cessible Previously the remove button used `hidden group-hover:flex` which removed it from the accessibility tree entirely, making it unreachable for keyboard and touch users. Switch to opacity-based visibility so the button stays in the DOM (accessible to screen readers and keyboard navigation) but is visually hidden until hover or focus.
Add a batch query to fetch multiple chat files by IDs in a single round-trip. This will be used by the chat prompt builder to resolve file references at LLM dispatch time instead of storing inline data.
e5406df to a60e740
Add ConvertMessagesWithFiles that resolves file references from chat_files storage at LLM dispatch time. Previously file data was stored inline in both chat_files.data and chat_messages.content (base64 in JSON), doubling storage for each uploaded file. New messages now store only file_id + media_type in the content JSON (injectFileID strips inline data). The actual bytes are resolved via a FileResolver callback when building the LLM prompt. ConvertMessages delegates to ConvertMessagesWithFiles with a nil resolver, preserving backward compatibility for all existing callers and messages that still have inline data.
a60e740 to 5b404a6
Replace ConvertMessages calls at both LLM dispatch points with ConvertMessagesWithFiles, passing a FileResolver that fetches file bytes from chat_files via GetChatFilesByIDs. This ensures file data is resolved from the authoritative storage rather than relying on inline base64 in the message JSON.
Add three test cases:
- ResolvesFileData: verifies ConvertMessagesWithFiles resolves file_id references via the FileResolver callback
- BackwardCompat: verifies messages with inline data still work when no resolver is provided
- StripsInlineData: verifies MarshalContent + injectFileID produces JSON with file_id but no inline data
Implement the dbauthz authorization wrapper for GetChatFilesByIDs that was left as a panic stub by make gen. Follows the same pattern as GetChatFileByID: fetch files, then authorize each one.
5b404a6 to 3452907
File data is resolved at LLM dispatch time via chatFileResolver. The fetch here only validates existence and gets the media type.


Adds image attachment support to chat via an attach button and clipboard paste (`Ctrl+V`/`Cmd+V` for screenshots). Works on both the create page and within existing chats, including image-only messages with no text.

## Design

Files are stored in a new `chat_files` table as `bytea`, owner-scoped with no foreign key to `chats`. This decoupling lets users upload immediately on attach (before a chat even exists) for responsive UX, and makes it straightforward to swap to object storage later since the table is an implementation detail behind two endpoints.

`ChatInputPart` gains a `file` type with a `file_id` reference. At message creation time, the backend validates file IDs and records only the `file_id` + `media_type` in the stored message JSON (no inline image data). File content is resolved from `chat_files` at LLM dispatch time via a batch query (`GetChatFilesByIDs`), so the message content column stays small regardless of attachment count or size.

The upload endpoint peeks at the first 512 bytes via `bufio.Reader` to sniff the content type before reading the full body, rejecting invalid MIME types without buffering up to 10MB. It validates against an allowlist (png, jpeg, gif, webp; SVG explicitly blocked) using `http.DetectContentType` extended with a custom WebP sniffer. Oversized uploads get a proper 413 status. The retrieval endpoint serves files with `Content-Disposition: inline`, `Cache-Control: private, immutable`, and `Content-Length`.

On the frontend, files upload eagerly on attach (not at send time) so users see immediate progress. After upload completes, a background fetch pre-warms the browser HTTP cache for the file endpoint so the timeline renders instantly after send. Per-file upload state tracks `uploading`/`uploaded`/`error` with inline indicators on thumbnails. A toast warns when failed attachments are dropped on send. Edit mode reuses existing `file_id` references without re-uploading. File parts render as `<img>` tags pointing to `/api/experimental/chats/files/{id}`, served from the browser HTTP cache.

## What this PR does not address

- Orphaned file cleanup. Files uploaded but never referenced (user navigates away, upload error, etc.) persist until the user is deleted (`ON DELETE CASCADE`). A TTL-based cleanup job or reference tracking would address this.
- … `accept="image/*"`.
- `GetChatFileByID` at message creation time fetches the full row including data just to validate existence. A metadata-only query would avoid loading the blob into memory.