feat: add file/image attachment support to chat input #22604

Merged
mafredri merged 28 commits into main from mafredri/chat-file-attachments
Mar 6, 2026

Conversation

@mafredri
Member

@mafredri mafredri commented Mar 4, 2026

Adds image attachment support to chat via an attach button and clipboard paste (Ctrl+V / Cmd+V for screenshots). Works on both the create page and within existing chats, including image-only messages with no text.

Design

Files are stored in a new chat_files table as bytea, owner-scoped with no foreign key to chats. This decoupling lets users upload immediately on attach (before a chat even exists) for responsive UX, and makes it straightforward to swap to object storage later since the table is an implementation detail behind two endpoints.

ChatInputPart gains a file type with a file_id reference. At message creation time, the backend validates file IDs and records only the file_id + media_type in the stored message JSON (no inline image data). File content is resolved from chat_files at LLM dispatch time via a batch query (GetChatFilesByIDs), so the message content column stays small regardless of attachment count or size.

The upload endpoint peeks at the first 512 bytes via bufio.Reader to sniff the content type before reading the full body, rejecting invalid MIME types without buffering up to 10MB. It validates against an allowlist (png, jpeg, gif, webp; SVG explicitly blocked) using http.DetectContentType extended with a custom WebP sniffer. Oversized uploads get a proper 413 status. The retrieval endpoint serves files with Content-Disposition: inline, Cache-Control: private, immutable, and Content-Length.
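
The peek-then-validate step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `sniffUpload`, `allowedTypes`, `errUnsupportedType`, and `demo` are hypothetical names, and the custom WebP sniffer is left out so that only `bufio.Reader.Peek` and `http.DetectContentType` are involved.

```go
package main

import (
	"bufio"
	"errors"
	"io"
	"net/http"
	"strings"
)

// allowedTypes mirrors the image-only allowlist described above.
// SVG is simply absent here, so a lookup for it returns false.
var allowedTypes = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"image/gif":  true,
	"image/webp": true,
}

// errUnsupportedType is a hypothetical sentinel for rejected uploads.
var errUnsupportedType = errors.New("unsupported media type")

// sniffUpload peeks at up to 512 bytes of r without consuming them,
// detects the content type, and rejects anything outside the allowlist.
// On success it returns the detected type and a reader that still
// yields the complete body, including the peeked bytes.
func sniffUpload(r io.Reader) (string, io.Reader, error) {
	br := bufio.NewReader(r)
	head, err := br.Peek(512)
	// Bodies shorter than 512 bytes return their bytes together with io.EOF.
	if err != nil && !errors.Is(err, io.EOF) {
		return "", nil, err
	}
	mediaType := http.DetectContentType(head)
	if !allowedTypes[mediaType] {
		return mediaType, nil, errUnsupportedType
	}
	return mediaType, br, nil
}

// demo runs sniffUpload on an in-memory body and reports the detected
// type and how many bytes remain readable afterwards.
func demo(body string) (string, int, error) {
	mediaType, rest, err := sniffUpload(strings.NewReader(body))
	if err != nil {
		return mediaType, 0, err
	}
	all, err := io.ReadAll(rest)
	return mediaType, len(all), err
}
```

Because the rejection happens after the peek, a disallowed upload is turned away before the rest of its body is read.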

On the frontend, files upload eagerly on attach (not at send time) so users see immediate progress. After upload completes, a background fetch pre-warms the browser HTTP cache for the file endpoint so the timeline renders instantly after send. Per-file upload state tracks uploading/uploaded/error with inline indicators on thumbnails. A toast warns when failed attachments are dropped on send. Edit mode reuses existing file_id references without re-uploading. File parts render as <img> tags pointing to /api/experimental/chats/files/{id}, served from the browser HTTP cache.

What this PR does not address

  • Orphaned file cleanup. Files uploaded but never referenced persist until the user is deleted (ON DELETE CASCADE). A TTL-based cleanup job or reference tracking would address this.
  • Non-image file types. The backend allowlist is image-only. The frontend restricts to accept="image/*".
  • Per-user storage quotas. There is a rate limit but no total storage cap per user.
  • Upload cancellation. Removing an attachment mid-upload does not abort the in-flight request.
  • Max file parts per message. No limit on the number of file parts in a single message. A cap would bound the batch resolution cost at LLM dispatch.
  • Tool-based file exploration. For larger file types (logs, code), it may be better to let the agent explore files via tools (e.g., grep, read ranges) rather than embedding the full content in the model context. The current design always inlines file data for the LLM.
  • Per-model MIME support. Each LLM provider accepts different file types. The allowlist is currently global. Surfacing per-model support in the UI (e.g., indicating an attachment will not be sent when switching to a model that does not support it) would improve UX.
  • Metadata-only file validation. GetChatFileByID at message creation time fetches the full row including data just to validate existence. A metadata-only query would avoid loading the blob into memory.

Comment on lines +2032 to +2056
case string(codersdk.ChatInputPartTypeFile):
	if part.MediaType == "" {
		return nil, "", &codersdk.Response{
			Message: "Invalid input part.",
			Detail:  fmt.Sprintf("%s[%d].media_type is required for file parts.", fieldName, i),
		}
	}
	if len(part.Data) == 0 {
		return nil, "", &codersdk.Response{
			Message: "Invalid input part.",
			Detail:  fmt.Sprintf("%s[%d].data is required for file parts.", fieldName, i),
		}
	}
	// Limit file size to 10 MB.
	const maxFileSize = 10 << 20
	if len(part.Data) > maxFileSize {
		return nil, "", &codersdk.Response{
			Message: "Invalid input part.",
			Detail:  fmt.Sprintf("%s[%d].data exceeds maximum size of 10 MB.", fieldName, i),
		}
	}
	content = append(content, fantasy.FileContent{
		Data:      part.Data,
		MediaType: part.MediaType,
	})
Member

So I was thinking that for our codersdk part we would store these in our files table instead, and when we do the conversion we'd actually pull the content or something.

A few reasons:

  1. Chats with lots of media will still load fast in the DB.
  2. If file uploads get really huge long-term, we might want to allow the agent to explore files using tools rather than embed them for models (e.g. grep a 100MB log file or something).
  3. If file uploads become a problem in our DB we could always make an adapter so they go into a bucket.

Member Author

Should we re-purpose files or create a new table? I'd favor the latter. Honestly the agent was supposed to create a table but opted for the easy path here, hah.

So should we bump the limit to 100MB for now or is 10MB fine for the time being?

Member

@kylecarbs kylecarbs Mar 4, 2026

I think we should start with a lower size and see if it's a problem. I'm chill with a new table or re-purposing, whatever you think is best here.

mattvollmer added a commit that referenced this pull request Mar 4, 2026
## Changes

- Removed the Coder Agents entry from the middle of the children array in `docs/manifest.json`.
- Added the Coder Agents entry back at the end of the children array to improve the organization of the documentation structure.
coderd/chats.go Outdated
// @Tags Chats
// @Param chat path string true "Chat ID" format(uuid)
// @Success 201 {object} codersdk.UploadChatFileResponse
// @Router /chats/{chat}/files [post]
Member

This won't work because users will want to create chats with files.

Maybe we allow a Files payload on the create chat endpoint? Not sure of the best way to handle this :/

Member Author

Good point, so we want files that are initially detached from a chat and later associated with one. For good UX we want to start the upload immediately IMO so they don't have to wait after posting/creating the chat.

@mafredri mafredri force-pushed the mafredri/chat-file-attachments branch 7 times, most recently from bb4bd38 to be982f1 Compare March 5, 2026 11:30
@mafredri mafredri marked this pull request as ready for review March 5, 2026 11:55
@coder-tasks
Contributor

coder-tasks bot commented Mar 5, 2026

Documentation Check

New Documentation Needed

  • docs/ai-coder/agents/index.md – Document the new image/file attachment feature: how to attach images via the paperclip button, supported formats (PNG, JPEG, GIF, WebP), and pasting screenshots from the clipboard (Ctrl+V / Cmd+V). The current agents docs describe the chat interface but make no mention of attachment capabilities.

Automated review via Coder Tasks

// detectChatFileType detects the MIME type of the given data.
// It extends http.DetectContentType with support for WebP, which
// Go's standard sniffer does not recognize.
func detectChatFileType(data []byte) string {
Member

This is a good candidate for coderd/util and perhaps some dedicated tests.

Member Author

Good idea, although it's tiny enough and only used here, so I decided to keep it contained here for now. Definitely if we find more use for it though 👍🏻
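
For reference, a sniffer of the kind discussed in this thread could look roughly like the sketch below. It is an illustration only: `detectImageType` is a hypothetical stand-in for `detectChatFileType`, whose body is not shown here. The explicit RIFF/WEBP check runs first, so the result does not depend on what the standard sniffer does with a given WebP header.

```go
package main

import (
	"bytes"
	"net/http"
)

// detectImageType is a hypothetical stand-in for detectChatFileType.
// It checks the RIFF/WEBP container signature first and otherwise
// defers to http.DetectContentType.
func detectImageType(data []byte) string {
	// A WebP file starts with "RIFF", a 4-byte little-endian size,
	// and then the "WEBP" form type at byte offset 8.
	if len(data) >= 12 &&
		bytes.Equal(data[0:4], []byte("RIFF")) &&
		bytes.Equal(data[8:12], []byte("WEBP")) {
		return "image/webp"
	}
	return http.DetectContentType(data)
}
```

A small table-driven test over a WebP header, a GIF header, and a non-WebP RIFF file (such as WAVE) would cover the cases that matter for the allowlist.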

Member

@johnstcn johnstcn left a comment

> Orphaned file cleanup. Files uploaded but never referenced (user navigates away, upload error, etc.) persist until the user is deleted (ON DELETE CASCADE). A TTL-based cleanup job or reference tracking would address this.

I'm OK to leave this out of the current impl but we should address this sooner rather than later. dbpurge can probably handle this.
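
As a rough illustration of the TTL idea, the selection logic might look like the sketch below. Everything in it is an assumption: `fileMeta`, `orphanedFiles`, and `demo` are hypothetical names, the check runs in memory rather than in SQL, and how dbpurge would actually be wired up is not covered by this PR.

```go
package main

import "time"

// fileMeta is a hypothetical, metadata-only view of a chat_files row.
type fileMeta struct {
	ID        string
	CreatedAt time.Time
}

// orphanedFiles returns the IDs of files that are older than ttl and
// are not referenced by any message. referenced is assumed to be built
// from the file_id values found in stored message content.
func orphanedFiles(files []fileMeta, referenced map[string]bool, now time.Time, ttl time.Duration) []string {
	var orphans []string
	cutoff := now.Add(-ttl)
	for _, f := range files {
		if referenced[f.ID] {
			continue
		}
		// Recent uploads are kept: the user may still be composing.
		if f.CreatedAt.After(cutoff) {
			continue
		}
		orphans = append(orphans, f.ID)
	}
	return orphans
}

// demo builds a small fixture and returns the IDs selected for deletion.
func demo() []string {
	now := time.Date(2026, 3, 6, 12, 0, 0, 0, time.UTC)
	files := []fileMeta{
		{ID: "old-unreferenced", CreatedAt: now.Add(-48 * time.Hour)},
		{ID: "old-referenced", CreatedAt: now.Add(-48 * time.Hour)},
		{ID: "fresh-unreferenced", CreatedAt: now.Add(-5 * time.Minute)},
	}
	referenced := map[string]bool{"old-referenced": true}
	return orphanedFiles(files, referenced, now, 24*time.Hour)
}
```

The TTL grace period matters because files are uploaded eagerly, before any message references them.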

@mafredri mafredri force-pushed the mafredri/chat-file-attachments branch from 3705adb to 25d1bd0 Compare March 5, 2026 14:07
if err != nil {
	return database.ChatFile{}, err
}
if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceChat.WithOwner(file.OwnerID.String()).InOrg(file.OrganizationID).WithID(file.ID)); err != nil {
Member

You can just make database.ChatFile implement rbac.Objecter

Member Author

Yep, I agree that's the best approach 👍🏻

Comment on lines +4 to +7
SELECT organization_members.organization_id
FROM organization_members
WHERE organization_members.user_id = chat_files.owner_id
LIMIT 1
Member

What's the ordering situation here with organization memberships?

Member Author

It doesn't really matter in this case since the data won't exist. I'm just using separate migrations to avoid nuking my test DB.

Comment on lines +2002 to +2008
var allowedChatFileMIMETypes = map[string]bool{
	"image/png":     true,
	"image/jpeg":    true,
	"image/gif":     true,
	"image/webp":    true,
	"image/svg+xml": false, // SVG can contain scripts.
}
Member

Presumably each model provider has a different set of allowed MIME types, right? I'm not sure how exactly we should do this. I wonder if any Charm libraries have a mapping of provider/model -> file types. It'd be super slick if later on we could show an exclamation mark in the chat on the attachments if a user switches to a model that does not support them.

Instead of erroring, we can just clearly indicate in the UI that we are not sending it because it is not supported.

Member Author

I'll file a follow-up ticket for this 👍🏻

coderd/chats.go Outdated
Comment on lines +2053 to +2055
// TODO(multi-org): This takes the first org, which is correct for
// single-org deployments but arbitrary for multi-org. The client
// should specify the org, or we should use a default org policy.
Member

Should we not just require an organization ID sent now so we don't have to change the API contract later? I know the same could be said for chats themselves, but gotta start somewhere.

Comment on lines +2220 to +2236
chatFile, err := db.GetChatFileByID(ctx, part.FileID)
if err != nil {
	if httpapi.Is404Error(err) {
		return nil, "", &codersdk.Response{
			Message: "Invalid input part.",
			Detail:  fmt.Sprintf("%s[%d].file_id references a file that does not exist.", fieldName, i),
		}
	}
	return nil, "", &codersdk.Response{
		Message: "Internal error.",
		Detail:  fmt.Sprintf("Failed to retrieve file for %s[%d].", fieldName, i),
	}
}
content = append(content, fantasy.FileContent{
	Data:      chatFile.Data,
	MediaType: chatFile.Mimetype,
})
Member

I wonder how slow this will make chats with lots of attachments. Maybe it's not a biggie, because we do have summarization which trims the greater context.

Member Author

It would be interesting to track somehow, but not sure how, or how we'd avoid it here.

Member Author

One option to reduce context size would be to compress or resize the image, I suppose. I think this needs some user control though, so I'm not sure how we'd do it. But we can document it as a future enhancement?

Member

@johnstcn johnstcn left a comment

Backend LGTM

@kylecarbs
Member

UI improvements:

image

We should make it use https://lucide.dev/icons/image instead. And we should put it to the left of the send icon (when the context indicator is visible, it should be to the right of that).

image

When an image fails to upload, we should have an error message on hover in a tooltip.

mafredri added 11 commits March 6, 2026 16:44
…on send

Previously, files that failed to upload were silently dropped when
sending a message. Users had no indication their attachments were
skipped.

Now both send handlers count error-state files and show a toast
warning with the number of attachments that could not be sent.

Verifies that editing a chat message preserves the file_id through
both the edit response and a subsequent GET. This covers the write
path (MarshalContent with injected file IDs) and the read path
(fileStoredID extraction and chatMessageParts population).

…action

Previously there was no test coverage for file_id extraction from file
blocks in parseMessageContent. This adds two cases: one verifying fileId
is populated when file_id is present, and one verifying backward
compatibility when file_id is absent.

Previously every chat message API response included the full base64
file data inline in ChatMessagePart.Data, even though clients can
fetch file content via GET /chats/files/{id} with caching.

Nil out part.Data when part.FileID is set so the JSON response omits
the data field (via omitempty). MediaType is preserved for client-side
display hints.

Previously FileID was uuid.UUID which serialized the zero value as
"00000000-0000-0000-0000-000000000000" on non-file parts because
omitempty does not work for fixed-size arrays. NullUUID serializes
as null when invalid and is properly omitted with omitempty for
pointer variants.
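
The omitempty behaviour behind this change can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions: `ID` is a hypothetical stand-in for a UUID type, and a plain pointer is used instead of uuid.NullUUID so that no third-party package is needed.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
)

// ID is a stand-in for a UUID: a fixed-size array with a text encoding.
type ID [16]byte

// MarshalText renders the ID as hex so it appears as a JSON string.
func (id ID) MarshalText() ([]byte, error) {
	return []byte(hex.EncodeToString(id[:])), nil
}

// valuePart holds the ID by value. omitempty never drops it, because a
// [16]byte array has non-zero length and so is never considered empty.
type valuePart struct {
	Type   string `json:"type"`
	FileID ID     `json:"file_id,omitempty"`
}

// pointerPart holds the ID by pointer. A nil pointer counts as empty.
type pointerPart struct {
	Type   string `json:"type"`
	FileID *ID    `json:"file_id,omitempty"`
}

// encode marshals v and returns the JSON as a string.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}
```

Calling `encode(valuePart{Type: "text"})` still emits a `file_id` key holding the all-zero ID, while `encode(pointerPart{Type: "text"})` omits the key entirely.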
…pload

After a successful upload, fetch the file's GET endpoint in the
background. The server responds with Cache-Control: private,
immutable, so when the timeline later renders <img src=...> for the
same file_id, the browser serves it from its HTTP cache with no
network request.
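
On the server side, the response headers that make this pre-warming possible might be set roughly as in the sketch below. It assumes a hypothetical `serveFile` helper and an in-memory payload; the actual handler and its storage lookup are not shown in this conversation.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strconv"
)

// serveFile writes a privately cacheable, immutable image response.
// The data and mediaType arguments stand in for a chat_files row.
func serveFile(w http.ResponseWriter, data []byte, mediaType string) {
	h := w.Header()
	h.Set("Content-Type", mediaType)
	h.Set("Content-Disposition", "inline")
	h.Set("Cache-Control", "private, immutable")
	h.Set("Content-Length", strconv.Itoa(len(data)))
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write(data)
}

// demo serves a tiny payload to a recorder and returns the headers.
func demo() http.Header {
	rec := httptest.NewRecorder()
	serveFile(rec, []byte("\x89PNG\r\n\x1a\n"), "image/png")
	return rec.Result().Header
}
```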
Sniff the first 512 bytes to verify the content type before reading
the entire body. This avoids buffering up to 10MB of data for files
that will be rejected by the MIME type check.

MarshalContent takes a plain map parameter instead of varargs.
fileStoredID returns uuid.NullUUID with parsing handled internally.
Part type checks in chatMessageParts use a switch.
@mafredri mafredri force-pushed the mafredri/chat-file-attachments branch 2 times, most recently from 49855d7 to 530e0c6 Compare March 6, 2026 16:46
mafredri added 3 commits March 6, 2026 17:20
Add organization_id index to chat_files migration so that org
deletes do not require a sequential scan.

Return 413 instead of 400 when MaxBytesReader triggers on file
upload by detecting http.MaxBytesError.
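
A minimal sketch of this pattern, assuming Go 1.19 or later (where `*http.MaxBytesError` exists). `uploadHandler` and `statusFor` are hypothetical names used only for illustration:

```go
package main

import (
	"errors"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// uploadHandler returns a handler that accepts bodies up to limit bytes.
func uploadHandler(limit int64) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, limit)
		if _, err := io.ReadAll(r.Body); err != nil {
			// Distinguish "too large" from other read failures.
			var tooLarge *http.MaxBytesError
			if errors.As(err, &tooLarge) {
				http.Error(w, "file too large", http.StatusRequestEntityTooLarge)
				return
			}
			http.Error(w, "failed to read body", http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusCreated)
	}
}

// statusFor posts a body of the given size and returns the status code.
func statusFor(size int, limit int64) int {
	body := strings.NewReader(strings.Repeat("x", size))
	req := httptest.NewRequest(http.MethodPost, "/upload", body)
	rec := httptest.NewRecorder()
	uploadHandler(limit)(rec, req)
	return rec.Code
}
```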

Remove DB error detail from the InsertChatFile 500 response to
avoid leaking internal information.

Fix UTF-8 filename truncation to avoid splitting multi-byte
characters by iterating runes.
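
One way to truncate on rune boundaries is sketched below. `truncateUTF8` is a hypothetical helper, not the function from this commit; it relies on the fact that ranging over a Go string yields the byte offset at which each rune starts.

```go
package main

// truncateUTF8 shortens s to at most maxBytes bytes without splitting a
// multi-byte character. It keeps the largest rune start offset that is
// not beyond maxBytes and cuts there.
func truncateUTF8(s string, maxBytes int) string {
	if len(s) <= maxBytes {
		return s
	}
	end := 0
	for i := range s {
		if i > maxBytes {
			break
		}
		end = i
	}
	return s[:end]
}
```

For example, `truncateUTF8("héllo", 2)` returns `"h"` rather than a string ending in half of the two-byte `é`.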

Clear image data bytes before persisting the content block JSON
in injectFileID since only the file_id reference is needed.

…cessible

Previously the remove button used `hidden group-hover:flex` which
removed it from the accessibility tree entirely, making it unreachable
for keyboard and touch users.

Switch to opacity-based visibility so the button stays in the DOM
(accessible to screen readers and keyboard navigation) but is visually
hidden until hover or focus.

Add a batch query to fetch multiple chat files by IDs in a single
round-trip. This will be used by the chat prompt builder to resolve
file references at LLM dispatch time instead of storing inline data.
@mafredri mafredri force-pushed the mafredri/chat-file-attachments branch 3 times, most recently from e5406df to a60e740 Compare March 6, 2026 17:47
Add ConvertMessagesWithFiles that resolves file references from
chat_files storage at LLM dispatch time. Previously file data was
stored inline in both chat_files.data and chat_messages.content
(base64 in JSON), doubling storage for each uploaded file.

New messages now store only file_id + media_type in the content JSON
(injectFileID strips inline data). The actual bytes are resolved via
a FileResolver callback when building the LLM prompt.

ConvertMessages delegates to ConvertMessagesWithFiles with a nil
resolver, preserving backward compatibility for all existing callers
and messages that still have inline data.
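
The resolver idea can be sketched as follows. This is a simplified illustration, not the signature of ConvertMessagesWithFiles or FileResolver from this PR: `filePart`, `fileResolver`, `resolveParts`, and `demo` are hypothetical, and string IDs are used in place of UUIDs.

```go
package main

import "fmt"

// filePart is a simplified stored message part. New messages carry only
// FileID and MediaType; older messages may still carry inline Data.
type filePart struct {
	FileID    string
	MediaType string
	Data      []byte
}

// fileResolver fetches the bytes for a batch of file IDs in one call.
type fileResolver func(ids []string) (map[string][]byte, error)

// resolveParts returns a copy of parts with Data filled in via resolve.
// A nil resolver leaves every part untouched, which keeps messages with
// inline data working.
func resolveParts(parts []filePart, resolve fileResolver) ([]filePart, error) {
	out := make([]filePart, len(parts))
	copy(out, parts)
	if resolve == nil {
		return out, nil
	}
	// Collect IDs that still need data so the lookup is a single batch.
	var ids []string
	for _, p := range out {
		if p.FileID != "" && len(p.Data) == 0 {
			ids = append(ids, p.FileID)
		}
	}
	if len(ids) == 0 {
		return out, nil
	}
	files, err := resolve(ids)
	if err != nil {
		return nil, fmt.Errorf("resolve chat files: %w", err)
	}
	for i := range out {
		if data, ok := files[out[i].FileID]; ok && len(out[i].Data) == 0 {
			out[i].Data = data
		}
	}
	return out, nil
}

// demo resolves one referenced part and one legacy inline part, and
// reports how many times the resolver was called.
func demo() ([]filePart, int) {
	calls := 0
	resolver := func(ids []string) (map[string][]byte, error) {
		calls++
		return map[string][]byte{"file-1": []byte("png-bytes")}, nil
	}
	parts := []filePart{
		{FileID: "file-1", MediaType: "image/png"},
		{MediaType: "image/gif", Data: []byte("inline")},
	}
	out, _ := resolveParts(parts, resolver)
	return out, calls
}
```

Batching the IDs before calling the resolver keeps the cost at one lookup per prompt build, regardless of how many file parts the conversation contains.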
@mafredri mafredri force-pushed the mafredri/chat-file-attachments branch from a60e740 to 5b404a6 Compare March 6, 2026 17:50
mafredri added 3 commits March 6, 2026 17:50
Replace ConvertMessages calls at both LLM dispatch points with
ConvertMessagesWithFiles, passing a FileResolver that fetches file
bytes from chat_files via GetChatFilesByIDs. This ensures file data
is resolved from the authoritative storage rather than relying on
inline base64 in the message JSON.

Add three test cases:
- ResolvesFileData: verifies ConvertMessagesWithFiles resolves
  file_id references via the FileResolver callback
- BackwardCompat: verifies messages with inline data still work
  when no resolver is provided
- StripsInlineData: verifies MarshalContent + injectFileID
  produces JSON with file_id but no inline data

Implement the dbauthz authorization wrapper for GetChatFilesByIDs
that was left as a panic stub by make gen. Follows the same pattern
as GetChatFileByID: fetch files, then authorize each one.
@mafredri mafredri force-pushed the mafredri/chat-file-attachments branch from 5b404a6 to 3452907 Compare March 6, 2026 17:50
File data is resolved at LLM dispatch time via chatFileResolver.
The fetch here only validates existence and gets the media type.
@mafredri mafredri merged commit a104d60 into main Mar 6, 2026
37 checks passed
@mafredri mafredri deleted the mafredri/chat-file-attachments branch March 6, 2026 19:05
@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2026