@sanchitmalhotra126 (Contributor)
This is a somewhat experimental, fully client-side, TypeScript re-implementation of the AI Chat API client. This was primarily done to support image generation for an upcoming classroom pilot, but offered us an opportunity to explore the benefits and potential of moving away from our Rails backend implementation.

This implementation makes use of the AI SDK library, which eliminates the need for model/provider-specific code and provides a bunch of useful types out of the box. The code here effectively does everything our backend currently does, and a bit more, including:

  • Checking input/output for safety with an LLM
  • Supporting multimodal input
  • (new) Supporting multimodal output (with eligible models, currently just Gemini 2.5 Flash Image)
  • Downloading user assets into binary data before sending to the LLM endpoint
  • (new) Uploading generated files into the user's project directory, and rendering them in the chat interface
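As a rough illustration of the last step, here's a hedged sketch of how generated files might be mapped to project-directory uploads. The shapes below loosely mirror the AI SDK's `generateText()` result (text plus a `files` array of generated assets); the `uploadTargetsFor` helper and the filename scheme are purely illustrative, not the PR's actual code:

```typescript
// Illustrative shapes loosely mirroring the AI SDK's generated-file
// output (mediaType + raw bytes). Not the real types from this PR.
interface GeneratedFile {
  mediaType: string;      // e.g. "image/png"
  uint8Array: Uint8Array; // raw bytes returned by the model
}

interface UploadTarget {
  filename: string;
  bytes: Uint8Array;
}

// Map each generated image to a deterministic filename so it can be
// written into the user's project directory and referenced in chat.
// (Hypothetical helper; the naming scheme is an assumption.)
function uploadTargetsFor(
  files: GeneratedFile[],
  requestId: number
): UploadTarget[] {
  return files
    .filter((f) => f.mediaType.startsWith("image/"))
    .map((f, i) => ({
      filename: `aichat-${requestId}-${i}.${f.mediaType.split("/")[1] ?? "bin"}`,
      bytes: f.uint8Array,
    }));
}
```

The key design point is that non-image parts are filtered out before upload, and filenames are derived deterministically from the request so repeated renders stay stable.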

Note that this is still primarily intended for local developer use/testing only. Notably, this implementation relies on hardcoded client-side API keys (that local developers can paste in). For production use, we'll be exploring short-term tokens and/or an API proxy layer to ensure school firewalls aren't blocking requests. The purpose of this PR is to lay out everything else in this client library implementation.

Thanks @edcodedotorg and @snickell for pointing me in this direction!

File Walkthrough

Here's a quick walkthrough of the files (I know this diff is a bit chunky, but I figured folks might benefit from seeing the full picture end to end):

  • performChatCompletion: outermost function (effectively the "public API"), the analog to the existing postAichatCompletionMessage we use today. The method signatures are intentionally similar. This function creates a new AichatRequest object (why? see below) and delegates to generateChatResponse to do all the actual chat completion work. Like our existing API function, it returns an array of two chat messages: the updated user message and the assistant response.
  • generateChatResponse: does all the work to generate a chat response: 1) checks input for safety, 2) transforms messages and system content into the AI SDK format, 3) calls the model to generate a response, 4) checks model text output for safety, 5) uploads any generated assets to the project directory, 6) returns the generated response, assets, and status code. Very similar to what happens in aichat_request_chat_completion_job. Invokes various helpers for subtasks.
  • messageHelpers & fileHelpers: help transform AI Chat-specific types into AI SDK-specific types. An important callout here is that we download project/level assets into binary strings to send to the model, like we do in the Ruby code today.
  • safetyHelpers: safety-related logic. Exports one function, isTextSafe, and uses the same structured output schema and safety prompt we use in the Ruby code today. I did create a SafetyConfig type that we could use for more modular safety prompt generation later on.
    • Note: one change here is that we're not modifying the safety prompt based on the script language, like we do today (substituting "Spanish" for "American" in "American public middle school classroom"). I think there are some more elegant options for context-based modular prompt construction here that we can explore separately.
  • modelHelpers: model/provider-specific logic. This just creates the model objects that the AI SDK library uses.
  • aichatRequestHelpers: helpers for creating an AichatRequest record (without kicking off a job) and updating a record. We need this only because the AichatEvent record that later gets created for each new ChatEvent has a foreign key dependency on an existing AichatRequest. My understanding is that this is really used for data export/analytics purposes (e.g., if there was a specific event where the model generated profanity, we'd want to link back to the AichatRequest record that produced it to learn more about the conversation history up to that point). I think we can eventually get rid of this if we find another way to satisfy these analytics requirements, and remove this code altogether.
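To make the safety step a bit more concrete, here's a minimal hedged sketch of what an isTextSafe-style gate over a structured safety verdict could look like. The verdict fields, the SafetyConfig shape, and the category names are assumptions for illustration only; the actual safetyHelpers reuses the structured output schema and safety prompt from the Ruby code:

```typescript
// Illustrative verdict shape (assumed): what a structured-output
// safety check from the LLM might return after parsing.
interface SafetyVerdict {
  isSafe: boolean;
  // Categories flagged by the safety model, e.g. ["profanity"].
  flaggedCategories: string[];
}

// A SafetyConfig-like hook for modular safety policy construction
// (assumed fields): categories listed here are tolerated even when
// the safety model flags them.
interface SafetyConfig {
  allowedCategories: string[];
}

// Text passes the gate if the model deemed it safe outright, or if
// every flagged category is explicitly allowed by the config.
function isTextSafe(verdict: SafetyVerdict, config: SafetyConfig): boolean {
  if (verdict.isSafe) return true;
  return verdict.flaggedCategories.every((c) =>
    config.allowedCategories.includes(c)
  );
}
```

Separating the verdict parsing from the pass/fail decision like this is one way a SafetyConfig type could enable the more modular, context-based prompt/policy construction mentioned above.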
