feat: add multimodal UIMessage support #230
Conversation
Walkthrough

This PR adds multimodal message part support to TanStack AI, introducing four new message part types (image, audio, video, document) with corresponding type definitions and conversion logic that preserves multimodal content when transforming between UIMessage and ModelMessage representations.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client (useChat)
    participant UIMsg as UIMessage Builder
    participant Converter as Message Converter
    participant Model as ModelMessage
    rect rgba(100, 200, 150, 0.5)
        Note over Client,Model: Append Multimodal Content
        Client->>UIMsg: append({ role, content: [text, image, ...] })
        UIMsg->>Converter: uiMessageToModelMessages(uiMessage)
        activate Converter
        Converter->>Converter: Detect multimodal parts (image, audio, video, doc)
        Converter->>Model: Create ModelMessage with ContentPart[]
        deactivate Converter
        Model->>Model: content: [TextPart, ImagePart, ...] (preserved)
    end
    rect rgba(150, 150, 200, 0.5)
        Note over Model,UIMsg: Receive Multimodal Response
        Model->>Converter: modelMessageToUIMessage(modelMessage)
        activate Converter
        Converter->>Converter: Detect ContentPart[] in content
        Converter->>Converter: contentPartsToMessageParts(parts[])
        deactivate Converter
        Converter->>UIMsg: Reconstruct UIMessage with multimodal parts
        UIMsg->>Client: Update chat with [text, image, audio, video, doc] parts
    end
```
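The round-trip in the diagram can be sketched as a small pure function. This is a simplified, illustrative model only: the type names (`ContentPart`, `MessagePart`) and the function name `contentPartsToMessageParts` come from this PR's description, but the field shapes and exact signatures here are assumptions, not the real TanStack AI API.

```typescript
// Simplified stand-ins for the library's types (illustrative only; the real
// types in @tanstack/ai carry more metadata).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; source: string; mimeType?: string };

type MessagePart =
  | { type: "text"; text: string }
  | { type: "image"; source: string; mimeType?: string };

// Sketch of the idea behind contentPartsToMessageParts: map every model
// content part to a UI message part instead of discarding non-text parts.
function contentPartsToMessageParts(parts: ContentPart[]): MessagePart[] {
  return parts.map((part): MessagePart => {
    switch (part.type) {
      case "text":
        return { type: "text", text: part.text };
      case "image":
        return { type: "image", source: part.source, mimeType: part.mimeType };
    }
  });
}

const parts = contentPartsToMessageParts([
  { type: "text", text: "What is in this picture?" },
  { type: "image", source: "https://example.com/cat.png" },
]);
// Both parts survive the conversion; previously the image would be stripped.
console.log(parts.length);
```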
Estimated code review effort: 3 (Moderate) | ~30 minutes
Pre-merge checks: 5 passed.
Changes
When calling `append()` with a `ModelMessage` containing multimodal content (images, audio, files), the content was stripped during the `ModelMessage → UIMessage` conversion, because `modelMessageToUIMessage()` only extracted text via `getTextContent()`. On top of that, the `parts` of a message didn't include multimodal parts, making it impossible to build chat UIs that preserve and display multimodal content.

Added new message part types and updated the conversion functions to preserve multimodal content during round-trips:
New Types (@tanstack/ai and @tanstack/ai-client):
- `ImageMessagePart` - preserves image data with source and optional metadata
- `AudioMessagePart` - preserves audio data
- `VideoMessagePart` - preserves video data (NOT TESTED)
- `DocumentMessagePart` - preserves document data (e.g., PDFs) (NOT TESTED)

Updated Conversion Functions:
- `modelMessageToUIMessage()` - now converts `ContentPart[]` to corresponding `MessagePart[]` instead of discarding non-text parts
- `uiMessageToModelMessages()` - now builds `ContentPart[]` when multimodal parts are present, preserving part ordering

Example:
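As a hedged sketch of what this enables on the client side: the part type names below follow this PR's description, but the exact `UIMessage` shape and field names are assumptions for illustration, not the real `@tanstack/ai-client` types.

```typescript
// Illustrative UIMessage shape based on this PR's description; the real
// type in @tanstack/ai-client may differ in fields and metadata.
interface UIMessage {
  role: "user" | "assistant";
  parts: Array<
    | { type: "text"; text: string }
    | { type: "image"; source: string }
    | { type: "audio"; source: string }
  >;
}

const message: UIMessage = {
  role: "user",
  parts: [
    { type: "text", text: "Transcribe this clip" },
    { type: "audio", source: "https://example.com/clip.mp3" },
  ],
};

// Before this change, only the text part survived the round-trip through
// uiMessageToModelMessages(); now multimodal parts stay in `parts`, so a
// chat UI can render them alongside the text.
const multimodal = message.parts.filter((p) => p.type !== "text");
console.log(multimodal.length);
```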
Demo
Images:
https://github.com/user-attachments/assets/5f62ab32-9f11-44f7-bfc0-87d00678e265
Audio:
https://github.com/user-attachments/assets/bbbdc2f9-f8d7-4d74-99c2-23d15a3278a3
Closes #200
Note
This contribution touches core message handling. Let me know if the approach doesn't align with the project's vision; I'm happy to iterate on it :)
This PR is not ready to be merged because:
Checklist

`pnpm run test:pr`

Release Impact
Summary by CodeRabbit
New Features
Tests