[Feature Request]: Built-in multimodal input support throughout jivas

**Is your feature request related to a problem? Please describe.**
Currently, incorporating multimodal input in my agent in jivas requires some customization. I'd like to see jivas has built in support for multi modal input.

**Describe the solution you'd like**
1. Have a designated parameter for interact walker to send multimodal input (e.g. images) along side text utterance
2. The images will be included in the final LLM call (e.g. via persona_interact_action)
3. The images should be saved to the interaction node automatically and be picked up when constructing conversation history so the images are being considered in the conversation context for future requests.
 
**Describe alternatives you've considered**
We are currently implementing an alternative by directly interacting with the `data` field in `interaction` node and writing a custom version of the `get_transcript_statements` walker when getting the conversation history. 



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request]: Built-in multimodal input support throughout jivas #142

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Feature Request]: Built-in multimodal input support throughout jivas #142

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions