Skip to content

[Feature Request]: Built-in multimodal input support throughout jivas #142

@ypkang

Description

@ypkang

Is your feature request related to a problem? Please describe.
Currently, incorporating multimodal input in my agent in jivas requires some customization. I'd like to see jivas has built in support for multi modal input.

Describe the solution you'd like

  1. Have a designated parameter for interact walker to send multimodal input (e.g. images) along side text utterance
  2. The images will be included in the final LLM call (e.g. via persona_interact_action)
  3. The images should be saved to the interaction node automatically and be picked up when constructing conversation history so the images are being considered in the conversation context for future requests.

Describe alternatives you've considered
We are currently implementing an alternative by directly interacting with the data field in interaction node and writing a custom version of the get_transcript_statements walker when getting the conversation history.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions