Is your feature request related to a problem? Please describe.
Currently, incorporating multimodal input in my agent in jivas requires some customization. I'd like to see jivas has built in support for multi modal input.
Describe the solution you'd like
- Have a designated parameter for interact walker to send multimodal input (e.g. images) along side text utterance
- The images will be included in the final LLM call (e.g. via persona_interact_action)
- The images should be saved to the interaction node automatically and be picked up when constructing conversation history so the images are being considered in the conversation context for future requests.
Describe alternatives you've considered
We are currently implementing an alternative by directly interacting with the data field in interaction node and writing a custom version of the get_transcript_statements walker when getting the conversation history.
Is your feature request related to a problem? Please describe.
Currently, incorporating multimodal input in my agent in jivas requires some customization. I'd like to see jivas has built in support for multi modal input.
Describe the solution you'd like
Describe alternatives you've considered
We are currently implementing an alternative by directly interacting with the
datafield ininteractionnode and writing a custom version of theget_transcript_statementswalker when getting the conversation history.