Is your feature request related to a problem?
We want to be able to include image and audio inputs in chat requests, not just text.
Describe the Solution you'd like
Add encoder support for non-text inputs; modify the request format and other runtime components so they can carry image/audio data; and implement model definitions for image-understanding models.
Alternatives Considered (Optional)
No response
Additional Context (Optional)
No response