What feature would you like to be added?
Support for multi-modal content in a function tool call response, similar to what the MultiModal input message already allows.
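To make the request concrete, here is a minimal sketch of the shape such a response could take. Every name in it (`ImagePart`, `ToolCallResult`, `retrieve_page`) is hypothetical and chosen for illustration only; none of them is an existing API, and the real design may differ.

```python
import base64
from dataclasses import dataclass
from typing import List, Union


# Hypothetical type: an image returned by a tool, carried as raw bytes.
@dataclass
class ImagePart:
    data: bytes
    mime_type: str = "image/png"

    def to_data_uri(self) -> str:
        """Encode the image as a data URI so it can be embedded in a message."""
        encoded = base64.b64encode(self.data).decode("ascii")
        return f"data:{self.mime_type};base64,{encoded}"


# A tool result part is either plain text or an image.
ContentPart = Union[str, ImagePart]


# Hypothetical type: a tool call result whose content is a list of parts
# instead of a single string.
@dataclass
class ToolCallResult:
    call_id: str
    content: List[ContentPart]


def retrieve_page(call_id: str, query: str) -> ToolCallResult:
    """Hypothetical RAG tool that returns a caption plus an image."""
    placeholder = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a real image
    return ToolCallResult(
        call_id=call_id,
        content=[f"Top match for: {query}", ImagePart(placeholder)],
    )


result = retrieve_page("call_1", "quarterly revenue chart")
for part in result.content:
    kind = "image" if isinstance(part, ImagePart) else "text"
    print(kind)
```

The point of the sketch is only that the result content is a list of mixed text and image parts rather than a single string, which mirrors how a multi-modal input message is composed.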
Why is this needed?
It allows agents to call tools that return images, e.g. as part of a multi-modal RAG scenario.