Message is defined as an object with a role and content, both of which are strings. However, examples throughout the codebase that use vision models show that content can also be an array of objects, each with a type property of "text" or "image" and, for text parts, a text property that is a string. This expands the JSDoc typedef of Message to cover both cases.
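A minimal sketch of what the expanded typedef could look like. It assumes text parts use type "text", which is the form the review below settles on. MessageContentPart and normalizeContent are hypothetical names, not part of the library. The helper only shows that plain string content can be treated as a single text part, so code downstream needs to handle just one shape.

```javascript
/**
 * One entry in a multimodal content array (hypothetical name).
 * @typedef {Object} MessageContentPart
 * @property {"text"|"image"} type The kind of content this part carries.
 * @property {string} [text] The text itself; expected when type is "text".
 */

/**
 * A chat message whose content is either a plain string or an array of parts.
 * @typedef {Object} Message
 * @property {string} role The author of the message, e.g. "user" or "assistant".
 * @property {string|MessageContentPart[]} content The message content.
 */

/**
 * Hypothetical helper: always returns the content as an array of parts,
 * wrapping plain string content in a single text part.
 * @param {Message} message
 * @returns {MessageContentPart[]}
 */
function normalizeContent(message) {
  if (typeof message.content === "string") {
    return [{ type: "text", text: message.content }];
  }
  // Already an array of parts; returned unchanged.
  return message.content;
}

// Both message shapes from the description are valid under this typedef.
const plain = { role: "user", content: "Summarize this document." };
const vision = {
  role: "user",
  content: [{ type: "image" }, { type: "text", text: "What is in this image?" }],
};
console.log(normalizeContent(plain).length); // 1
console.log(normalizeContent(vision).length); // 2
```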
nico-martin left a comment:
Hi @philnash, great to see you here!
You're bringing up a pretty important issue there!
Traditionally, Message was only used for TextGeneration, so it was text only. But as you correctly pointed out, we could also use { type: string, text?: string }[].
However, I think we will see more multimodal models in the future, which is why we need to find a clean solution here.
Could you change the type to { type: "text", text: string }[] so we can merge it?
Only type "text" is currently supported, and it requires a text property.
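With a text-only content array, a text-generation pipeline could flatten the parts back into a single string. The sketch below assumes a hypothetical contentToText helper; it is not how the library itself handles content. It joins the text parts and rejects any other part type, which mirrors the rule that only type "text" is supported. The newline separator is an arbitrary choice made for illustration.

```javascript
/**
 * A text-only content part, as proposed in the review (hypothetical name).
 * @typedef {Object} TextContentPart
 * @property {"text"} type Always "text".
 * @property {string} text The text of this part.
 */

/**
 * Hypothetical helper: flattens string or text-only array content into one
 * string for a text-generation model. Any part whose type is not "text" is
 * rejected, because such a model has no way to consume it.
 * @param {string|TextContentPart[]} content
 * @param {string} [separator="\n"] Inserted between consecutive text parts.
 * @returns {string}
 */
function contentToText(content, separator = "\n") {
  if (typeof content === "string") return content;
  return content
    .map((part) => {
      if (part.type !== "text") {
        throw new TypeError(`Unsupported content part type: ${part.type}`);
      }
      if (typeof part.text !== "string") {
        throw new TypeError('A "text" part requires a string text property');
      }
      return part.text;
    })
    .join(separator);
}

console.log(contentToText([{ type: "text", text: "Hello" }, { type: "text", text: "world" }]));
// Hello
// world
```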
Hey @nico-martin, it's good to be here! I saw you were working on v4, so I had to check it out! I'm not sure I understand: it looks to me like I could narrow the type to { type: "text", text: string } | { type: "image" }, if that works better?
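The narrower union can be written as JSDoc typedefs in which the type field acts as a discriminant. The sketch below adds a hypothetical runtime guard, isContentPart, that checks the same constraints the union expresses. Under this guard, a text part needs a string text, and an image part needs only its type tag, matching the proposal. How the image data itself reaches the model is outside the scope of this sketch.

```javascript
/**
 * @typedef {{ type: "text", text: string }} TextPart
 */
/**
 * @typedef {{ type: "image" }} ImagePart
 */
/**
 * @typedef {TextPart | ImagePart} ContentPart
 */

/**
 * Hypothetical runtime check mirroring the ContentPart union:
 * text parts need a string `text`, image parts need only the type tag.
 * @param {unknown} part
 * @returns {boolean}
 */
function isContentPart(part) {
  if (typeof part !== "object" || part === null) return false;
  switch (part.type) {
    case "text":
      return typeof part.text === "string";
    case "image":
      return true;
    default:
      // Any other tag (e.g. "string") is not part of the union.
      return false;
  }
}

console.log(isContentPart({ type: "text", text: "hi" })); // true
console.log(isContentPart({ type: "text" })); // false
console.log(isContentPart({ type: "image" })); // true
console.log(isContentPart({ type: "string", text: "hi" })); // false
```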
Oh wait, you're right. I was only looking at the TextGeneration pipeline, where it makes sense to allow only text. But if you create your own "pipeline" with a vision language model, then you could also pass an image in the content array. However, I think we should distinguish between the different types of content so that it is clear, for example, that text generation models can only process text input, whereas VLMs can also process images. Would you like to pursue this further, or is it okay with you if I dive in?
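One way to make that distinction explicit is to keep two message typedefs: one for text-only models, whose content is text only, and one for vision language models, which also allows image parts. A conversation could then be checked to see which one it actually needs. This is a sketch under those assumptions. TextMessage, VisionMessage and needsVisionModel are hypothetical names, not existing library types.

```javascript
/**
 * @typedef {{ type: "text", text: string }} TextPart
 */
/**
 * @typedef {{ type: "image" }} ImagePart
 */
/**
 * Message accepted by a text-only model such as the TextGeneration pipeline.
 * @typedef {{ role: string, content: string | TextPart[] }} TextMessage
 */
/**
 * Message accepted by a vision language model.
 * @typedef {{ role: string, content: string | Array<TextPart | ImagePart> }} VisionMessage
 */

/**
 * Hypothetical helper: true if any message contains an image part,
 * i.e. the conversation cannot be typed as TextMessage[].
 * @param {VisionMessage[]} messages
 * @returns {boolean}
 */
function needsVisionModel(messages) {
  return messages.some(
    (m) => Array.isArray(m.content) && m.content.some((part) => part.type === "image"),
  );
}

const chat = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: [{ type: "image" }, { type: "text", text: "What is shown here?" }] },
];
console.log(needsVisionModel(chat)); // true
console.log(needsVisionModel(chat.slice(0, 1))); // false
```

Keeping the check separate from the typedefs means a text-only model can reject a conversation up front instead of silently dropping image parts.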
I've had a look through, and I'm afraid I'm a bit lost on how the processors work and how that applies to the model I'm trying to use. (The app I'm playing around with uses granite-docling-258m-onnx, and I'm not sure, for example, how it gets its settings.) I did find some interesting things, though:
I'm happy to help, but I may not have as much time, and I definitely don't have as much experience with the codebase, so please take it from here if you want!