Hi team,
I've been working on a feature that converts entire PDF pages, or specific regions of a page, into images, so they can be used as input for multi-modal LLMs.
My question is: do we actually want the multi-modal LLM to generate descriptions/explanations from these images? I've almost finished the image-conversion part, so if the descriptive capability is desired, I can put together a PR. Let me know your thoughts.
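To give a concrete idea of the "specific portions" part, here is a minimal sketch of the cropping arithmetic I have in mind. The function name `region_to_pixels` and the fractional-coordinate region format are my own assumptions, not settled API; the actual rasterization step (e.g. via PyMuPDF's `Page.get_pixmap(clip=...)`) is left out here.

```python
def region_to_pixels(page_w_pts, page_h_pts, region, dpi=150):
    """Map a fractional page region (x0, y0, x1, y1, each in 0..1) to a
    pixel bounding box at the requested DPI.

    PDF pages are measured in points (1/72 inch), so the pixel scale
    factor is dpi / 72. This is a sketch; the region format is a
    working assumption, not a finalized interface.
    """
    scale = dpi / 72.0
    x0, y0, x1, y1 = region
    return (
        round(x0 * page_w_pts * scale),
        round(y0 * page_h_pts * scale),
        round(x1 * page_w_pts * scale),
        round(y1 * page_h_pts * scale),
    )

# Example: a US Letter page is 612 x 792 points; the top half at 150 dpi
# maps to a 1275 x 825 pixel crop.
box = region_to_pixels(612, 792, (0.0, 0.0, 1.0, 0.5), dpi=150)
print(box)  # (0, 0, 1275, 825)
```

The resulting box would then be handed to whatever renderer we settle on as the crop rectangle before encoding the image for the LLM.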
Thanks!