Hi team,
I've been working on a feature that converts entire PDF pages, or specific regions of a page, into images, so they can be used as input for multi-modal LLMs.
My question is: do we actually want the multi-modal LLM to generate descriptions/explanations from these images? I've almost finished the image-conversion part, so if the descriptive capability is desired, I can put together a PR. Let me know your thoughts.
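To give a concrete idea of the "specific portions" part, here is a minimal sketch of the cropping arithmetic I have in mind. The function name `region_to_pixels` and the fractional-coordinate region format are my own assumptions, not settled API; the actual rasterization step (e.g. via PyMuPDF's `Page.get_pixmap(clip=...)`) is left out here.

```python
def region_to_pixels(page_w_pts, page_h_pts, region, dpi=150):
    """Map a fractional page region (x0, y0, x1, y1, each in 0..1) to a
    pixel bounding box at the requested DPI.

    PDF pages are measured in points (1/72 inch), so the pixel scale
    factor is dpi / 72. This is a sketch; the region format is a
    working assumption, not a finalized interface.
    """
    scale = dpi / 72.0
    x0, y0, x1, y1 = region
    return (
        round(x0 * page_w_pts * scale),
        round(y0 * page_h_pts * scale),
        round(x1 * page_w_pts * scale),
        round(y1 * page_h_pts * scale),
    )

# Example: a US Letter page is 612 x 792 points; the top half at 150 dpi
# maps to a 1275 x 825 pixel crop.
box = region_to_pixels(612, 792, (0.0, 0.0, 1.0, 0.5), dpi=150)
print(box)  # (0, 0, 1275, 825)
```

The resulting box would then be handed to whatever renderer we settle on as the crop rectangle before encoding the image for the LLM.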
Thanks!