Image reader and image support in gather_evidence
#1046
Conversation
Pull Request Overview
This PR introduces multimodal capabilities to PaperQA by adding support for images as a new media type. The key changes enable the system to parse, store, and utilize images alongside text content in the question-answering process.
- Adds image parsing functionality with a new `ParsedMedia` class for storing image data and metadata
- Extends the `Text` class to include associated media and updates the evidence gathering process to incorporate images
- Integrates image support into the LLM prompting system for multimodal question answering
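To make the data model concrete, the additions might look roughly like the sketch below. The class names `ParsedMedia` and the `Text.media` field come from the PR summary; all field names and types beyond that are assumptions, not the PR's actual schema (which uses its own base models).

```python
from dataclasses import dataclass, field

@dataclass
class ParsedMedia:
    """Hypothetical sketch of ParsedMedia: raw image bytes plus metadata."""
    data: bytes                                # raw image content
    media_type: str = "image/png"              # MIME type (assumed field name)
    info: dict = field(default_factory=dict)   # e.g. source page or figure number

@dataclass
class Text:
    """Sketch of the extended Text chunk carrying associated media."""
    text: str
    name: str
    media: list[ParsedMedia] = field(default_factory=list)  # new in this PR

# Usage: attach an image to a text chunk alongside its prose
chunk = Text(text="Figure 1 shows the ablation results...", name="paper pages 1-2")
chunk.media.append(ParsedMedia(data=b"\x89PNG\r\n\x1a\n", info={"page": 1}))
```

Storing media on the chunk (rather than as a separate document type) is what lets the evidence-gathering step pass text and its associated images to the LLM together.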
Reviewed Changes
Copilot reviewed 8 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/paperqa/types.py | Introduces `ParsedMedia` class for image storage and extends `Text` with media field |
| src/paperqa/readers.py | Adds `parse_image` function and updates chunking logic to handle images |
| src/paperqa/core.py | Modifies evidence summarization to include images in LLM prompts |
| src/paperqa/prompts.py | Updates prompt templates to support multimodal content with image integration |
| src/paperqa/utils.py | Adds utility functions for base64 encoding/decoding of image data |
| src/paperqa/docs.py | Updates document addition logic to handle image parsing metadata |
| src/paperqa/settings.py | Excludes image files from indexing until embedding support is added |
| tests/test_paperqa.py | Comprehensive tests for image parsing, storage, and multimodal querying |
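The base64 helpers described for `utils.py` presumably wrap the standard library; a minimal sketch of what such round-trip utilities look like (function names here are illustrative, not the PR's actual API):

```python
import base64

def encode_image_as_base64(data: bytes) -> str:
    """Encode raw image bytes as a base64 string for embedding in an LLM payload."""
    return base64.b64encode(data).decode("ascii")

def decode_base64_image(encoded: str) -> bytes:
    """Invert encode_image_as_base64, recovering the original bytes."""
    return base64.b64decode(encoded)

# Round trip: decoding an encoded payload returns the original bytes
raw = b"\x89PNG\r\n\x1a\n"
assert decode_base64_image(encode_image_as_base64(raw)) == raw
```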
Comments suppressed due to low confidence (1)
tests/test_paperqa.py:1338
- This test compares an object with itself, which is always true. Consider testing equality against a separate instance with the same content to properly exercise the `__eq__` method.

assert parsed_image == parsed_image, "Expected equality"  # noqa: PLR0124
This PR is the first step in PaperQA becoming multimodal:
- `Docs` to also store images (with text and metadata) as a `ParsedMedia`
- `gather_evidence` tool to include images in the `Context`-generation prompt
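For the `Context`-generation prompt, images are typically passed to a multimodal LLM as base64 data URLs alongside the text. A hedged sketch using the OpenAI-style "content parts" message shape (the field names follow that API; PaperQA's actual prompt plumbing may differ):

```python
import base64

def build_multimodal_message(text: str, images: list[bytes]) -> dict:
    """Assemble a user message mixing one text part with zero or more image parts.

    Hypothetical helper: the content-parts structure below is the OpenAI chat
    format, used here only to illustrate how images reach the prompt.
    """
    parts = [{"type": "text", "text": text}]
    for img in images:
        b64 = base64.b64encode(img).decode("ascii")
        parts.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"role": "user", "content": parts}

# Usage: one evidence chunk's text plus its associated figure
msg = build_multimodal_message("Summarize the evidence in this figure.", [b"\x89PNG"])
```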