Conversation

@jamesbraza (Collaborator)

This PR is the first step toward PaperQA becoming multimodal:

  1. Adds reader support for images
  2. Converts Docs to also store images (with text and metadata) as ParsedMedia objects (a rough sketch of such a record follows this list)
  3. Expands the gather_evidence tool to include images in the Context-generation prompt
  4. Adds tests covering the prior three steps
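
A minimal sketch of what such a media record might hold, assuming a dataclass-like shape; the real `ParsedMedia` lives in `src/paperqa/types.py`, and every field name below is an illustrative assumption rather than the PR's actual API:

```python
import base64
from dataclasses import dataclass, field


@dataclass
class ParsedMediaSketch:
    """Illustrative stand-in for ParsedMedia: image bytes plus metadata."""

    data: bytes  # raw image bytes as read from the source document
    info: dict = field(default_factory=dict)  # assumed metadata, e.g. {"page": 3}

    def to_base64(self) -> str:
        # base64 keeps the payload JSON-serializable and usable in LLM prompts
        return base64.b64encode(self.data).decode("ascii")
```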

@jamesbraza jamesbraza self-assigned this Aug 5, 2025
@jamesbraza jamesbraza added the `enhancement` (New feature or request) label Aug 5, 2025
Copilot AI review requested due to automatic review settings on August 5, 2025 22:29
@dosubot dosubot bot added the `size:L` (This PR changes 100-499 lines, ignoring generated files) label Aug 5, 2025
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR introduces multimodal capabilities to PaperQA by adding support for images as a new media type. The key changes enable the system to parse, store, and utilize images alongside text content in the question-answering process.

  • Adds image parsing functionality with a new ParsedMedia class for storing image data and metadata
  • Extends the `Text` class to include associated media and updates the evidence-gathering process to incorporate images
  • Integrates image support into the LLM prompting system for multimodal question answering (a hedged sketch of this message shape follows)
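
One common way to attach images to a summarization prompt is the OpenAI-style "content parts" message format, sketched below; PaperQA's actual prompt assembly in `src/paperqa/core.py` and `src/paperqa/prompts.py` may differ, and `build_multimodal_message` is a hypothetical helper:

```python
def build_multimodal_message(summary_prompt: str, images_b64: list[str]) -> dict:
    """Assemble a user message whose content mixes text and image parts."""
    content: list[dict] = [{"type": "text", "text": summary_prompt}]
    for b64 in images_b64:
        content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            }
        )
    return {"role": "user", "content": content}
```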

Reviewed Changes

Copilot reviewed 8 out of 12 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/paperqa/types.py | Introduces the `ParsedMedia` class for image storage and extends `Text` with a media field |
| src/paperqa/readers.py | Adds a `parse_image` function and updates chunking logic to handle images |
| src/paperqa/core.py | Modifies evidence summarization to include images in LLM prompts |
| src/paperqa/prompts.py | Updates prompt templates to support multimodal content with image integration |
| src/paperqa/utils.py | Adds utility functions for base64 encoding/decoding of image data |
| src/paperqa/docs.py | Updates document-addition logic to handle image parsing metadata |
| src/paperqa/settings.py | Excludes image files from indexing until embedding support is added |
| tests/test_paperqa.py | Comprehensive tests for image parsing, storage, and multimodal querying |
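
For the `src/paperqa/utils.py` row above, a minimal sketch of the base64 round trip it describes; the actual helper names in PaperQA are not shown in this review, so `encode_image` and `decode_image` are illustrative stand-ins:

```python
import base64


def encode_image(image_bytes: bytes) -> str:
    """Encode raw image bytes as a base64 string for JSON/prompt transport."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image(encoded: str) -> bytes:
    """Invert encode_image, recovering the original bytes."""
    return base64.b64decode(encoded)


assert decode_image(encode_image(b"\x89PNG")) == b"\x89PNG"  # round trip holds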
Comments suppressed due to low confidence (1)

tests/test_paperqa.py:1338

  • This test compares an object with itself, which is always true. Consider testing equality against a separate instance with the same content to properly exercise the `__eq__` method.
    assert parsed_image == parsed_image, "Expected equality"  # noqa: PLR0124
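
A hedged sketch of the fix the reviewer is suggesting, assuming `ParsedMedia` can be rebuilt from its own attributes (`data` and `info` are assumed field names, not confirmed by this review):

```python
# Build a second instance with identical content so __eq__ is exercised
# against a distinct object rather than the same one.
parsed_image_copy = ParsedMedia(data=parsed_image.data, info=parsed_image.info)
assert parsed_image == parsed_image_copy, "Expected content-based equality"
assert parsed_image is not parsed_image_copy  # equal content, distinct objects
```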

@jamesbraza jamesbraza force-pushed the multimodal-images branch 2 times, most recently from 12c1d82 to dd29e08, on August 5, 2025 22:34
@dosubot dosubot bot added the `lgtm` (This PR has been approved by a maintainer) label Aug 5, 2025
@dosubot dosubot bot added the `size:XL` (This PR changes 500-999 lines, ignoring generated files) label and removed the `size:L` (This PR changes 100-499 lines, ignoring generated files) label Aug 6, 2025
@jamesbraza jamesbraza merged commit 5675e97 into main on Aug 6, 2025 (7 checks passed)
@jamesbraza jamesbraza deleted the multimodal-images branch August 6, 2025 05:31