Add error handling for index file loading in SearchIndex #948

Merged (4 commits, May 26, 2025)

tobiasdiez (Contributor)

I got the following error while playing with paper-qa:

File D:\Programming\paper-qa\paperqa\agents\search.py:262, in SearchIndex.index_files(self)
    259         async with await anyio.open_file(file_index_path, "rb") as f:
    260             content = await f.read()
    261             self._index_files = pickle.loads(  # noqa: S301
--> 262                 zlib.decompress(content)
    263             )
    264 return self._index_files

error: Error -5 while decompressing data: incomplete or truncated stream
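The traceback suggests the index file is a zlib-compressed pickle. As a minimal sketch, assuming that layout (the dictionary contents below are made up for illustration), cutting such a file short reproduces the same zlib.error before pickle.loads is ever reached:

```python
import pickle
import zlib

# Made-up stand-in for the cached index mapping; only the layout matters here:
# pickle first, then zlib, matching pickle.loads(zlib.decompress(content)).
index_files = {f"paper_{i}.pdf": f"hash_{i}" for i in range(1000)}
blob = zlib.compress(pickle.dumps(index_files))

# Simulate a partially written index file by keeping only the first half.
truncated = blob[: len(blob) // 2]


def try_load(data: bytes):
    """Return the unpickled object, or the zlib.error raised while decompressing."""
    try:
        return pickle.loads(zlib.decompress(data))  # noqa: S301
    except zlib.error as exc:
        return exc


print(try_load(truncated))
```

Because the compressed stream ends early, zlib reports error -5 (Z_BUF_ERROR). A truncated write is only one possible cause; the PR does not say how the file was damaged.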

Because it was hard to tell which index file caused the error, I added logging that reports the path of the file that fails to load.

(I'm not sure why the error occurred in the first place; it disappeared after I deleted the index files.)

Copilot AI review requested due to automatic review settings (May 7, 2025, 10:30)
dosubot added the size:XS and bug labels (May 7, 2025)
Copilot AI left a comment

Pull Request Overview

This PR adds error handling for decompression and unpickling operations when loading index files in the SearchIndex. The changes ensure that if an error occurs during the loading process, an error message is logged so that the offending file can be identified.

  • Wrapped the deserialization of index files with a try/except block.
  • Added logging to report errors during index file handling.
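A minimal synchronous sketch of the change described above, assuming the same zlib-plus-pickle layout. load_index_file is a hypothetical helper (the real method reads the file asynchronously with anyio), and re-raising after logging is a choice made for this sketch, not necessarily what the PR does:

```python
import logging
import pickle
import zlib
from pathlib import Path

logger = logging.getLogger(__name__)


def load_index_file(file_index_path: Path) -> dict:
    """Read and deserialize one index file, logging its path if that fails."""
    try:
        content = file_index_path.read_bytes()
        return pickle.loads(zlib.decompress(content))  # noqa: S301
    except Exception:
        # logger.exception logs at ERROR level and attaches the active traceback,
        # so the offending path appears next to the zlib or pickle error.
        logger.exception("Failed to load index file %s", file_index_path)
        raise  # Hypothetical choice: the caller still sees the original failure.
```

The log record then carries the file path alongside the traceback, which is the information the original error message lacked.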

Comment on lines 265 to 266
except Exception:
    logger.exception(f"Failed to load index file {file_index_path}")
Copilot AI May 7, 2025

Catching all exceptions may obscure underlying issues; consider catching specific exceptions, such as zlib.error or pickle.UnpicklingError, to handle known error scenarios more accurately.

Suggested change
- except Exception:
-     logger.exception(f"Failed to load index file {file_index_path}")
+ except (pickle.UnpicklingError, zlib.error) as e:
+     logger.exception(f"Failed to load index file {file_index_path}: {e}")
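To see which exception types a narrower except clause has to cover, here is a small sketch, assuming the same zlib-plus-pickle layout. zlib.error and pickle.UnpicklingError come from the suggestion; EOFError is added here as an assumption, because pickle.loads raises it for an empty payload and the suggested tuple would not catch it:

```python
import pickle
import zlib

# zlib.error and pickle.UnpicklingError follow the review suggestion;
# EOFError is an added assumption (pickle.loads(b"") raises it).
LOAD_ERRORS = (zlib.error, pickle.UnpicklingError, EOFError)


def classify(data: bytes) -> str:
    """Return the name of the exception raised while loading data, or 'ok'."""
    try:
        pickle.loads(zlib.decompress(data))  # noqa: S301
    except LOAD_ERRORS as exc:
        return type(exc).__name__
    return "ok"


good = zlib.compress(pickle.dumps({"a.pdf": "hash"}))
samples = {
    "valid": good,
    "truncated": good[: len(good) // 2],  # fails inside zlib
    "not a pickle": zlib.compress(b"not a pickle"),  # decompresses, fails in pickle
    "empty payload": zlib.compress(b""),  # decompresses to b"", fails in pickle
}
for name, data in samples.items():
    print(name, classify(data))
```

Note that zlib's exception class is literally named error, so the truncated sample reports "error".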

Copilot uses AI. Check for mistakes.

            zlib.decompress(content)
        )
except Exception:
    logger.exception(f"Failed to load index file {file_index_path}")
Copilot AI May 7, 2025

When loading the index file fails, the exception is logged but _index_files is not reset, so stale data may be returned. Consider assigning a safe fallback value (e.g., an empty dict) to _index_files in the except block.

Suggested change
- logger.exception(f"Failed to load index file {file_index_path}")
+ logger.exception(f"Failed to load index file {file_index_path}")
+ self._index_files = {}
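The fallback can be sketched as follows, assuming the same zlib-plus-pickle layout. IndexFileCache is a hypothetical stand-in for the caching in SearchIndex.index_files, not paper-qa's actual code:

```python
import logging
import pickle
import zlib
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


class IndexFileCache:
    """Hypothetical stand-in for SearchIndex's lazily loaded _index_files."""

    def __init__(self, file_index_path: Path) -> None:
        self.file_index_path = file_index_path
        self._index_files: Optional[dict] = None

    def index_files(self) -> dict:
        if self._index_files is None:
            try:
                content = self.file_index_path.read_bytes()
                self._index_files = pickle.loads(zlib.decompress(content))  # noqa: S301
            except Exception:
                logger.exception("Failed to load index file %s", self.file_index_path)
                # Fall back to an empty mapping instead of leaving old or None state.
                self._index_files = {}
        return self._index_files
```

One trade-off of this shape: because the empty dict is cached, later calls on the same instance return it without re-reading the file, so a repaired file is only picked up by a fresh instance.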

tobiasdiez and others added 2 commits May 9, 2025 21:01
dosubot added the size:S label and removed the size:XS label (May 21, 2025)
jamesbraza (Collaborator)

Thank you for this!

jamesbraza merged commit 0e493ba into Future-House:main (May 26, 2025)
3 of 5 checks passed
tobiasdiez deleted the index-file-excep branch (May 27, 2025, 03:10)
Labels: bug (Something isn't working), size:S (This PR changes 10-29 lines, ignoring generated files.)

2 participants