Journal quality fixes #1034
Pull Request Overview
This PR fixes journal quality lookups for several common journals that were not being matched due to formatting issues and name variations. The fix includes both code changes to handle common character issues and data updates to include missing journal name variants.
- Updates journal quality lookup logic to handle ampersand encoding issues and name variations
- Adds test coverage for previously problematic journals to prevent regressions
- Updates journal quality data with missing journal name variants
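The name-cleaning step in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in `src/paperqa/clients/journal_quality.py`. The helper name `clean_journal_name`, the choice to spell `&` out as `and`, and the whitespace/lowercase normalization are all assumptions. The input title used in the checks is only illustrative.

```python
import html
import re


def clean_journal_name(name: str) -> str:
    """Normalize a journal title before lookup (hypothetical helper)."""
    # Decode HTML entities such as "&amp;" -> "&"; repeat until stable to
    # also handle double-encoded input like "&amp;amp;".
    previous = None
    while previous != name:
        previous, name = name, html.unescape(name)
    # Treat "&" and "and" as equivalent by spelling the ampersand out (assumed choice).
    name = name.replace("&", " and ")
    # Collapse runs of whitespace and lowercase to match lowercase CSV keys.
    return re.sub(r"\s+", " ", name).strip().lower()
```

Spelling the ampersand out, rather than collapsing `and` to `&`, is an arbitrary choice. What matters is that the lookup keys in the CSV are normalized the same way.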
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/paperqa/clients/journal_quality.py | Adds logic to clean journal names by removing HTML entities and handling ampersand variations |
| src/paperqa/clients/client_data/journal_quality.csv | Adds missing journal name variants for PNAS, Annual Review journals, BBA journals, and BMC Evolutionary Biology |
| tests/test_clients.py | Adds parametrized test to verify journal quality scores for previously problematic DOIs |
| tests/cassettes/*.yaml | Test fixture data for the new test cases |
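Adding name variants as extra CSV rows works because each variant simply becomes another key that maps to the same score. The sketch below is a minimal model of that idea and makes several assumptions. It assumes a two-column `name,score` layout like the `scientific reports,0` row quoted in the review. The header names `name,quality`, the helper names, and the sentinel value `-1` are hypothetical. The long PNAS row with score 3 mirrors the truncated commit line in this thread. The short PNAS alias row is invented for illustration.

```python
import csv
import io

# Hypothetical sentinel for "journal not found"; the real constant may use another value.
UNDEFINED_JOURNAL_QUALITY = -1

# Illustrative rows only, not the contents of journal_quality.csv.
SAMPLE_CSV = """name,quality
proceedings of the national academy of sciences of the united states of america,3
proceedings of the national academy of sciences,3
scientific reports,1
"""


def load_quality_table(csv_text: str) -> dict[str, int]:
    """Map each lowercase journal name (every variant row included) to its score."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return {name.strip().lower(): int(score) for name, score in reader}


def lookup_quality(table: dict[str, int], journal: str) -> int:
    """Return the journal's score, or the sentinel when no row matches."""
    return table.get(journal.strip().lower(), UNDEFINED_JOURNAL_QUALITY)
```

Because each variant is its own row, supporting a missing alias is a pure data change, and no code path needs to special-case a particular journal.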
Bug: Missing Journal Quality Data Causes Test Failures
The entry "scientific reports,0" was removed from src/paperqa/clients/client_data/journal_quality.csv without a replacement. This conflicts with test_tricky_journal_quality_results, which expects 'Scientific Reports' (e.g., DOI 10.1038/s41598-018-27044-6) to have a quality score of 1. Consequently, the journal quality lookup returns UNDEFINED_JOURNAL_QUALITY instead of the expected 1, causing test failures or incorrect scoring.
src/paperqa/clients/client_data/journal_quality.csv#L31750-L31752
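The failure mode described in this comment can be reproduced with a small table-driven check. The sketch below is written in the spirit of `test_tricky_journal_quality_results`, but it is not that test. The real test keys cases by DOI and goes through the client and recorded cassettes, while this sketch looks names up in a plain dict. The sentinel value `-1` and the helper name `find_regressions` are hypothetical.

```python
# Hypothetical sentinel value for a journal with no matching row.
UNDEFINED_JOURNAL_QUALITY = -1

# Expectation quoted in the review: Scientific Reports should score 1.
EXPECTED = {"scientific reports": 1}


def find_regressions(table: dict[str, int], expected: dict[str, int]) -> list[str]:
    """Return one message per journal whose looked-up score differs from expectation."""
    problems = []
    for journal, want in expected.items():
        got = table.get(journal, UNDEFINED_JOURNAL_QUALITY)
        if got != want:
            problems.append(f"{journal}: expected {want}, got {got}")
    return problems


# Data after the "scientific reports,0" row was dropped with no replacement.
after_removal = {
    "proceedings of the national academy of sciences of the united states of america": 3
}
# Data with a replacement row carrying the expected score.
with_replacement = {**after_removal, "scientific reports": 1}
```

Running `find_regressions(after_removal, EXPECTED)` reports the missing journal as falling back to the sentinel, which matches the bug described above. Adding the replacement row clears it.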
Nice work
…states of america,3
…re-House/paper-qa into minor-journal-quality-fixes
We had some common journals that didn't show up, either because of strange characters or non-canonical name usage. This PR fixes those cases and adds tests for these outliers.
Note: a raw data update could be done, but the refreshed data looks almost identical; none of these issues are fixed by a data refresh.