
Conversation

@mskarlin
Collaborator

Some common journals weren't showing up, either because of strange characters in their names or because of non-canonical name usage. This PR fixes these outliers and adds tests for them.

Note: a raw data update could be done, but the refreshed data looks almost identical; none of these issues are fixed by a data refresh.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jul 25, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes journal quality lookups for several common journals that were not being matched due to formatting issues and name variations. The fix includes both code changes to handle common character issues and data updates to include missing journal name variants.

  • Updates journal quality lookup logic to handle ampersand encoding issues and name variations
  • Adds test coverage for previously problematic journals to prevent regressions
  • Updates journal quality data with missing journal name variants
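To illustrate the kind of name cleaning described above, here is a minimal sketch, assuming the lookup keys are lowercase journal names and that "&" and "and" should be treated as equivalent. The helper name `clean_journal_name` is hypothetical, and the actual logic in `src/paperqa/clients/journal_quality.py` may differ. Name variants such as alternate PNAS or BBA spellings are handled by the CSV data in this PR, not by this kind of normalization.

```python
import html
import re


def clean_journal_name(name: str) -> str:
    """Normalize a journal name before a quality-table lookup (hypothetical helper)."""
    # Decode HTML entities repeatedly so double-encoded input such as
    # "&amp;amp;" is reduced all the way to "&".
    previous = None
    while previous != name:
        previous, name = name, html.unescape(name)
    # Treat "&" and "and" as the same token.
    name = name.replace("&", " and ")
    # Collapse runs of whitespace and lowercase for case-insensitive matching.
    return re.sub(r"\s+", " ", name).strip().lower()


if __name__ == "__main__":
    print(clean_journal_name("Journal of Ecology &amp; Evolution"))
```

Repeating `html.unescape` until the string stops changing is a design choice in this sketch: a single pass would leave double-encoded names as "&amp;", which would then fail to match.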

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Files changed:

  • src/paperqa/clients/journal_quality.py: Adds logic to clean journal names by removing HTML entities and handling ampersand variations
  • src/paperqa/clients/client_data/journal_quality.csv: Adds missing journal name variants for PNAS, Annual Review journals, BBA journals, and BMC Evolutionary Biology
  • tests/test_clients.py: Adds a parametrized test to verify journal quality scores for previously problematic DOIs
  • tests/cassettes/*.yaml: Test fixture data for the new test cases


@cursor cursor bot left a comment


Bug: Missing Journal Quality Data Causes Test Failures

The entry "scientific reports,0" was removed from src/paperqa/clients/client_data/journal_quality.csv without a replacement. This conflicts with test_tricky_journal_quality_results, which expects 'Scientific Reports' (e.g., DOI 10.1038/s41598-018-27044-6) to have a quality score of 1. Consequently, the journal quality lookup returns UNDEFINED_JOURNAL_QUALITY instead of the expected 1, causing test failures or incorrect scoring.

src/paperqa/clients/client_data/journal_quality.csv#L31750-L31752

https://github.com/Future-House/paper-qa/blob/c3384fabc26316446a5315a92577e8ff8dc059fa/src/paperqa/clients/client_data/journal_quality.csv#L31750-L31752
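Why a deleted row surfaces as a wrong score rather than an error can be seen in a small sketch of a dictionary-backed lookup with a fallback value. This is an illustration under stated assumptions, not paperqa's implementation: the sentinel value of `-1`, the CSV header names `clean_name` and `quality`, and the helper names `load_quality_table` and `lookup_quality` are all hypothetical.

```python
import csv
import io

# Hypothetical sentinel; the real UNDEFINED_JOURNAL_QUALITY value may differ.
UNDEFINED_JOURNAL_QUALITY = -1

# Tiny stand-in for journal_quality.csv; the column names are assumed.
# Note that there is no "scientific reports" row here.
SAMPLE_CSV = """clean_name,quality
nature,1
bmc evolutionary biology,1
"""


def load_quality_table(text: str) -> dict[str, int]:
    """Parse name,score rows into a lookup dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["clean_name"]: int(row["quality"]) for row in reader}


def lookup_quality(table: dict[str, int], name: str) -> int:
    """Return the journal's score, or the sentinel if the name is absent."""
    return table.get(name.strip().lower(), UNDEFINED_JOURNAL_QUALITY)


if __name__ == "__main__":
    table = load_quality_table(SAMPLE_CSV)
    print(lookup_quality(table, "Nature"))
    print(lookup_quality(table, "Scientific Reports"))
```

Because the fallback is silent, only a test that pins the expected score for a specific journal (as `test_tricky_journal_quality_results` does) catches a missing row.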




Collaborator

@jamesbraza jamesbraza left a comment


Nice work

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 25, 2025
@mskarlin mskarlin merged commit feb3c43 into main Jul 25, 2025
5 checks passed
@mskarlin mskarlin deleted the minor-journal-quality-fixes branch July 25, 2025 22:21


4 participants