Merged
15 changes: 14 additions & 1 deletion src/paperqa/clients/client_data/journal_quality.csv
@@ -5143,6 +5143,8 @@ annual review of neuroscience,2
 annual review of nuclear and particle science,1
 annual review of nutrition,3
 annual review of pathology-mechanisms of disease,2
+annual review of pathology: mechanisms of disease,2
+annual review of pathology,2
 annual review of pharmacology and toxicology,3
 annual review of physical chemistry,1
 annual review of physiology,3
@@ -6057,6 +6059,15 @@ biochimica et biophysica acta: molecular basis of disease,1
 biochimica et biophysica acta: molecular cell research,1
 biochimica et biophysica acta: proteins and proteomics,1
 biochimica et biophysica acta: reviews on cancer,1
+biochimica et biophysica acta (bba) - bioenergetics,1
+biochimica et biophysica acta (bba) - biomembranes,1
+biochimica et biophysica acta (bba) - gene regulatory mechanisms,1
+biochimica et biophysica acta (bba) - general subjects,1
+biochimica et biophysica acta (bba) - molecular and cell biology of lipids,1
+biochimica et biophysica acta (bba) - molecular basis of disease,1
+biochimica et biophysica acta (bba) - molecular cell research,1
+biochimica et biophysica acta (bba) - proteins and proteomics,1
+biochimica et biophysica acta (bba) - reviews on cancer,1
 biochimie,1
 biochip journal,1
 bioconjugate chemistry,1
@@ -17343,6 +17354,8 @@ proceedings of the linnean society of new south wales,1
 proceedings of the london mathematical society,3
 proceedings of the national academy of sciences india section b: biologicalsciences,1
 proceedings of the national academy of sciences of the united states of america,3
+proceedings of the national academy of sciences,3
+pnas,3
 proceedings of the nutrition society,1
 proceedings of the prehistoric society,2
 proceedings of the risø international symposium on materials science,1
@@ -31738,7 +31751,6 @@ proceedings of international conference on the advancement of steam,0
 selected papers of internet research,0
 bat research news,1
 imerides endymasiologias - praktika,1
-scientific reports,0
 ecaade proceedings,0
 traficomin tutkimuksia ja selvityksiä,0
 esignals research,0
@@ -32339,6 +32351,7 @@ radical philosophy review,1
 psychology of popular media,1
 electronic research archive,1
 bmc ecology and evolution,2
+bmc evolutionary biology,2
 annales fennici mathematici,2
 minerva surgery,1
 forces in mechanics,0
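
For reference, a minimal sketch of why the alias rows above matter: each journal-name spelling is its own lookup key, so adding `pnas` or `bmc evolutionary biology` lets those spellings resolve to a score. The sample rows are copied from the hunks above; the header-less two-column layout and the `load_quality_table` helper are assumptions for illustration, since the real loader is not shown in this diff.

```python
import csv
import io

# Rows copied from the hunks above. The header-less "name,score" layout
# is an assumption; the real file may start with a header row.
SAMPLE_ROWS = """\
proceedings of the national academy of sciences of the united states of america,3
proceedings of the national academy of sciences,3
pnas,3
bmc ecology and evolution,2
bmc evolutionary biology,2
"""


def load_quality_table(text: str) -> dict[str, int]:
    """Hypothetical loader: map each casefolded journal name to its score."""
    table: dict[str, int] = {}
    for name, score in csv.reader(io.StringIO(text)):
        table[name.casefold()] = int(score)
    return table


table = load_quality_table(SAMPLE_ROWS)
# Every alias is an independent key that carries the same score.
print(table["pnas"], table["proceedings of the national academy of sciences"])
```

Because the lookup is an exact string match, an alias that is missing from the file falls through to the undefined score, which is what the added rows are meant to avoid.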
19 changes: 14 additions & 5 deletions src/paperqa/clients/journal_quality.py
@@ -3,7 +3,7 @@
 import csv
 import logging
 import os
-from typing import Any
+from typing import Any, ClassVar

 from pydantic import ValidationError

@@ -18,6 +18,10 @@


 class JournalQualityPostProcessor(MetadataPostProcessor[JournalQuery]):
+
+    # these will be deleted from any journal names before querying
+    CASEFOLD_PHRASES_TO_REMOVE: ClassVar[list[str]] = ["amp;"]
+
     def __init__(self, journal_quality_path: os.PathLike | str | None = None) -> None:
         if journal_quality_path is None:
             # Construct the path relative to module
@@ -41,17 +45,22 @@ async def _process(
     ) -> DocDetails:
         if not self.data:
             self.load_data()
+
+        # TODO: not super scalable, but unless we need more than this we can just grugbrain
+        journal_query = query.journal.casefold()
+        for phrase in self.CASEFOLD_PHRASES_TO_REMOVE:
+            journal_query = journal_query.replace(phrase, "")
+
         # docname can be blank since the validation will add it
         # remember, if both have docnames (i.e. key) they are
         # wiped and re-generated with resultant data
         return doc_details + DocDetails(
             doc_id=doc_details.doc_id,  # ensure doc_id is preserved
             dockey=doc_details.dockey,  # ensure dockey is preserved
             source_quality=max(
-                [
-                    self.data.get(query.journal.casefold(), DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
-                    self.data.get("the " + query.journal.casefold(), DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
-                ]
+                self.data.get(journal_query, DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
+                self.data.get("the " + journal_query, DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
+                self.data.get(journal_query.replace("&", "and"), DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
             ),
         )
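
The lookup in this hunk can be sketched as a standalone function, which makes the three candidate keys easier to see. This is a simplified illustration, not the class itself: `lookup_quality`, the sample `data` dict (including the `the lancet` row), and the `-1` sentinel standing in for `DocDetails.UNDEFINED_JOURNAL_QUALITY` are assumptions.

```python
# Assumed sentinel; stands in for DocDetails.UNDEFINED_JOURNAL_QUALITY.
UNDEFINED_JOURNAL_QUALITY = -1

# Mirrors CASEFOLD_PHRASES_TO_REMOVE from the diff: leftover HTML-entity debris.
PHRASES_TO_REMOVE = ["amp;"]


def lookup_quality(journal: str, data: dict[str, int]) -> int:
    """Hypothetical helper: best score among the three candidate keys."""
    query = journal.casefold()
    for phrase in PHRASES_TO_REMOVE:
        query = query.replace(phrase, "")
    candidates = (
        query,                    # name as given, casefolded and cleaned
        "the " + query,           # table may store a leading article
        query.replace("&", "and"),  # table spells out the ampersand
    )
    return max(data.get(key, UNDEFINED_JOURNAL_QUALITY) for key in candidates)


# Hypothetical sample table for illustration only.
data = {
    "annual review of pharmacology and toxicology": 3,
    "the lancet": 3,
    "pnas": 3,
}

print(lookup_quality("Annual Review of Pharmacology &amp; Toxicology", data))
print(lookup_quality("Lancet", data))
print(lookup_quality("Unknown Journal", data))
```

Taking `max` over the candidates works because the undefined sentinel is assumed to be lower than any real score, so any hit wins over a miss.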
