@jvwong (Member) commented Oct 11, 2023

This PR modifies the function that initially fills, and later updates, article metadata from a source to include:

  • Preprints
    • via CrossRef Unified Resource API
      • Publishers supported
        • 'Cold Spring Harbor Laboratory'
          • bioRxiv and medRxiv
        • 'eLife Sciences Publications, Ltd'
        • 'Research Square Platform LLC' (Nature)
  • Update
    • Refreshes using an existing PubMed ID or DOI, falling back to the author-supplied title by default
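
As a rough illustration of the publisher restriction listed above, the sketch below picks the first CrossRef record whose publisher is one of the supported names. It is not the code in this PR: the function name `first_supported_preprint`, the `limit` parameter, and the idea of matching on the record's `publisher` field are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Publisher names as listed in this PR description. Matching on the
# `publisher` field of a CrossRef record is an assumption of this sketch.
SUPPORTED_PREPRINT_PUBLISHERS = {
    "Cold Spring Harbor Laboratory",      # bioRxiv and medRxiv
    "eLife Sciences Publications, Ltd",
    "Research Square Platform LLC",
}


def first_supported_preprint(items: List[Dict], limit: int = 5) -> Optional[Dict]:
    """Return the first of the top `limit` records from a supported publisher.

    `items` is assumed to be a ranked list of CrossRef work records.
    Returns None when no record in that window matches.
    """
    for item in items[:limit]:
        if item.get("publisher") in SUPPORTED_PREPRINT_PUBLISHERS:
            return item
    return None
```

Limiting the search to a small window keeps unrelated lower-ranked hits from being picked up, at the cost of missing a correct record that ranks below the cutoff.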

Table: Test cases evaluated (as of Oct 11, 2023)

| Update | Case | PMID | DOI | Identifier | Data Source | Reference | Pass |
| --- | --- | --- | --- | --- | --- | --- | --- |
| N | Unrecognized/spam/network error | N | N | paperId: "asdasdasd" | Default | - | Y |
| N | Published (No preprint) | N | N | paperId: "SENP1-Sirt3 Signaling Controls Mitochondrial Protein Acetylation and Metabolism" | PubMed | https://pubmed.ncbi.nlm.nih.gov/31302001/ | Y |
| N | Preprint (No publication) | N | N | paperId: "Epigenetic Reprogramming of Tissue-Specific Transcription Promotes Metastasis" | CrossRef | https://www.biorxiv.org/content/10.1101/131102v1 | Y |
| N | Published after preprint | N | N | paperId: "Laminin α1 orchestrates VEGFA functions in the ecosystem of colorectal carcinoma" | PubMed | https://pubmed.ncbi.nlm.nih.gov/29907957/ | Y |
| N | Preprint (eLife) after preprint (bioRxiv + PubMed) | N | N | paperId: "Tmem263 deletion disrupts the GH/IGF-1 axis and causes dwarfism and impairs skeletal acquisition" | CrossRef | https://elifesciences.org/reviewed-preprints/90949 | Y |
| N | Preprint (eLife) before preprint (Research Sq.) | N | N | paperId: "RNA-binding deficient TDP-43 drives cognitive decline in a mouse model of TDP-43 proteinopathy" | CrossRef | https://elifesciences.org/reviewed-preprints/85921 | Works by accident (top 5) |
| Y | Unrecognized/spam/network error | N | N | paperId: "asdasdasd" | Default | - | Y |
| Y | Published (No preprint) | Y | Y | pmid: "31302001" | PubMed | https://pubmed.ncbi.nlm.nih.gov/31302001/ | Y |
| Y | Preprint (No publication) | N | Y | doi: "10.1101/131102" | CrossRef | https://www.biorxiv.org/content/10.1101/131102v1 | Y |
| Y | Published after preprint | Y | Y | pmid: "29907957" | PubMed | https://pubmed.ncbi.nlm.nih.gov/29907957/ | Y |
| Y | Preprint (eLife) after preprint (bioRxiv + PubMed) | N | Y | doi: "10.7554/elife.90949" | CrossRef | https://elifesciences.org/reviewed-preprints/90949 | Y |
| Y | Preprint (eLife) before preprint (Research Sq.) | N | Y | doi: "10.7554/elife.85921.2" | CrossRef | https://elifesciences.org/reviewed-preprints/85921 | Works by accident (top 5) |
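
The Update rows in the test cases suggest an identifier priority: a PMID is used when one exists, otherwise a DOI, otherwise the author-supplied title (`paperId`). A minimal sketch of that selection follows. The function name `choose_identifier` and the tuple return shape are hypothetical and not taken from the code in this PR.

```python
from typing import Optional, Tuple


def choose_identifier(
    pmid: Optional[str], doi: Optional[str], paper_id: str
) -> Tuple[str, str]:
    """Pick the lookup key for an update: PMID first, then DOI, then title.

    The ordering is inferred from the test cases (rows that have both a
    PMID and a DOI are refreshed by PMID); it is an assumption of this sketch.
    """
    if pmid:
        return ("pmid", pmid)
    if doi:
        return ("doi", doi)
    # Nothing stored yet: fall back to the author-supplied title.
    return ("paperId", paper_id)
```

For example, `choose_identifier(None, "10.1101/131102", "some title")` selects the DOI, matching the "Preprint (No publication)" update row.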

Refs #1201 #961

@jvwong jvwong requested a review from maxkfranz October 12, 2023 14:22
@maxkfranz (Member) left a comment

Looks good.

One edge case to consider: The cases you outlined (e.g. PubMed failed, CrossRef succeeded) may arise under unintended conditions. For instance, PubMed may fail not because it doesn’t have anything but because it just errored out.

It may not make that much difference in practice on the whole, but it’s worth considering for the future. It could cause hard-to-diagnose bugs w.r.t. the paper/doc association.

Does the existing code differentiate between types of errors currently?

@jvwong (Member, Author) commented Oct 18, 2023

> One edge case to consider: The cases you outlined (e.g. PubMed failed, CrossRef succeeded) may arise under unintended conditions. For instance, PubMed may fail not because it doesn’t have anything but because it just errored out.
>
> It may not make that much difference in practice on the whole, but it’s worth considering for the future. It could cause hard-to-diagnose bugs w.r.t. the paper/doc association.
>
> Does the existing code differentiate between types of errors currently?

Good point. I think we already have the capability to differentiate between HTTP errors (timeout, 500, etc.) and empty results (HTTP OK), but we just never used it. I'll take a look.
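
To make the distinction concrete, here is a minimal sketch of classifying a single lookup as found, not found (HTTP OK with no records), or error. The names `LookupOutcome` and `classify_lookup` are hypothetical, and treating a missing status as a timeout is an assumption of the example rather than how the existing code reports it.

```python
from enum import Enum
from typing import List, Optional


class LookupOutcome(Enum):
    FOUND = "found"          # HTTP OK with at least one record
    NOT_FOUND = "not_found"  # HTTP OK but an empty result set
    ERROR = "error"          # timeout, 5xx, or any other non-OK status


def classify_lookup(status: Optional[int], records: List[dict]) -> LookupOutcome:
    """Classify one source lookup.

    `status` is the HTTP status code, or None when the request never
    completed (for example, a timeout).
    """
    if status is None or not (200 <= status < 300):
        return LookupOutcome.ERROR
    return LookupOutcome.FOUND if records else LookupOutcome.NOT_FOUND
```

Keeping "empty but OK" separate from "errored" lets the caller decide whether a missing record is trustworthy.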

@jvwong (Member, Author) commented Oct 19, 2023

The logic for fillDocArticle as of eef4b32

Still need to think about what to do in each case.

[Figure: iss961_ Factoid -- Preprint]

@jvwong (Member, Author) commented Oct 20, 2023

Update now distinguishes between a source (PubMed, CrossRef) finding an article record and a source finding none while still returning an HTTP OK status.

If any HTTP error status occurs, all bets are off and we fall through to the default. The same applies when neither source finds an article (as before).
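
The decision described above can be sketched roughly as follows. This is an illustration, not the merged code: `resolve_article` and the string outcome labels are hypothetical, and preferring a PubMed record over a CrossRef one when both are found is an assumption that this thread does not confirm.

```python
from typing import Optional, Tuple

FOUND, NOT_FOUND, ERROR = "found", "not_found", "error"

# Each source result is (outcome, record); record is None unless FOUND.
SourceResult = Tuple[str, Optional[dict]]


def resolve_article(pubmed: SourceResult, crossref: SourceResult, default: dict) -> dict:
    """Combine the two source lookups into the article record to store."""
    pubmed_outcome, pubmed_record = pubmed
    crossref_outcome, crossref_record = crossref

    # Any HTTP error from either source: fall through to the default.
    if ERROR in (pubmed_outcome, crossref_outcome):
        return default

    # Assumption: a PubMed record takes precedence over a CrossRef one.
    if pubmed_outcome == FOUND and pubmed_record is not None:
        return pubmed_record
    if crossref_outcome == FOUND and crossref_record is not None:
        return crossref_record

    # Neither source found an article: default, as before.
    return default
```

Falling back to the default on any error is conservative: it avoids replacing a record based on a partial picture, for instance when PubMed errored out and only CrossRef answered.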

@jvwong jvwong requested a review from maxkfranz October 20, 2023 17:16
@maxkfranz (Member)

Good idea. That sounds sensible. Let's merge.

@jvwong jvwong merged commit 8b6602a into unstable Oct 25, 2023
@jvwong jvwong deleted the iss1201_fill-article-by-id branch October 26, 2023 17:16