Filling and updating Document Article metadata #1211

jvwong · 2023-10-11T16:51:31Z

Modifies the function that initially fills, and later updates, article metadata from a source to include:

Preprints
- via CrossRef Unified Resource API
  - Publishers supported
    - 'Cold Spring Harbor Laboratory'
      - bioRxiv and medRxiv
    - 'eLife Sciences Publications, Ltd'
    - 'Research Square Platform LLC' (Nature)
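As a rough illustration of the publisher restriction listed above, here is a minimal sketch. It assumes each CrossRef work item exposes a `publisher` string; the helper name `filterSupportedPreprints` and the sample items are hypothetical and not taken from this PR's code.

```javascript
// Publisher names exactly as listed in the PR description.
const SUPPORTED_PREPRINT_PUBLISHERS = new Set([
  'Cold Spring Harbor Laboratory', // bioRxiv and medRxiv
  'eLife Sciences Publications, Ltd',
  'Research Square Platform LLC'
]);

// Hypothetical helper: keep only work items whose publisher is supported.
// Assumes `items` is an array of objects with a `publisher` field.
const filterSupportedPreprints = items =>
  items.filter(item => SUPPORTED_PREPRINT_PUBLISHERS.has(item.publisher));

// Usage with made-up items
const kept = filterSupportedPreprints([
  { publisher: 'Cold Spring Harbor Laboratory', DOI: '10.1101/131102' },
  { publisher: 'Some Other Publisher', DOI: '10.0000/example' }
]);
console.log(kept.map(item => item.DOI));
```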
Update
- Refreshes using an existing PubMed ID or DOI, falling back to the author-supplied title by default
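The choice of identifier for a refresh could look something like the sketch below. The precedence (PMID, then DOI, then title) is an assumption inferred from the test table, where rows that have both a PMID and a DOI are refreshed by `pmid`. The helper name `pickRefreshId` and the shape of its argument are hypothetical.

```javascript
// Hypothetical helper: choose which identifier to use when refreshing metadata.
// Assumed precedence: PMID, then DOI, then the author-supplied title (as paperId).
const pickRefreshId = ({ pmid, doi, title } = {}) => {
  if (pmid) return { pmid };
  if (doi) return { doi };
  return { paperId: title };
};

// Usage
console.log(pickRefreshId({ pmid: '31302001', doi: '10.1101/131102' }));
console.log(pickRefreshId({ doi: '10.1101/131102' }));
console.log(pickRefreshId({ title: 'asdasdasd' }));
```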

Table: Test cases evaluated (as of Oct 11, 2023)

| Update | Case | PMID | DOI | Identifier | Data Source | Reference | Pass |
| --- | --- | --- | --- | --- | --- | --- | --- |
| N | Unrecognized/spam/network error | N | N | paperId: "asdasdasd" | Default | - | Y |
| N | Published (No preprint) | N | N | paperId: "SENP1-Sirt3 Signaling Controls Mitochondrial Protein Acetylation and Metabolism" | PubMed | https://pubmed.ncbi.nlm.nih.gov/31302001/ | Y |
| N | Preprint (No publication) | N | N | paperId: "Epigenetic Reprogramming of Tissue-Specific Transcription Promotes Metastasis" | CrossRef | https://www.biorxiv.org/content/10.1101/131102v1 | Y |
| N | Published after preprint | N | N | paperId: "Laminin α1 orchestrates VEGFA functions in the ecosystem of colorectal carcinoma" | PubMed | https://pubmed.ncbi.nlm.nih.gov/29907957/ | Y |
| N | Preprint (eLife) after preprint (bioRxiv + PubMed) | N | N | paperId: "Tmem263 deletion disrupts the GH/IGF-1 axis and causes dwarfism and impairs skeletal acquisition" | CrossRef | https://elifesciences.org/reviewed-preprints/90949 | Y |
| N | Preprint (eLife) before preprint (Research Sq.) | N | N | paperId: "RNA-binding deficient TDP-43 drives cognitive decline in a mouse model of TDP-43 proteinopathy" | CrossRef | https://elifesciences.org/reviewed-preprints/85921 | Works by accident (top 5) |
| Y | Unrecognized/spam/network error | N | N | paperId: "asdasdasd" | Default | - | Y |
| Y | Published (No preprint) | Y | Y | pmid: "31302001" | PubMed | https://pubmed.ncbi.nlm.nih.gov/31302001/ | Y |
| Y | Preprint (No publication) | N | Y | doi: "10.1101/131102" | CrossRef | https://www.biorxiv.org/content/10.1101/131102v1 | Y |
| Y | Published after preprint | Y | Y | pmid: "29907957" | PubMed | https://pubmed.ncbi.nlm.nih.gov/29907957/ | Y |
| Y | Preprint (eLife) after preprint (bioRxiv + PubMed) | N | Y | doi: "10.7554/elife.90949" | CrossRef | https://elifesciences.org/reviewed-preprints/90949 | Y |
| Y | Preprint (eLife) before preprint (Research Sq.) | N | Y | doi: "10.7554/elife.85921.2" | CrossRef | https://elifesciences.org/reviewed-preprints/85921 | Works by accident (top 5) |

Refs #1201 #961

maxkfranz

Looks good.

One edge case to consider: The cases you outlined (e.g. PubMed failed, CrossRef succeeded) may come about under unintended conditions. For instance, PubMed may fail not because it doesn’t have anything but because it just errored out.

It may not make that much difference in practice on the whole, but it’s worth considering for the future. It could cause hard-to-diagnose bugs w.r.t. the paper/doc association.

Does the existing code differentiate between types of errors currently?

jvwong · 2023-10-18T18:34:06Z

> One edge case to consider: The cases you outlined (e.g. PubMed failed, CrossRef succeeded) may come about for unintended conditions. For instance, PubMed may fail not because it doesn’t have anything but because it just errored out.
>
> It may not make that much difference in practice on the whole, but it’s worth consideration for future. It could cause hard to diagnose bugs w.r.t. the paper/doc association.
>
> Does the existing code differentiate between types of errors currently?

Good point - I think we already have the capability to differentiate between HTTP errors (timeout, 500, etc.) and empty results (HTTP OK), but we have just never used it. I'll take a look.
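To make the distinction concrete, here is a minimal sketch of separating an HTTP failure from an empty but successful result. The error class names, the helper `firstArticleOrThrow`, and the response shape (`ok`, `status`) are assumptions for illustration, not the code in this PR.

```javascript
// Hypothetical error types; the names are illustrative only.
class HttpStatusError extends Error {
  constructor(status) {
    super(`HTTP status ${status}`);
    this.name = 'HttpStatusError';
    this.status = status;
  }
}

class ArticleNotFoundError extends Error {
  constructor(id) {
    super(`No article found for ${id}`);
    this.name = 'ArticleNotFoundError';
  }
}

// Hypothetical helper: `response` is assumed to expose `ok` and `status`
// (as a fetch Response does); `items` is the parsed list of search hits.
const firstArticleOrThrow = (id, response, items) => {
  // The source itself failed (timeout, 500, ...): not the same as "no hits".
  if (!response.ok) throw new HttpStatusError(response.status);
  // The source answered OK but has nothing for this identifier.
  if (!Array.isArray(items) || items.length === 0) throw new ArticleNotFoundError(id);
  return items[0];
};
```

A caller can then branch on `error.name` (or `instanceof`) to decide whether a miss means "the source has nothing" or "the source could not be reached".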

jvwong · 2023-10-19T20:53:52Z

The logic for fillDocArticle as of eef4b32

Still need to think about what to do in each case.

jvwong · 2023-10-20T17:16:39Z

Update now considers whether an article record is found from a source (PubMed, CrossRef) and whether it was not found (but the HTTP response status was OK).

If ANY HTTP status error occurs, all bets are off and we tumble down into the default. This is the same as the case where neither source finds an article (as before).
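The decision described above can be sketched roughly as follows. The outcome shape (`status` of `'found'`, `'notFound'` or `'httpError'`), the helper name `chooseArticle`, and the `date` field used to pick the latest record are all assumptions; the "latest wins" rule is inferred from the commit message "Retrieve latest when pubmed and crossref return items."

```javascript
// Hypothetical sketch: each source outcome is one of
//   { status: 'found', article }, { status: 'notFound' }, { status: 'httpError' }
// `article.date` is assumed to be an ISO 8601 string (e.g. '2023-10-20').
const chooseArticle = (pubmed, crossref, defaultArticle) => {
  const outcomes = [pubmed, crossref];

  // Any HTTP status error: fall back to the default.
  if (outcomes.some(o => o.status === 'httpError')) return defaultArticle;

  // Neither source found a record: same fallback as before.
  const found = outcomes.filter(o => o.status === 'found');
  if (found.length === 0) return defaultArticle;

  // One or both found: take the most recent (ISO dates compare as strings).
  return found.reduce((a, b) => (b.article.date > a.article.date ? b : a)).article;
};

// Usage with made-up records
const fallback = { title: 'author-supplied title' };
const preprint = { source: 'CrossRef', date: '2017-04-26' };
const published = { source: 'PubMed', date: '2018-06-15' };
console.log(chooseArticle({ status: 'found', article: published }, { status: 'found', article: preprint }, fallback));
console.log(chooseArticle({ status: 'httpError' }, { status: 'found', article: preprint }, fallback));
```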

maxkfranz · 2023-10-23T21:58:23Z

Good idea. That sounds sensible. Let's merge

jvwong added 4 commits October 10, 2023 14:51

Create a findPreprint function to search crossref.

a64d858

First pass - find article in PubMed or CrossRef

d025db2

prototype for fillDoc article.

c84ee43

Retrieve latest when pubmed and crossref return items.

f0e4109

jvwong requested a review from maxkfranz October 12, 2023 14:22

maxkfranz reviewed Oct 18, 2023

jvwong added 2 commits October 19, 2023 15:51

Merge branch 'unstable' into iss1201_fill-article-by-id

c97479d

# Conflicts: # src/server/routes/api/document/crossref/works.js

Throw an articleID error or original Http status error.

eef4b32

Differentiate between HTTP errors and failure to find article Error.

1b2f402

jvwong requested a review from maxkfranz October 20, 2023 17:16

jvwong merged commit 8b6602a into unstable Oct 25, 2023

jvwong deleted the iss1201_fill-article-by-id branch October 26, 2023 17:16

jvwong mentioned this pull request Jul 3, 2024

Improve process of identifying and updating article using the author-provided information (e.g. title) #1281

Closed

2 tasks
