Fallback for errors in related paper retrieval #1025

jvwong · 2021-12-08T19:03:47Z

This update stems from observations that Explorer views are showing no related papers, making them appear 'broken'.

It appears that calls to INDRA or semantic search are failing intermittently. The problem is that there are no fallbacks: either everything works or the whole thing fails.

Here, I simply use the document-level papers retrieved from PubMed (which is typically robust) to backfill the related papers if any problems occur.

Refs:
#988 (comment)
#937
PathwayCommons/semantic-search#98


maxkfranz · 2021-12-08T19:33:29Z

Great

Are these fallback results marked so that they can be updated properly by the cron job with real results?

jvwong · 2021-12-08T19:44:35Z

> Great
>
> Are these fallback results marked so that they can be updated properly by the cron job with real results?

So the cron will call:

factoid/src/server/routes/api/document/index.js

Lines 2318 to 2322 in 3c250f6

    
    const updateRelatedPapers = async doc => {
      await getRelPprsForDoc( doc );
      await getRelatedPapersForNetwork( doc );
      return doc;
    };

so this should be updated accordingly - is this what you mean?

maxkfranz · 2021-12-08T21:17:14Z

I just want to make sure that the placeholder data doesn't sit around indefinitely. The cron job updates everything unconditionally, right?

jvwong · 2021-12-08T21:20:05Z

> I just want to make sure that the placeholder data doesn't sit around indefinitely. The cron job updates everything unconditionally, right?

Yes, in theory the cron should replace data if those web service calls are successful.

maxkfranz · 2021-12-08T21:30:49Z

All right. Sounds good.

Later on we may consider having the cron job leave the data as-is if it fails for a doc that has prior, valid data. It's not ideal if an existing set of data -- albeit possibly a bit old -- is overwritten by placeholder data because the update fails. Not that important now but may be nice to have eventually.

The main thing overall is that the cron job / update process runs frequently with a relatively low failure rate. That minimises the likelihood that any given doc is in a failed state at any point in time. If the errors are random or sporadic, then we could also consider increasing the rate of the cron job only for failed docs (e.g. a second, high-frequency cron job just for the failures). That would also push the error probability down.

jvwong · 2021-12-08T21:34:56Z

> Later on we may consider having the cron job leave the data as-is if it fails for a doc that has prior, valid data. It's not ideal if an existing set of data -- albeit possibly a bit old -- is overwritten by placeholder data because the update fails. Not that important now but may be nice to have eventually.

Actually this is a point I missed - let me push another update so that:

- Initial doc submit: will backfill data on service error when none exists.
- Cron: will leave valid data on service error if it exists.


jvwong · 2021-12-09T16:55:13Z

Here are the cases for the related papers:

Document-level
- Semantic search fails
  - No existing papers: set the list of raw PMIDs from PubMed ELINK, unranked
  - Existing papers: refresh the metadata for this list
- EUTILS fails
  - No existing papers: set empty list
  - Existing papers: do nothing
Network-level
- Any failure under indra.searchDocuments
  - No existing papers: set empty list, backfill with Document-level papers
  - Existing papers: do nothing

I guess this is ripe for some integration tests if this gets any more complicated.
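As a starting point for such tests, the cases above could be captured as two pure functions. This is only a sketch: the function names, the `result` shape and the `action` labels are hypothetical and are not taken from the factoid code base. The network-level "no existing papers" case is read here as backfilling with the document-level papers.

```javascript
// Hypothetical sketch of the fallback cases; not the actual factoid implementation.
// `result` is an assumed shape describing the outcome of the service calls:
//   { ok: true, papers }                               all calls succeeded
//   { ok: false, failed: 'semantic-search', pmids }    ranking failed, raw PMIDs from ELINK available
//   { ok: false, failed: 'eutils' }                    PubMed EUTILS itself failed

function resolveDocumentLevel( existing, result ) {
  if ( result.ok ) {
    return { papers: result.papers, action: 'replace' };
  }

  const hasExisting = existing.length > 0;

  if ( result.failed === 'semantic-search' ) {
    return hasExisting
      // Keep the current list; only its metadata would be refreshed.
      ? { papers: existing, action: 'refresh-metadata' }
      // Nothing stored yet: fall back to the unranked PMIDs.
      : { papers: result.pmids.map( pmid => ({ pmid }) ), action: 'set-unranked' };
  }

  // EUTILS failed: nothing new to offer, so never overwrite valid data.
  return hasExisting
    ? { papers: existing, action: 'keep' }
    : { papers: [], action: 'set-empty' };
}

function resolveNetworkLevel( existing, result, documentLevelPapers ) {
  if ( result.ok ) {
    return { papers: result.papers, action: 'replace' };
  }

  // Any failure under the INDRA search: prefer whatever is already stored.
  if ( existing.length > 0 ) {
    return { papers: existing, action: 'keep' };
  }

  // Nothing stored: backfill with the document-level papers (may be empty).
  return { papers: documentLevelPapers, action: 'backfill' };
}

// Example: first submission of a doc while semantic search is down.
const firstSubmit = resolveDocumentLevel( [], {
  ok: false, failed: 'semantic-search', pmids: [ '111', '222' ]
} );
console.log( firstSubmit.action, firstSubmit.papers.length );
```

Keeping the decision separate from the network calls means each row of the case list can be asserted without mocking INDRA, semantic search or EUTILS.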

jvwong added 2 commits December 8, 2021 11:43

Enable the document-level related papers to bypass semantic-search failures

dbc3705

Fallback to empty if any failures occur in the network-level related papers call to indra module.

3c250f6

jvwong requested a review from maxkfranz December 8, 2021 19:04

jvwong mentioned this pull request Dec 8, 2021

Accomodate the updates to semantic search #1026

Merged

3 tasks

jvwong added 2 commits December 9, 2021 10:35

For document-level related papers, preserve existing papers upon service failure

44451b7

Network level papers will use existing papers upon failures

0ec3bcd

jvwong merged commit d56055b into unstable Dec 14, 2021

jvwong deleted the iss937_fallback-related-papers branch December 14, 2021 16:27
