
Conversation

@jvwong
Member

@jvwong jvwong commented Dec 8, 2021

This update stems from the observation that Explorer views are showing no related papers, which makes them appear 'broken'.

It appears that calls to INDRA or semantic search fail intermittently. The problem is that there are no fallbacks: either everything works or the whole thing fails.

Here, I simply use the document-level papers retrieved from PubMed (which is typically robust) to backfill the related papers whenever any of these problems occur.
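The backfill idea can be sketched as a small wrapper: try the ranked lookup first and, if it throws for any reason, return the PubMed-derived list instead. This is a minimal sketch assuming both lookups are async functions; `withFallback`, `rankedLookup` and `pubmedLookup` are hypothetical names, not the project's actual API.

```javascript
// Hypothetical helper: try the primary (ranked) source first and, if it
// throws for any reason, fall back to the secondary (PubMed) source.
async function withFallback(primary, fallback, onError = () => {}) {
  try {
    return await primary();
  } catch (err) {
    onError(err); // e.g. log the INDRA / semantic search failure
    return fallback();
  }
}

// Stand-in lookups (placeholders for the real web-service calls).
const rankedLookup = async () => {
  throw new Error("semantic search timed out");
};
const pubmedLookup = async () => ["placeholder-1", "placeholder-2"];

withFallback(rankedLookup, pubmedLookup, err => console.warn(err.message))
  .then(papers => console.log(papers)); // [ 'placeholder-1', 'placeholder-2' ]
```

Keeping the fallback as a separate function means the same wrapper works whether the failure comes from INDRA or from semantic search.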

Refs:
#988 (comment)
#937
PathwayCommons/semantic-search#98

@jvwong jvwong requested a review from maxkfranz December 8, 2021 19:04
@maxkfranz
Member

Great

Are these fallback results marked so that they can be updated properly by the cron job with real results?

@jvwong
Member Author

jvwong commented Dec 8, 2021

> Great
>
> Are these fallback results marked so that they can be updated properly by the cron job with real results?

So the cron will call:

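```javascript
const updateRelatedPapers = async doc => {
  // Refresh the document-level related papers
  await getRelPprsForDoc( doc );
  // Refresh the network-level related papers
  await getRelatedPapersForNetwork( doc );
  return doc;
};
```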

So the fallback data should be updated accordingly. Is this what you mean?

@maxkfranz
Member

I just want to make sure that the placeholder data doesn't sit around indefinitely. The cron job updates everything unconditionally, right?

@jvwong
Member Author

jvwong commented Dec 8, 2021

> I just want to make sure that the placeholder data doesn't sit around indefinitely. The cron job updates everything unconditionally, right?

Yes, in theory the cron should replace data if those web service calls are successful.
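One way to picture "updates everything unconditionally" is a single cron pass that refreshes every doc and records which ones failed. This is a sketch under assumptions: `runCronPass` and the `id` field are hypothetical, and `update` stands for any async per-doc refresh such as the `updateRelatedPapers` function above. `Promise.allSettled` is used so that one doc's failing web-service call does not abort the refresh of the others.

```javascript
// Hypothetical cron pass: refresh every doc unconditionally and collect the
// ids of docs whose refresh failed, so they can be retried later.
async function runCronPass(docs, update) {
  // Wrapping in an async arrow turns a synchronous throw into a rejection.
  const results = await Promise.allSettled(docs.map(async doc => update(doc)));
  const failedIds = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      failedIds.push(docs[i].id);
    }
  });
  return failedIds;
}

// Usage with a stand-in update that fails for one doc.
const docs = [{ id: "a" }, { id: "b" }];
const flakyUpdate = async doc => {
  if (doc.id === "b") throw new Error("INDRA unavailable");
  return doc;
};
runCronPass(docs, flakyUpdate).then(failed => console.log(failed)); // [ 'b' ]
```

This runs every refresh concurrently; with many docs, a real job would likely want to cap concurrency to avoid overloading the upstream services.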

@maxkfranz
Member

All right. Sounds good.

Later on we may consider having the cron job leave the data as-is if it fails for a doc that has prior, valid data. It's not ideal if an existing set of data -- albeit possibly a bit old -- is overwritten by placeholder data because the update fails. Not that important now but may be nice to have eventually.

The main thing overall is that the cron job / update process is frequent with a relatively low failure rate. That minimises the likelihood that any given doc has failures at any point in time. If the errors are random or sporadic, we could also consider running the cron job more frequently for failed docs only (e.g. a second, high-frequency cron job just for the failures). That would also push the error probability down.
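A rough way to quantify the benefit of retrying failures, under the simplifying assumption that every update attempt fails independently with the same probability `p`: a doc is still without fresh data after the main run plus `k` extra retries only if all `k + 1` attempts fail, i.e. with probability `p^(k+1)`. The function name and the 10% rate below are illustrative, not measured values.

```javascript
// Probability that a doc is still un-refreshed after the main cron run plus
// `retries` extra attempts, assuming independent failures with probability p.
function residualFailureProbability(p, retries) {
  if (p < 0 || p > 1) throw new RangeError("p must be in [0, 1]");
  return Math.pow(p, retries + 1);
}

// Illustrative only: a hypothetical 10% per-attempt failure rate.
for (const retries of [0, 1, 2, 3]) {
  console.log(retries, residualFailureProbability(0.1, retries).toExponential(1));
}
```

If failures are correlated (for example, an upstream service is down for an hour), retrying quickly helps much less than this model suggests, so spacing the retries out matters.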

@jvwong
Member Author

jvwong commented Dec 8, 2021

> Later on we may consider having the cron job leave the data as-is if it fails for a doc that has prior, valid data. It's not ideal if an existing set of data -- albeit possibly a bit old -- is overwritten by placeholder data because the update fails. Not that important now but may be nice to have eventually.

Actually, this is a point I missed. Let me push another update so that:

  • Initial doc submit: will backfill data on service error when none exists
  • Cron: will leave valid data on service error if it exists.

@jvwong
Member Author

jvwong commented Dec 9, 2021

Here are the cases for the related papers:

  • Document-level
    • Semantic search fails
      • No existing papers: set the list of raw PMIDs from PubMed ELINK, unranked
      • Existing papers: refresh the metadata for this list
    • EUTILS fails
      • No existing papers: set empty list
      • Existing papers: do nothing
  • Network-level
    • Any failure under indra.searchDocuments
      • No existing papers: set empty list, backfill with Document-level papers
      • Existing papers: do nothing
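The case table above can be read as a small pure function, which also makes it easy to unit-test ahead of any integration tests. This is a sketch only: the `decide` name, the string labels for levels and failures, and the returned action names are hypothetical and do not come from the codebase.

```javascript
// Hypothetical encoding of the fallback cases for related papers.
// level:       "document" | "network"
// failure:     "semanticSearch" | "eutils" at document level; any value at network level
// hasExisting: whether the doc already holds related papers
function decide(level, failure, hasExisting) {
  if (level === "document") {
    if (failure === "semanticSearch") {
      // Refresh metadata for the current list, or set unranked PMIDs from PubMed ELINK
      return hasExisting ? "REFRESH_METADATA" : "SET_UNRANKED_ELINK_PMIDS";
    }
    if (failure === "eutils") {
      return hasExisting ? "KEEP_EXISTING" : "SET_EMPTY";
    }
    throw new Error(`unknown document-level failure: ${failure}`);
  }
  if (level === "network") {
    // Any failure under indra.searchDocuments
    return hasExisting ? "KEEP_EXISTING" : "BACKFILL_FROM_DOCUMENT";
  }
  throw new Error(`unknown level: ${level}`);
}

console.log(decide("network", "timeout", false)); // BACKFILL_FROM_DOCUMENT
```

Throwing on unknown inputs, rather than silently picking a default, keeps a new failure type from slipping through untested.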

I guess this is ripe for some integration tests if this gets any more complicated.

@jvwong jvwong merged commit d56055b into unstable Dec 14, 2021
@jvwong jvwong deleted the iss937_fallback-related-papers branch December 14, 2021 16:27

3 participants