Background
Article matching has been iterated on many times to handle different edge cases (#1074, #1124, #848), and there are services aimed at resolving this information, e.g. #1295. From my observations, matching works pretty well, but there are cases where no article is matched due to ambiguity.
Currently, an author's input title must be an exact subset of the record retrieved from either PubMed or CrossRef after 'sanitization':
- trimming: const trimmed = _.trim( raw , ' .')
- lower casing: const lower = _.toLower( trimmed )
- removal of non-words: const clean = lower.replace(/[\W_]+/g, ' ')
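The three steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: the lodash calls are swapped for plain JavaScript equivalents so the snippet runs without dependencies, `sanitize` and `titleMatches` are hypothetical names, and "exact subset" is assumed here to mean substring containment of the sanitized input within the sanitized record.

```javascript
// Plain-JavaScript stand-ins for the lodash calls listed above
// (an assumption made so the sketch has no dependencies).
function sanitize(raw) {
  const trimmed = raw.replace(/^[ .]+|[ .]+$/g, ''); // like _.trim(raw, ' .')
  const lower = trimmed.toLowerCase();               // like _.toLower(trimmed)
  return lower.replace(/[\W_]+/g, ' ');              // each run of non-word chars becomes one space
}

// Hypothetical helper: "exact subset" is read as substring containment.
function titleMatches(inputTitle, recordTitle) {
  return sanitize(recordTitle).includes(sanitize(inputTitle));
}

// The stop-word example from this issue: one extra "the" defeats the match.
const input = 'Syntaxin-6 delays prion protein fibril formation and prolongs the presence of toxic aggregation intermediates';
const record = 'Syntaxin-6 delays prion protein fibril formation and prolongs presence of toxic aggregation intermediates';
console.log(titleMatches(input, record)); // false
```

Note that `\W` is ASCII-only in a JavaScript regex without the `u` flag, so characters such as Greek letters are treated as non-word characters and collapse into spaces.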
Problems observed
There remain cases where we might reasonably want to relax the matching conditions. For example:
Stop words
Input: "Syntaxin-6 delays prion protein fibril formation and prolongs the presence of toxic aggregation intermediates"
Actual: "Syntaxin-6 delays prion protein fibril formation and prolongs presence of toxic aggregation intermediates"
Input: "Senescent cells inhibit mouse myoblast differentiation via the Senescence Associated Secretory Phenotype ( SASP)-lipid 15d-PGJ2 -mediated modification and control of HRas"
Actual (PubMed): "Senescent cells inhibit mouse myoblast differentiation via the SASP-lipid 15d-PGJ2 mediated modification and control of HRas."
Input: "Root-specific theanine metabolism and regulation at the single-cell level in tea plants (Camellia sinensis)"
Actual: "Root-specific secondary metabolism at the single-cell level: a case study of theanine metabolism and regulation in the roots of tea plants (Camellia sinensis)"
Input: "Neurons enhance blood-–brain barrier function via upregulating claudin-5 and VE-cadherin expression due to glial cell line-derived neurotrophic factor secretion"
Actual: "Neurons enhance blood-brain barrier function via upregulating claudin-5 and VE-cadherin expression due to GDNF secretion"
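One way the stop-word case could be relaxed is to drop a small set of stop words from both titles before comparing them. The sketch below is only an illustration under stated assumptions: the `STOP_WORDS` list, `tokens`, and `matchesIgnoringStopWords` are hypothetical, and the comparison is a whole-token containment check rather than the existing matching code. It would not help with the abbreviation or restructured-title examples.

```javascript
// Hypothetical, deliberately tiny stop-word list for illustration only.
const STOP_WORDS = new Set(['a', 'an', 'the']);

// Lower-case, collapse non-word characters to spaces, split into tokens,
// and drop empty tokens and stop words.
function tokens(title) {
  return title
    .toLowerCase()
    .replace(/[\W_]+/g, ' ')
    .trim()
    .split(' ')
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// Containment on whole tokens: padding with spaces prevents a token from
// matching in the middle of a longer word.
function matchesIgnoringStopWords(inputTitle, recordTitle) {
  const needle = ` ${tokens(inputTitle).join(' ')} `;
  const haystack = ` ${tokens(recordTitle).join(' ')} `;
  return haystack.includes(needle);
}

const stopWordInput = 'Syntaxin-6 delays prion protein fibril formation and prolongs the presence of toxic aggregation intermediates';
const stopWordRecord = 'Syntaxin-6 delays prion protein fibril formation and prolongs presence of toxic aggregation intermediates';
console.log(matchesIgnoringStopWords(stopWordInput, stopWordRecord)); // true
```

The abbreviation case (GDNF versus "glial cell line-derived neurotrophic factor") still fails under this check, so stop-word removal alone addresses only part of the problem.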
Increasing flexibility has potential pitfalls; notably, the title of a manuscript can change between preprints, revised versions, and the final version of record.
Tasks
Collect additional cases of real/potential mismatches
Create test harness
Pull out common code for matching
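As a starting point for the test harness task, a table-driven sketch could look like the following. Everything here is an assumption for illustration: `CASES` holds the four input/record pairs quoted in this issue, `strictMatch` re-implements the sanitization steps in plain JavaScript with "exact subset" read as substring containment, and `runCases` is a hypothetical helper that accepts any matcher so that candidate relaxations can be compared on the same data.

```javascript
// Input/record title pairs copied from the examples in this issue.
const CASES = [
  {
    input: 'Syntaxin-6 delays prion protein fibril formation and prolongs the presence of toxic aggregation intermediates',
    record: 'Syntaxin-6 delays prion protein fibril formation and prolongs presence of toxic aggregation intermediates',
  },
  {
    input: 'Senescent cells inhibit mouse myoblast differentiation via the Senescence Associated Secretory Phenotype ( SASP)-lipid 15d-PGJ2 -mediated modification and control of HRas',
    record: 'Senescent cells inhibit mouse myoblast differentiation via the SASP-lipid 15d-PGJ2 mediated modification and control of HRas.',
  },
  {
    input: 'Root-specific theanine metabolism and regulation at the single-cell level in tea plants (Camellia sinensis)',
    record: 'Root-specific secondary metabolism at the single-cell level: a case study of theanine metabolism and regulation in the roots of tea plants (Camellia sinensis)',
  },
  {
    input: 'Neurons enhance blood-–brain barrier function via upregulating claudin-5 and VE-cadherin expression due to glial cell line-derived neurotrophic factor secretion',
    record: 'Neurons enhance blood-brain barrier function via upregulating claudin-5 and VE-cadherin expression due to GDNF secretion',
  },
];

// Assumed reading of the current rule: sanitized input must be contained
// in the sanitized record.
function strictMatch(inputTitle, recordTitle) {
  const clean = (s) =>
    s.replace(/^[ .]+|[ .]+$/g, '').toLowerCase().replace(/[\W_]+/g, ' ');
  return clean(recordTitle).includes(clean(inputTitle));
}

// Run a matcher over every case and report which ones it matches.
function runCases(matcher, cases) {
  return cases.map(({ input, record }) => ({ input, matched: matcher(input, record) }));
}

const results = runCases(strictMatch, CASES);
const unmatched = results.filter((r) => !r.matched);
console.log(`${unmatched.length} of ${results.length} cases unmatched`);
```

Under the containment reading, all four cases are reported as unmatched, which gives a baseline against which any relaxed matcher can be measured. New cases collected under the first task can be appended to `CASES` without changing the harness.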
eLife 2024: Defining cell type-specific immune responses in a mouse model of allergic contact dermatitis by single-cell transcriptomics"