Have you considered raising the default prefix match weight when search is performed with prefix: true? I was surprised to find that prefix matches are weighted lower than fuzzy matches by default, but I'm sure you have a good reason (curious what it is though!). I was more surprised that prefixes are still weighted worse by default (as far as I can tell) when you specifically request a prefix search.
For my use case, I'm searching ICD10 codes. One example of where the default behavior is not what I expected is this search:
Query: Z8744
Options: { fuzzy: 0.15, prefix: true, fields: [ 'id' ] }
Results: [
{
id: 'Z8774',
score: 6.080555298224293,
terms: [ 'z8774' ],
queryTerms: [ 'z8744' ],
match: { z8774: [Array] },
description: 'Personal history of congenital malform of heart and circ sys',
clinical_group: 'NA',
comorbidity_group: null
},
{
id: 'Z8742',
score: 6.080555298224293,
terms: [ 'z8742' ],
queryTerms: [ 'z8744' ],
match: { z8742: [Array] },
description: 'Personal history of oth diseases of the female genital tract',
},
{
id: 'Z8544',
score: 6.080555298224293,
terms: [ 'z8544' ],
queryTerms: [ 'z8744' ],
match: { z8544: [Array] },
description: 'Personal history of malig neoplasm of female genital organs',
},
{
id: 'T8744',
score: 6.080555298224293,
terms: [ 't8744' ],
queryTerms: [ 'z8744' ],
match: { t8744: [Array] },
description: 'Infection of amputation stump, left lower extremity',
},
{
id: 'Z87440',
score: 5.791005045927898,
terms: [ 'z87440' ],
queryTerms: [ 'z8744' ],
match: { z87440: [Array] },
description: 'Personal history of urinary (tract) infections',
},
{
id: 'Z87441',
score: 5.791005045927898,
terms: [ 'z87441' ],
queryTerms: [ 'z8744' ],
match: { z87441: [Array] },
description: 'Personal history of nephrotic syndrome',
},
{
id: 'Z87442',
score: 5.791005045927898,
terms: [ 'z87442' ],
queryTerms: [ 'z8744' ],
match: { z87442: [Array] },
description: 'Personal history of urinary calculi',
},
The user is looking for codes that start with Z8744, but the non-prefix and less relevant fuzzy matches are weighted higher and are showing up first.
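To illustrate the ordering I expected, here is a minimal sketch of a caller-side re-ranking step. The helper rankPrefixFirst is hypothetical and not part of the library; it assumes each result carries the id and score fields shown in the output above, and it moves results whose id starts with the query ahead of the rest. The library's search options also accept a weights object with fuzzy and prefix entries, which is probably the more direct way to tune this; the sketch only shows the desired result order without depending on the library.

```javascript
// Hypothetical helper (not part of the library): re-rank search results so
// that documents whose field value starts with the query come first.
function rankPrefixFirst(results, query, field = 'id') {
  const q = query.toLowerCase();
  const isPrefix = (r) => String(r[field]).toLowerCase().startsWith(q);
  // Copy before sorting so the caller's array is left untouched.
  // Array.prototype.sort is stable, so full ties keep their original order.
  return [...results].sort((a, b) => {
    const pa = isPrefix(a) ? 1 : 0;
    const pb = isPrefix(b) ? 1 : 0;
    if (pa !== pb) return pb - pa; // prefix matches first
    return b.score - a.score;      // then by descending score
  });
}

// A subset of the results above, reduced to the fields the helper needs.
const results = [
  { id: 'Z8774', score: 6.080555298224293 },
  { id: 'T8744', score: 6.080555298224293 },
  { id: 'Z87440', score: 5.791005045927898 },
  { id: 'Z87441', score: 5.791005045927898 },
];

const reranked = rankPrefixFirst(results, 'Z8744');
console.log(reranked.map((r) => r.id));
// → [ 'Z87440', 'Z87441', 'Z8774', 'T8744' ]
```

With this ordering, the codes that actually start with Z8744 appear before the fuzzy matches, while the original scores still decide the order within each group.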
This is easy enough for me to fix for my use case, so no big deal. Thanks for open sourcing this, great library!