Skip to content

Increase default prefix weight when prefix is requested #299

@sjudd

Description

@sjudd

Have you considered raising the default prefix match weight when search is performed with prefix: true? I was surprised to find that prefix matches are weighted lower than fuzzy matches by default, but I'm sure you have a good reason (curious what it is though!). I was more surprised that prefixes are still weighted worse by default (as far as I can tell) when you specifically request a prefix search.

For my use case, I'm searching ICD10 codes. One example of where the default behavior is not what I expected is this search:

Query: Z8744 
Options: { fuzzy: 0.15, prefix: true, fields: [ 'id' ] } 
Results: [
  {
    id: 'Z8774',
    score: 6.080555298224293,
    terms: [ 'z8774' ],
    queryTerms: [ 'z8744' ],
    match: { z8774: [Array] },
    description: 'Personal history of congenital malform of heart and circ sys',
    clinical_group: 'NA',
    comorbidity_group: null
  },
  {
    id: 'Z8742',
    score: 6.080555298224293,
    terms: [ 'z8742' ],
    queryTerms: [ 'z8744' ],
    match: { z8742: [Array] },
    description: 'Personal history of oth diseases of the female genital tract',
  },
  {
    id: 'Z8544',
    score: 6.080555298224293,
    terms: [ 'z8544' ],
    queryTerms: [ 'z8744' ],
    match: { z8544: [Array] },
    description: 'Personal history of malig neoplasm of female genital organs',
  },
  {
    id: 'T8744',
    score: 6.080555298224293,
    terms: [ 't8744' ],
    queryTerms: [ 'z8744' ],
    match: { t8744: [Array] },
    description: 'Infection of amputation stump, left lower extremity',
  },
  {
    id: 'Z87440',
    score: 5.791005045927898,
    terms: [ 'z87440' ],
    queryTerms: [ 'z8744' ],
    match: { z87440: [Array] },
    description: 'Personal history of urinary (tract) infections',
  },
  {
    id: 'Z87441',
    score: 5.791005045927898,
    terms: [ 'z87441' ],
    queryTerms: [ 'z8744' ],
    match: { z87441: [Array] },
    description: 'Personal history of nephrotic syndrome',
  },
  {
    id: 'Z87442',
    score: 5.791005045927898,
    terms: [ 'z87442' ],
    queryTerms: [ 'z8744' ],
    match: { z87442: [Array] },
    description: 'Personal history of urinary calculi',
  },

The user is looking for codes that start with Z8744, but the non-prefix and less relevant fuzzy matches are weighted higher and are showing up first.

This is easy enough for me to fix for my use case, so no big deal. Thanks for open sourcing this, great library!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions