Open
Description
opened on Apr 21, 2021
This issue has been migrated from the CC Search API repository
Author: kgodey
Date: Sat Apr 27 2019
Labels: goal: improvement, status: label work required, status: discontinued
The similarity algorithm used to search titles was switched from BM25 to boolean in cc-archive/cccatalog-api#281 to avoid ranking repeated words in titles higher.
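We should investigate switching back to BM25 and setting the `k1` tuning parameter to a low value just for the title field. A low `k1` makes the term-frequency contribution saturate quickly, so repeated words in a title add little to the score.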
See cc-archive/cccatalog-api#281 (review) and BM25 algorithm docs for more info.
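To make the effect of `k1` concrete, here is a minimal sketch of the term-frequency factor from the standard BM25 formula (IDF omitted), followed by what a per-field similarity could look like. The values `1.2` and `0.1`, the similarity name `title_bm25`, and the Elasticsearch-style settings layout are assumptions for illustration, not settings taken from the repository.

```python
def bm25_tf_weight(tf, k1, b=0.75, doc_len=5, avg_doc_len=5):
    """Term-frequency factor of BM25 for a single term (IDF omitted)."""
    # Length normalization; equals 1.0 when the field has average length.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + k1 * norm)


# Compare a commonly used default (1.2) with an arbitrary "low" value (0.1).
for k1 in (1.2, 0.1):
    once = bm25_tf_weight(tf=1, k1=k1)
    thrice = bm25_tf_weight(tf=3, k1=k1)
    print(f"k1={k1}: tf=1 -> {once:.3f}, tf=3 -> {thrice:.3f}, "
          f"ratio {thrice / once:.3f}")

# Hypothetical index settings: a custom BM25 similarity applied only to
# the title field. The name "title_bm25" and the values are illustrative,
# and the mapping layout may differ between Elasticsearch versions.
title_similarity_settings = {
    "settings": {
        "index": {
            "similarity": {
                "title_bm25": {"type": "BM25", "k1": 0.1, "b": 0.75}
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "similarity": "title_bm25"}
        }
    },
}
```

For a title of average length, a word repeated three times scores about 57% higher than a single occurrence at `k1 = 1.2`, but only about 6% higher at `k1 = 0.1`. That is close to boolean behavior for repeats while still keeping BM25's IDF weighting.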
Original Comments:
annatuma commented on Thu Jan 23 2020:
@aldenstpage I'm putting this in Q2 of the backlog, given that there are other search algorithm improvements scheduled for then. Please evaluate if this is a fit for community contributions and if so, label it accordingly.
Status: Blocked