
Investigate use of the BM25 algorithm to search image titles (original #288) #751

Open

Description

This issue has been migrated from the CC Search API repository

Author: kgodey
Date: Sat Apr 27 2019
Labels: ✨ goal: improvement, 🏷 status: label work required, 🙅 status: discontinued

The similarity algorithm used to search titles was switched from BM25 to boolean in cc-archive/cccatalog-api#281, so that a title is not ranked higher merely because it repeats a word.

We should investigate switching back to BM25 and setting the k1 tuning parameter to a low value just for the title field.

See cc-archive/cccatalog-api#281 (review) and BM25 algorithm docs for more info.
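
To illustrate why a low k1 might address the repeated-word concern, here is a minimal sketch of the textbook BM25 term-frequency factor. It is not taken from the codebase: the function name `bm25_tf_weight` and the sample values are hypothetical, and the search engine's actual scoring may differ in detail (for example in constant factors or length normalization).

```python
def bm25_tf_weight(tf, k1, b=0.75, doc_len=1.0, avg_doc_len=1.0):
    """Textbook BM25 term-frequency factor for one term (IDF omitted).

    tf          -- how many times the term occurs in the field
    k1          -- saturation parameter; lower values flatten the effect of repeats
    b           -- length-normalization strength
    doc_len     -- length of this field (assumed equal to the average by default)
    avg_doc_len -- average field length across the index
    """
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


# Compare a title that mentions a word once with one that repeats it three times.
for k1 in (1.2, 0.1, 0.0):
    once = bm25_tf_weight(1, k1)
    thrice = bm25_tf_weight(3, k1)
    print(f"k1={k1}: tf=1 -> {once:.3f}, tf=3 -> {thrice:.3f}")
```

In this sketch, with a field of average length, a single occurrence always scores 1.0, and the boost for repeats shrinks as k1 decreases. At k1 = 0 the factor no longer depends on the term count at all, which resembles boolean matching for this part of the score.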


Original Comments:

annatuma commented on Thu Jan 23 2020:

@aldenstpage I'm putting this in Q2 of the backlog, given that there are other search algorithm improvements scheduled for then. Please evaluate if this is a fit for community contributions and if so, label it accordingly.

Project status: ⛔ Blocked
