…sabled for the title field specifically in the search index
…e interface. It doesn't provide access to low-level Elasticsearch features that we need, like boolean similarity. The interface also changes completely every time there's a new version of Elasticsearch.
I read through the Elasticsearch docs a bit and these all look like great changes.

Re: setting the `similarity` of the `title` field to `boolean`, is it possible to use BM25 and set the `k1` tuning value to a low value just for the `title` field? That seems like it would keep the benefits of BM25 while avoiding the repetition problem, since a low `k1` makes term frequency saturate almost immediately. I don't think we'd want to apply the same low value to the image's description (but we don't seem to be storing a description, so that might be moot).
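To make the `k1` suggestion concrete, here is a minimal Python sketch of the classic BM25 term-frequency factor (IDF omitted, and assuming both titles have the average document length so length normalization cancels out), plus an index body that applies a custom BM25 similarity only to `title`. It assumes Elasticsearch 7+ typeless mappings; the similarity name `title_bm25` and the `k1 = 0.3` value are hypothetical placeholders, not tested settings. Elasticsearch's defaults are `k1 = 1.2` and `b = 0.75`.

```python
import json


def bm25_tf_weight(tf: float, k1: float, b: float = 0.75,
                   doc_len: float = 1.0, avg_doc_len: float = 1.0) -> float:
    """Term-frequency factor of classic BM25 (the IDF factor is left out).

    With doc_len == avg_doc_len the length normalization term equals 1,
    so the result depends only on tf and k1.
    """
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


def title_bm25_index_body(k1: float, b: float = 0.75) -> dict:
    """Index body with a custom BM25 similarity used only by `title`.

    "title_bm25" is a hypothetical name; fields that don't reference it
    keep the default BM25 similarity (k1=1.2, b=0.75).
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    "title_bm25": {"type": "BM25", "k1": k1, "b": b}
                }
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "similarity": "title_bm25"}
            }
        },
    }


if __name__ == "__main__":
    # Compare a title with five repetitions of "nature" against one occurrence.
    for k1 in (1.2, 0.3, 0.0):
        ratio = bm25_tf_weight(5, k1) / bm25_tf_weight(1, k1)
        print(f"k1={k1}: 5 repeats score {ratio:.2f}x a single occurrence")
    print(json.dumps(title_bm25_index_body(k1=0.3), indent=2))
```

At the default `k1 = 1.2`, five repetitions score about 1.77 times a single occurrence; at `k1 = 0.3` about 1.23 times; at `k1 = 0` term frequency stops mattering. Even at `k1 = 0`, BM25 keeps the IDF factor, so rarer query terms still count for more, whereas `boolean` similarity scores a matching term by its query boost alone.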
Tuning BM25 instead of switching to boolean similarity might provide better results, but that will take some time to test (since we have to reindex every time we adjust …).

I made #288 to track the issue.
…`constant_score` query filter. This was used to prevent repetitive titles from being ranked disproportionately high by Elasticsearch's BM25 algorithm (e.g. an image titled "Nature nature nature nature nature" would be at the top of the results for any "nature" query, a situation that was very common across many queries). While `constant_score` solved the repetition problem, it really kneecapped the quality of our search in other ways: it gives every document matched by the wrapped query the same score, discarding much of the desirable functionality used to rank search results.

…`title` field mapping `similarity = boolean`. This disables full-text relevance ranking for this field specifically and leaves the rest of the fields untouched. That way, the repetition problem is solved, but we can still properly rank results. Read here for more details.

…`elasticsearch-dsl` to create it. `elasticsearch-dsl` is nice for querying documents, but I found few options for customizing the document mapping. It doesn't seem to be possible to set a field's `similarity` using their document model.
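For illustration, a minimal Python sketch of a hand-written mapping in which only `title` uses `boolean` similarity, built as a plain dict rather than through the `elasticsearch-dsl` document model. It assumes Elasticsearch 7+ typeless mappings; the helper name `build_index_body` and the `tags` field in the usage example are hypothetical, since the PR's full mapping isn't shown here.

```python
import json
from typing import Optional


def build_index_body(extra_properties: Optional[dict] = None) -> dict:
    """Return an index body whose `title` field uses boolean similarity.

    Boolean similarity scores a matching title by its query boost only, so
    "Nature nature nature nature nature" gets no extra credit for repeating
    the term. Fields in `extra_properties` are copied as-is and keep the
    default BM25 similarity unless they set their own.
    """
    properties = dict(extra_properties or {})
    # Always (re)set `title` last so a caller can't accidentally drop the setting.
    properties["title"] = {"type": "text", "similarity": "boolean"}
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    # "tags" is a hypothetical extra field used only for illustration.
    body = build_index_body({"tags": {"type": "text"}})
    print(json.dumps(body, indent=2))
    # Assuming elasticsearch-py 8.x, the body could be applied with:
    #   es.indices.create(index="<index name>", mappings=body["mappings"])
```

Because the mapping is plain data, it doesn't depend on what the `elasticsearch-dsl` document model exposes, and `elasticsearch-dsl` can still be used for querying.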