Avoid negative scores returned from multi_match query with cross_fields
When a `multi_match` query uses `cross_fields` scoring, the inverse document frequency (IDF) term of the BM25 formula can become negative, which produces negative scores. The IDF is calculated as:

```
log(1 + (N - n + 0.5) / (n + 0.5))
```

where `N` is the number of documents containing the field and `n` is the number of documents containing the given term in the field. Within a single field, `n` is always less than or equal to `N`, so the IDF stays positive. However, `cross_fields` computes a new, blended value for `n` and uses it across all fields. The expression simplifies to `log((N + 1) / (n + 0.5))`, which is negative whenever `n > N + 0.5`. As a result, any field containing fewer documents than the blended `n` gets a negative IDF.

This change finds the minimum nonzero value of `N` across the fields and uses it as an upper bound for the blended value of `n`.

Signed-off-by: Michael Froh <froh@amazon.com>
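The following is a minimal Python sketch of the problem and the fix, not the actual OpenSearch/Lucene code. It assumes, for illustration only, that the blended `n` is the largest per-field document frequency; the real `cross_fields` blending logic may differ. The function names (`bm25_idf`, `blended_doc_freq`, `field_idfs`) and the example statistics are hypothetical.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 IDF: log(1 + (N - n + 0.5) / (n + 0.5)), with N = doc_count, n = doc_freq."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def blended_doc_freq(stats, cap_by_min_doc_count=True):
    """Hypothetical blending rule: use the largest per-field doc_freq for every field.

    stats is a list of (doc_count, doc_freq) pairs, one per field.
    With cap_by_min_doc_count=True, the blended value is capped at the
    smallest nonzero doc_count, mirroring the fix described above.
    """
    blended = max(df for _, df in stats)
    if cap_by_min_doc_count:
        nonzero_counts = [n for n, _ in stats if n > 0]
        if nonzero_counts:
            blended = min(blended, min(nonzero_counts))
    return blended


def field_idfs(stats, cap_by_min_doc_count=True):
    """Per-field IDF values computed with the shared, blended doc_freq."""
    df = blended_doc_freq(stats, cap_by_min_doc_count)
    return [bm25_idf(n, df) for n, _ in stats]


# Field A: 1000 docs, term in 600. Field B: only 10 docs, term in 5.
stats = [(1000, 600), (10, 5)]
print(field_idfs(stats, cap_by_min_doc_count=False))  # field B's IDF is negative
print(field_idfs(stats))                              # both IDFs are positive
```

Without the cap, field B is scored with `n = 600` against `N = 10`, so `(N + 1) / (n + 0.5)` is far below 1 and the logarithm is negative. Capping `n` at the smallest nonzero `N` (10 here) keeps `n <= N` for every field, so every IDF stays positive.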