Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move back to BM25 similarity #36431

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

javanna
Copy link
Member

@javanna javanna commented Dec 10, 2018

With the last lucene upgrade we have temporarily adopted the LegacyBM25Similarity which exposes the same scores as BM25Similarity before the k1+1 factor was removed from the numerator of the scoring formula.

This PR changes the default Elasticsearch similarity back to the Bm25Similarity and updates the scores that have changed due to such change in our docs and tests.

With the recent lucene upgrade we have temporarily adopted the LegacyBM25Similarity which exposes the same scores as BM25Similarity before the k1+1 factor was removed from the numerator of the scoring formula.

This commit moves the default Elasticsearch similarity back to the Bm25Similarity and updates the scores that have changed in our docs and tests
@javanna javanna added >breaking :Search/Search Search-related issues that do not fall into other categories v7.0.0 labels Dec 10, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question around the SQL tests, looks good otherwise.

@@ -326,7 +326,7 @@ public void testExplain() throws Exception {
assertThat(searchResponse.getHits().getTotalHits().value, equalTo(1L));
Explanation explanation = searchResponse.getHits().getHits()[0].getExplanation();
assertThat(explanation.getValue(), equalTo(searchResponse.getHits().getHits()[0].getScore()));
assertThat(explanation.toString(), startsWith("0.36464313 = Score based on 2 child docs in range from 0 to 1"));
assertThat(explanation.toString(), startsWith(explanation.getValue() + " = Score based on 2 child docs in range from 0 to 1"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

2.288635 |Frank Herbert |Dune |604 |1965-06-01T00:00:00Z
1.8893257 |Frank Herbert |Dune Messiah |331 |1969-10-15T00:00:00Z
1.0402887 |Frank Herbert |Dune |604 |1965-06-01T00:00:00Z
0.8587844 |Frank Herbert |Dune Messiah |331 |1969-10-15T00:00:00Z
1.6086555 |Frank Herbert |Children of Dune |408 |1976-04-21T00:00:00Z
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right? All the scores should change, and currently things aren't in score order

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will double check if I fixed all the issues in tests, and if so I will check why this is the case here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is really weird. Two documents have updated score, while the other two in this test (and other tests returning the score) seem to have the same score as before, which with the updated bm25 similarity should not be the case. Tests are green which is what concerns me the most. @costin @nik9000 would you know why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add that I have just indexed the same documents used for these tests and verified manually that all of the 4 documents matching this query get an updated score with the bm25 change. Seems like something weird happens with these tests, not sure what exactly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A mixed cluster with nodes in 6x and 7 ?

Copy link
Member Author

@javanna javanna Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly, yet I would expect these tests to run both in multi-node and single-node. which makes me expect that I should not get a green build with the current changes. Single-node should require to update all the scores. I may not be getting how these tests are run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna indeed, there is something wrong here. The tests pass and, looking at the logs it produces, it seems like it's not comparing the output from the csv-spec with the actual output of the query being executed... am looking into it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna the issue is here. An assertion with 1.0 delta which, in the case of SCORE(), is quite relevant and a deal breaker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see @astefan thanks for looking! Would you have the chance to fix this upstream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll open a PR.

@javanna javanna requested a review from jpountz December 14, 2018 10:23
@javanna
Copy link
Member Author

javanna commented Dec 14, 2018

retest this please

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @javanna!

@javanna javanna added the WIP label Jan 4, 2019
This allows users to opt-out of the updated bm25 similarities and use
the deprecated legacy bm25 similarity.

This helps especially cross-cluster search cases across clusters on
multiple versions, otherwise sorting by score would lead to very weird
results depending on the version of the data node that scores each doc.
@javanna
Copy link
Member Author

javanna commented Jan 8, 2019

retest this please

@javanna
Copy link
Member Author

javanna commented Jan 8, 2019

run gradle build tests 1

@mark-vieira mark-vieira added v8.2.0 and removed v8.1.0 labels Feb 2, 2022
@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:14
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@csoulios csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
@rjernst rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023
@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@quux00 quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@mattc58 mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023
@javanna javanna removed the v8.12.0 label Dec 6, 2023
@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search Foundations/Search Catch all for Search Foundations stalled Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

Successfully merging this pull request may close these issues.