[META] Improve Hybrid query latency #704

Closed
martin-gaievski opened this issue Apr 23, 2024 · 0 comments
martin-gaievski commented Apr 23, 2024

Hybrid query has high latency compared to other compound queries such as the Boolean query. Based on results collected for 2.13, and depending on the dataset and the exact query, it may be up to 10 times slower than Bool. Another reason for this issue is that hybrid query performance has degraded compared to its initial release, e.g. in OpenSearch 2.11.

Following are goals for this work:

  • bring the performance of hybrid query to a level comparable with the bool query:
      • For small datasets and sub-sets it should match Bool, with a deviation within 20% for p90
      • For large datasets (10M+ documents) where sub-queries return a large sub-set of documents (1M+ documents in a sub-query result), hybrid query should take no more than 2x the time of the Bool query
      • Multiple sub-queries may add overhead of no more than 20% of overall query time for p90
  • reach the level of performance of the hybrid query released in 2.11
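The latency targets above can be written down as a simple check over benchmark samples. This is only a minimal sketch: it assumes per-query latencies in milliseconds collected separately for hybrid and bool, uses a nearest-rank p90, and reads "within 20%" as a ratio of at most 1.2. The function names are hypothetical and not part of the plugin or of any benchmark tool.

```python
def p90(samples):
    """Nearest-rank 90th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    # ceil(0.9 * n) computed with integers to avoid float rounding
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def meets_latency_goal(hybrid_ms, bool_ms, large_dataset=False):
    """Return True if hybrid p90 stays within the target ratio of bool p90.

    Assumed reading of the goals: ratio <= 1.2 for small datasets,
    ratio <= 2.0 for large ones (10M+ documents, 1M+ per sub-query result).
    """
    limit = 2.0 if large_dataset else 1.2
    return p90(hybrid_ms) <= limit * p90(bool_ms)
```

For example, with a bool p90 of 10 ms, a hybrid p90 of 15 ms would fail the small-dataset target but pass the large-dataset one.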

There have been GitHub issues in the past related to the same problem, e.g. #281. In addition, based on analysis of the source code and some profiling, I can think of the following items:

  • don't execute the core TopDocsCollector, as it consumes compute and its results are ignored
  • optimize plugin code for better performance: check for sub-optimal initializations, loops, type conversions, etc.
  • when several sub-queries are rewritten to the same Lucene form, execute only one of them and copy the scores
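The last item can be illustrated language-neutrally. The sketch below is not the plugin's code: `rewrite` and `execute` are hypothetical callables standing in for query rewriting and query execution, and the rewritten form is assumed to be hashable so it can serve as a cache key.

```python
def run_deduplicated(sub_queries, rewrite, execute):
    """Execute each distinct rewritten sub-query once and reuse its scores.

    rewrite: hypothetical callable mapping a sub-query to a hashable
             canonical (rewritten) form.
    execute: hypothetical callable mapping a rewritten form to a
             {doc_id: score} dict.
    Returns one score dict per original sub-query, in input order.
    """
    cache = {}
    results = []
    for query in sub_queries:
        key = rewrite(query)
        if key not in cache:
            cache[key] = execute(key)  # run only the first occurrence
        # copy the scores so each sub-query owns its own result
        results.append(dict(cache[key]))
    return results
```

Copying the score map keeps the per-sub-query results independent, at the cost of one extra copy per duplicate; whether that is cheaper than re-executing depends on the result size.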

GitHub issues for each child item:
