Search Query Categorization (Phase 1) #10250

Closed
deshsidd opened this issue Sep 27, 2023 · 4 comments · Fixed by #10724
Labels
enhancement Enhancement or improvement to existing feature or request v2.12.0 Issues and PRs related to version 2.12.0

Comments

@deshsidd
Contributor

deshsidd commented Sep 27, 2023

Is your feature request related to a problem? Please describe.
Today, OpenSearch customers have very limited visibility into the query workload running on a cluster. There is also no easy way to identify patterns in the queries being executed against an index. This leaves a large gap when debugging performance issues, tracking changes in data access patterns, or targeting new feature improvements.

The Query Classification feature in OpenSearch aims to enhance the platform's capabilities by providing a mechanism to identify patterns, latencies and resource utilization breakdown for the queries being executed upon an index. This will empower users and administrators to optimize query performance and identify query types for better resource allocation and index management.

The primary objective of this proposal is to implement a query classification mechanism within OpenSearch that can categorize and analyze the queries being executed on an index.

Describe the solution you'd like
Instrument the Query Builder logic in OpenSearch to recognize and categorize queries based on their patterns, such as search queries, aggregation queries, filtering queries, etc. This will provide insights into the types and frequencies of queries being executed on the index.

Record this information with metric counters, using the Metric Framework: #10241

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Example query classification information:
total_queries
total_nested_queries
total_aggs
total_nested_aggs
match
multi_match
bool
nested_bool
wildcard
regexp
match_phrase_prefix
query_string
term
range
function_score
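
To make the counter names above concrete, here is a minimal sketch of how such counts could be derived. It is not the OpenSearch implementation: it walks the JSON query DSL as a Python dict rather than instrumenting the Query Builder, `count_query_types` is a hypothetical helper, and it assumes `nested_bool` means a `bool` query nested inside another `bool` (counted in `bool` as well). Only `bool` is treated as a compound query here.

```python
from collections import Counter

# Clauses of a bool query that can hold sub-queries.
BOOL_CLAUSES = ("must", "should", "filter", "must_not")


def count_query_types(query, counters=None, depth=0):
    """Walk a query DSL dict and count each query type encountered.

    Hypothetical helper for illustration only; it recurses into bool
    clauses and treats every other query type as a leaf.
    """
    if counters is None:
        counters = Counter()
    for query_type, body in query.items():
        counters[query_type] += 1
        if query_type == "bool":
            # Assumption: "nested_bool" counts bool queries below the top level.
            if depth > 0:
                counters["nested_bool"] += 1
            for clause in BOOL_CLAUSES:
                children = body.get(clause, [])
                if isinstance(children, dict):  # a single clause may be a dict
                    children = [children]
                for child in children:
                    count_query_types(child, counters, depth + 1)
    return counters


search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "opensearch"}}],
            "filter": [
                {"range": {"year": {"gte": 2020}}},
                {"bool": {"should": [
                    {"term": {"tag": "a"}},
                    {"wildcard": {"tag": "b*"}},
                ]}},
            ],
        }
    }
}

counters = count_query_types(search_body["query"])
counters["total_queries"] += 1  # one search request was processed
print(dict(counters))
```

In a real cluster these increments would go to the metric counters mentioned above instead of an in-memory `Counter`.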

@deshsidd deshsidd added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 27, 2023
@dblock
Member

dblock commented Sep 28, 2023

How useful is such a classification? How will it be used?

I do think it would be really useful to separate query shape from query data on the other hand and classify queries by their shapes, then be able to roll up the most expensive queries, but it sounds like that may not be covered by this proposal.

@msfroh
Collaborator

msfroh commented Sep 29, 2023

> I do think it would be really useful to separate query shape from query data on the other hand and classify queries by their shapes, then be able to roll up the most expensive queries, but it sounds like that may not be covered by this proposal.

I agree with @dblock -- I would like to be able to capture the full shape of all queries (along with the shape of any aggregations).

Using the new QueryBuilderVisitor API added by @vibrantvarun, I think we can capture query shape pretty easily. I cobbled together a quick and dirty visitor that outputs JSON: https://gist.github.com/msfroh/74aa3fee52f4074c5e7b8d85f76e88ab
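
As a rough illustration of "shape versus data", the sketch below strips literal values from a query and keeps only its structure. It is not the visitor from the gist and does not use the QueryBuilderVisitor API: it operates on the JSON query DSL as a Python dict, `query_shape` is a hypothetical helper, and the output format is an assumption. For leaf queries it keeps the top-level keys of the query body, which is the field name for most leaf query types.

```python
import json

# Clauses of a bool query that can hold sub-queries.
BOOL_CLAUSES = ("must", "should", "filter", "must_not")


def query_shape(query):
    """Return the structure of a single-key query DSL dict, values removed.

    Hypothetical helper for illustration only; bool is the only compound
    query it descends into.
    """
    query_type, body = next(iter(query.items()))
    if query_type == "bool":
        shape = {}
        for clause in BOOL_CLAUSES:
            if clause in body:
                children = body[clause]
                if isinstance(children, dict):  # a single clause may be a dict
                    children = [children]
                shape[clause] = [query_shape(child) for child in children]
        return {"bool": shape}
    # Leaf query: keep the type and the body's keys, drop the values.
    keys = sorted(body) if isinstance(body, dict) else []
    return {query_type: keys}


query_a = {"bool": {
    "must": [{"match": {"title": "opensearch"}}],
    "filter": [{"range": {"year": {"gte": 2020}}}],
}}
query_b = {"bool": {
    "must": [{"match": {"title": "lucene"}}],
    "filter": [{"range": {"year": {"lt": 2010}}}],
}}

shape_a = query_shape(query_a)
shape_b = query_shape(query_b)

# A stable string form of the shape can serve as a grouping key.
shape_key = json.dumps(shape_a, sort_keys=True)
print(shape_key)
```

Both example queries produce the same shape even though their values differ, so a key like `shape_key` could be used to group latency or resource measurements per shape and roll up the most expensive ones.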

@hdhalter

@msfroh - Will this require documentation for 2.12?

@deshsidd deshsidd added the Meta Meta issue, not directly linked to a PR label Dec 12, 2023
@deshsidd deshsidd changed the title Search Query Classification [Meta] Search Query Categorization Dec 12, 2023
@deshsidd deshsidd changed the title [Meta] Search Query Categorization Search Query Categorization (Phase 1) Dec 12, 2023
@deshsidd deshsidd removed the Meta Meta issue, not directly linked to a PR label Dec 12, 2023
@msfroh
Collaborator

msfroh commented Dec 13, 2023

@hdhalter -- So far, we don't have anything user-visible from this issue yet.

(Right, @deshsidd ? Will we have any user-visible query classification functions available in 2.12?)

Projects
Status: 2.12.0 (Launched)
Status: Done
6 participants