Search Query Runtime Cost Calculation #5174
Comments
I am not really sure what is being estimated as the "search query cost" here. Based on the description, it is derived from the resource consumption stats, which are only available after the query has executed. The exact same query could have drastically different consumption stats over time (for example, because new data is being ingested all the time). What would be useful, though, is to estimate the search query cost before execution, based on:
Does it make sense or am I missing something here?
I think the proposal is a little unclear on cost estimation vs. query planning, and so @reta is rightfully confused. I think the purpose of the proposal is to predict, as well as possible, the "runtime cost" (consumption of time and space) of an incoming query and use it in backpressure. Runtime cost is impacted by the query being made, but it is impacted far more by things like the size of the data. So, I propose to clarify the goals by calling the ask here "search query runtime cost" (vs. just "search query cost"), and calling the non-runtime aspects of a query "query (plan) cost (or complexity)". Does that help?
@reta We are not really estimating the query cost here; rather, we are just calculating the actual query cost (or call it runtime cost). We are building a coordinator-level view of resource consumption for each search request. As discussed in #1179 and #1181, we want to build an aggregated view of resource consumption stats for any given query. To do so, we want to piggyback the consumption stats to the parent node. However, as a part of this issue, we will not make cancellation decisions yet. Also, exactly the same query can have different resource consumption stats in different scenarios, but as @dblock mentioned, we are trying to calculate the "runtime cost" for a search query. @dblock Appreciate your suggestion to change it to "Search Query Runtime Cost". Will update the title.
Hi @PritLadani, are you actively working on this?
Hey @anasalkouz, I might not be able to take this up as of now.
Is your feature request related to a problem? Please describe.
#1179 aims to build a resource tracking framework for search queries. As part of #3982, we enabled resource tracking for shard-level tasks. However, to support search back-pressure and to build a model for query cost estimation as discussed in #1042, we need coordinator-level (query-level) resource consumption stats.
Describe the solution you'd like
We will piggyback the shard tasks' resource consumption on the ClusterSearchShardsResponse sent from the child nodes (data nodes) to the parent node (coordinator node). We need to change the response structure to accommodate the resource stats.
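To make the idea concrete, here is a minimal sketch of how the coordinator might fold piggybacked shard stats into a per-query total. The class names `ShardResourceStats` and `QueryRuntimeCost`, their fields, and the `onShardResponse` method are hypothetical and are not part of the existing OpenSearch API; the real change would extend the response structure instead. The sketch sums shard-task usage only and does not account for work done by the coordinator task itself.

```java
/** Hypothetical per-shard-task usage, as it might be attached to a shard response. */
final class ShardResourceStats {
    final long cpuTimeNanos;
    final long heapBytesAllocated;

    ShardResourceStats(long cpuTimeNanos, long heapBytesAllocated) {
        this.cpuTimeNanos = cpuTimeNanos;
        this.heapBytesAllocated = heapBytesAllocated;
    }
}

/** Hypothetical coordinator-side accumulator for a single search request. */
final class QueryRuntimeCost {
    private long cpuTimeNanos;
    private long heapBytesAllocated;
    private int shardResponses;

    /**
     * Called once per completed shard task, when its response reaches the
     * coordinator. Synchronized because shard responses can arrive on
     * different threads.
     */
    synchronized void onShardResponse(ShardResourceStats stats) {
        cpuTimeNanos += stats.cpuTimeNanos;
        heapBytesAllocated += stats.heapBytesAllocated;
        shardResponses++;
    }

    synchronized long cpuTimeNanos() {
        return cpuTimeNanos;
    }

    synchronized long heapBytesAllocated() {
        return heapBytesAllocated;
    }

    synchronized int shardResponses() {
        return shardResponses;
    }
}
```

Because the stats arrive only with the completed shard response, the total is final once the last shard has reported, which matches the goal of calculating the actual runtime cost after execution.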
Describe alternatives you've considered
An alternative we considered is to share the resource stats from the data nodes to the coordinator node periodically, rather than piggybacking them at task completion. However, query cost calculation does not need periodic stats from the data nodes. Moreover, sharing the stats periodically would introduce the overhead of a new background service to collect the data and send it to the parent node.
Additional context
Just by looking at or aggregating the resource stats of child tasks, we cannot estimate the resource consumption of the coordinator task. Hence we cannot estimate whether a search task will cause the node to go into duress, and so we do not need periodic resource stats from the data nodes.