-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce ConcurrentQueryProfiler to profile query using concurrent segment search path and support concurrency during rewrite and create weight #10352
Conversation
ad5028a
to
d0a5071
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Compatibility status:Checks if related components are compatible with change 47a5802 Incompatible componentsSkipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git] |
05d05af
to
5b2919f
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
|
server/src/main/java/org/opensearch/search/profile/query/ConcurrentQueryProfileBreakdown.java
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Ticheng Lin <ticheng@amazon.com>
5b2919f
to
47a5802
Compare
Gradle Check (Jenkins) Run Completed with:
|
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10352-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 200ad5d28a577877be530ecab507601898025c5c
# Push it to GitHub
git push --set-upstream origin backport/backport-10352-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x Then, create a pull request where the |
@ticheng-aws sadly backport to 2.x failed, could you please send a manual pull request? thank you |
@ticheng-aws We will need to backport all the previous PRs for concurrent profiler as well. I think during 2.11 we didn't do it. But lets do it now. Then it will go thru for this one as well. |
Oh wait , is it because we run into NPEs? |
…egment search path and support concurrency during rewrite and create weight (opensearch-project#10352) * Fix timer race condition in profile rewrite and create weight for concurrent segment search (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Refactor and work on the PR comments (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com>
You are right. We need to backport #9248, #10111, #10547, and the current one #10352. |
…egment search path and support concurrency during rewrite and create weight (opensearch-project#10352) * Fix timer race condition in profile rewrite and create weight for concurrent segment search (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Refactor and work on the PR comments (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com>
…egment search path and support concurrency during rewrite and create weight (opensearch-project#10352) * Fix timer race condition in profile rewrite and create weight for concurrent segment search (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Refactor and work on the PR comments (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com>
…egment search path and support concurrency during rewrite and create weight (opensearch-project#10352) * Fix timer race condition in profile rewrite and create weight for concurrent segment search (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Refactor and work on the PR comments (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com>
…#10898) * Add support for query profiler with concurrent aggregation (#9248) * Add support for query profiler with concurrent aggregation (#9248) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Refactor and work on the PR comments Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Update collectorToLeaves mapping for children breakdowns post profile metric collection and before creating the results Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> * Refactor logic to compute the slice level breakdown stats and query level breakdown stats for concurrent search case Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> * Fix QueryProfilePhaseTests and QueryProfileTests, and parameterize QueryProfilerIT with concurrent search enabled Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Handle the case when there are no leaf context to compute the profile stats to return default stats for all breakdown type along with min/max/avg values. Replace queryStart and queryEnd time with queryNodeTime Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> * Add UTs for ConcurrentQueryProfileBreakdown Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> * Add concurrent search stats test into the QueryProfilerIT Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Address review comments Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com> Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> Co-authored-by: Sorabh Hamirwasia <sohami.apache@gmail.com> * Fix NPE in ConcurrentQueryProfile while computing the breakdown map for slices (#10111) * Fix NPE in ConcurrentQueryProfile while computing the breakdown map for slices. There can be cases where one or more slice may not have timing related information for its leaves in contexts map. During creation of slice and query level breakdown map it needs to handle such cases by using default values correctly. Also updating the min/max/avg sliceNodeTime to not include time to create weight and wait times by slice threads. It will reflect the min/max/avg execution time of each slice whereas totalNodeTime will reflect the total query time. Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> * Address review comments Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> --------- Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> * Fix flaky query profile phase tests with concurrent search enabled (#10547) (#10547) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Introduce ConcurrentQueryProfiler to profile query using concurrent segment search path and support concurrency during rewrite and create weight (#10352) * Fix timer race condition in profile rewrite and create weight for concurrent segment search (#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Refactor and work on the PR comments (#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com> Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com> Co-authored-by: Ticheng Lin <51488860+ticheng-aws@users.noreply.github.com>
…egment search path and support concurrency during rewrite and create weight (opensearch-project#10352) * Fix timer race condition in profile rewrite and create weight for concurrent segment search (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> * Refactor and work on the PR comments (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com> --------- Signed-off-by: Ticheng Lin <ticheng@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
The
rewrite_time
field contains the cumulative total rewrite time for a query. In the concurrent search case, the issue is “rewrite” happened on "before" (i.e. non concurrent) and "during" the concurrent path. There is only 1 rewrite timer shared across all slices, which lead the racing condition happened in the multiple threadings case.To resolve the issue, we need to have a distinct rewrite timer for each query, then compute the sum of the non-overlapping time across all slices and add the rewrite time from the non-concurrent path.
A similar race condition happened on building the profile tree stack during the create weight step. To address this issue, I added
threadToProfileTree
map into the profiler. This way, each thread has its own profile tree, ensuring that the stack won't access tree nodes from other threads. In the end, we simply combine all the trees into theprofileResults
list.Related Issues
Resolves #9787
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.