Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For sort request on timeseries field use non concurrent search path #9562

Merged
merged 2 commits into from
Aug 28, 2023

Conversation

sohami
Copy link
Collaborator

@sohami sohami commented Aug 25, 2023

Related Issues

#9538

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@sohami
Copy link
Collaborator Author

sohami commented Aug 25, 2023

@gashutos @jed326 @reta Tagging for review

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
@reta
Copy link
Collaborator

reta commented Aug 28, 2023

@sohami @jed326 @gashutos this is not in scope of this change, but I was thinking if we better off introducing the concept of search capabilities instead of adding flags to the search context, something like that (very roughly):

enum SearchCapability {
    CONCURRENT,
    USE_TIMESERIES_DESC,
    ...
}

if (context.shouldUse(SearchCapability.CONCURRENT)) {
   ...
}

What do you think folks?

@gashutos
Copy link
Contributor

@sohami @jed326 @gashutos this is not in scope of this change, but I was thinking if we better off introducing the concept of search capabilities instead of adding flags to the search context, something like that (very roughly):

enum SearchCapability {
    CONCURRENT,
    USE_TIMESERIES_DESC,
    ...
}
if (context.shouldUse(SearchCapability.CONCURRENT)) {
   ...
}

What do you think folks?

+1 to this @reta We should able to chose multiple capabilities on a query too

@sohami
Copy link
Collaborator Author

sohami commented Aug 28, 2023

@sohami @jed326 @gashutos this is not in scope of this change, but I was thinking if we better off introducing the concept of search capabilities instead of adding flags to the search context, something like that (very roughly):

enum SearchCapability {
    CONCURRENT,
    USE_TIMESERIES_DESC,
    ...
}
if (context.shouldUse(SearchCapability.CONCURRENT)) {
   ...
}

What do you think folks?

+1 to this @reta We should able to chose multiple capabilities on a query too

@reta I see your point. But for cases like timeseries asc/desc, we are evaluating based on other states (like sort/indexshard) and not storing that evaluated result as another flag. I would prefer to not store that evaluated result separately as well to avoid creating new states. Having said that, the request level evaluation for concurrent search can also be implemented without keeping separate state but I think we did that to avoid the agg factory tree traversal in each call.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@jed326
Copy link
Collaborator

jed326 commented Aug 28, 2023

I think we did that to avoid the agg factory tree traversal in each call.

We could probably change the agg factory tree concurrent search evaluation to happen at the time of tree construction too if we want to remove this state.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov
Copy link

codecov bot commented Aug 28, 2023

Codecov Report

Merging #9562 (7929ae9) into main (569d5c2) will decrease coverage by 0.08%.
Report is 2 commits behind head on main.
The diff coverage is 63.15%.

@@             Coverage Diff              @@
##               main    #9562      +/-   ##
============================================
- Coverage     71.15%   71.07%   -0.08%     
+ Complexity    57536    57499      -37     
============================================
  Files          4781     4781              
  Lines        271197   271216      +19     
  Branches      39595    39596       +1     
============================================
- Hits         192975   192779     -196     
- Misses        62011    62275     +264     
+ Partials      16211    16162      -49     
Files Changed Coverage Δ
.../opensearch/telemetry/tracing/noop/NoopTracer.java 57.14% <0.00%> (-9.53%) ⬇️
...ensearch/search/internal/ContextIndexSearcher.java 73.21% <0.00%> (+4.11%) ⬆️
...nsearch/search/internal/FilteredSearchContext.java 8.69% <0.00%> (-0.08%) ⬇️
.../org/opensearch/search/internal/SearchContext.java 41.50% <ø> (ø)
...va/org/opensearch/search/DefaultSearchContext.java 77.97% <40.00%> (-0.92%) ⬇️
...rg/opensearch/telemetry/tracing/WrappedTracer.java 78.57% <50.00%> (-6.05%) ⬇️
...ava/org/opensearch/search/sort/SortAndFormats.java 66.66% <66.66%> (ø)
...elemetry/tracing/OTelTracingContextPropagator.java 73.33% <70.00%> (-1.67%) ⬇️
...rg/opensearch/telemetry/tracing/DefaultTracer.java 93.10% <100.00%> (+0.51%) ⬆️
...search/search/query/QueryPhaseSearcherWrapper.java 100.00% <100.00%> (ø)
... and 1 more

... and 473 files with indirect coverage changes

@reta
Copy link
Collaborator

reta commented Aug 28, 2023

@reta I see your point. But for cases like timeseries asc/desc, we are evaluating based on other states (like sort/indexshard) and not storing that evaluated result as another flag.

Thanks @sohami , I am intentionally not digging into how it is evaluated, just looking on the issue from usability perspective:

public boolean shouldUseConcurrentSearch() {
}

public boolean shouldUseTimeSeriesDescSortOptimization() {
}

...

would collapse into context.shouldUse(SearchCapability.CONCURRENT) or context.shouldUse(SearchCapability.USE_TIMESERIES_DESC), the evaluation part we could sort out I am sure.

@sohami
Copy link
Collaborator Author

sohami commented Aug 28, 2023

I think we did that to avoid the agg factory tree traversal in each call.

We could probably change the agg factory tree concurrent search evaluation to happen at the time of tree construction too if we want to remove this state.

But that will still require to store the evaluated result in some other place which is similar to what we have today.

@sohami
Copy link
Collaborator Author

sohami commented Aug 28, 2023

@reta I see your point. But for cases like timeseries asc/desc, we are evaluating based on other states (like sort/indexshard) and not storing that evaluated result as another flag.

Thanks @sohami , I am intentionally not digging into how it is evaluated, just looking on the issue from usability perspective:

public boolean shouldUseConcurrentSearch() {
}

public boolean shouldUseTimeSeriesDescSortOptimization() {
}

...

would collapse into context.shouldUse(SearchCapability.CONCURRENT) or context.shouldUse(SearchCapability.USE_TIMESERIES_DESC), the evaluation part we could sort out I am sure.

Got it. So instead of separate methods to figure out what optimizations to use. Have a single shouldUse() sort of semantics. But won't that overload a single method with multiple different evaluations which can make this logic complex (as one can be dependent on other)?. With separate methods IMO it is just checking for that particular condition which can be consumed from different places and doesn't care about other evaluations.

@reta
Copy link
Collaborator

reta commented Aug 28, 2023

But won't that overload a single method with multiple different evaluations which can make this logic complex (as one can be dependent on other)?. With separate methods IMO it is just checking for that particular condition which can be consumed from different places and doesn't care about other evaluations.

I think you are still thinking about the implementation (it is important but not in scope): the method checks single capability, says "yes" / "no". From implementation perspective, if it goes wild - we won't be doing it for sure.

@sohami
Copy link
Collaborator Author

sohami commented Aug 28, 2023

For gradle check seeing an unrelated test failure: org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit
Created issue for tracking: #9589

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@sohami
Copy link
Collaborator Author

sohami commented Aug 28, 2023

For compatibility check it is failing with CCR plugin with below error. Checked couple other PRs and there are also seeing same issue.

Error output for https://github.com/opensearch-project/cross-cluster-replication.git build:

e: file:///tmp/groovy-generated-tmpdir-7692019513789344670/src/main/kotlin/org/opensearch/replication/repository/RemoteClusterRepository.kt:82:1 Class 'RemoteClusterRepository' is not abstract and does not implement abstract member public abstract fun getRemoteUploadThrottleTimeInNanos(): Long defined in org.opensearch.repositories.Repository

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileKotlin'.
> A failure occurred while executing org.jetbrains.kotlin.compilerRunner.GradleCompilerRunnerWithWorkers$GradleKotlinCompilerWorkAction
   > Compilation error. See log for more details

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@sohami sohami added the backport 2.x Backport to 2.x branch label Aug 28, 2023
@sohami sohami merged commit e5c4f9d into opensearch-project:main Aug 28, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9562-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e5c4f9d2016bfa50c464945dc9533aa4aee49922
# Push it to GitHub
git push --set-upstream origin backport/backport-9562-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9562-to-2.x.

sohami added a commit to sohami/OpenSearch that referenced this pull request Aug 29, 2023
…pensearch-project#9562)

* For sort request on timeseries field use non concurrent search path

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

* Address review feedback

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

---------

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
sohami added a commit that referenced this pull request Aug 29, 2023
…9562) (#9599)

* For sort request on timeseries field use non concurrent search path



* Address review feedback



---------

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…pensearch-project#9562)

* For sort request on timeseries field use non concurrent search path

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

* Address review feedback

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

---------

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…pensearch-project#9562)

* For sort request on timeseries field use non concurrent search path

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

* Address review feedback

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

---------

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…pensearch-project#9562)

* For sort request on timeseries field use non concurrent search path

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

* Address review feedback

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>

---------

Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Concurrent Segment Search] For sort request on Timeseries index and field use non concurrent path
4 participants