Skip to content

Conversation

@laminelam
Copy link
Contributor

@laminelam laminelam commented Aug 5, 2025

Implements a new ShardDocSortBuilder and ShardDocFieldComparatorSource to allow sorting by shard id and global doc id as a tie-breaker.

Resolves 17064

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Lamine Idjeraoui added 2 commits August 5, 2025 11:08
- Implements ShardDocSortBuilder + comparator
- TODO: Add unit + integ tests
- Registers in SearchModule

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
@github-actions
Copy link
Contributor

github-actions bot commented Aug 5, 2025

❕ Gradle check result for 7199347: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 61.17647% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.95%. Comparing base (eb28e77) to head (60f438d).
⚠️ Report is 46 commits behind head on main.

Files with missing lines Patch % Lines
...rch/search/sort/ShardDocFieldComparatorSource.java 13.63% 19 Missing ⚠️
...rg/opensearch/search/sort/ShardDocSortBuilder.java 78.04% 7 Missing and 2 partials ⚠️
...va/org/opensearch/action/search/SearchRequest.java 81.25% 1 Missing and 2 partials ⚠️
...nsearch/search/searchafter/SearchAfterBuilder.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18924      +/-   ##
============================================
+ Coverage     72.81%   72.95%   +0.14%     
- Complexity    69801    69925     +124     
============================================
  Files          5674     5676       +2     
  Lines        320850   320935      +85     
  Branches      46383    46398      +15     
============================================
+ Hits         233638   234153     +515     
+ Misses        68264    67842     -422     
+ Partials      18948    18940       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc labels Aug 6, 2025
@LantaoJin
Copy link
Member

@laminelam can we add any microbenchmark for it?

@github-actions
Copy link
Contributor

❌ Gradle check result for a719fb7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for a719fb7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

✅ Gradle check result for a719fb7: SUCCESS

@gaobinlong
Copy link
Contributor

Thanks @laminelam, this PR is very close to merge, but seems the code test coverage is low, could you add more some unit tests?

@LantaoJin
Copy link
Member

Thanks @laminelam , we are preparing the new release version (3.3.0) which the planned code freeze is Sep 30. Do you have a chance to fix the test coverage check ASAP?

@laminelam
Copy link
Contributor Author

Thanks @laminelam , we are preparing the new release version (3.3.0) which the planned code freeze is Sep 30. Do you have a chance to fix the test coverage check ASAP?

Hi @LantaoJin & @gaobinlong
Yes sure, gimme a day or two.
Thx for the review

@ivenhov
Copy link

ivenhov commented Sep 26, 2025

Hi
Any change this going to be backported to 2.19 ?

add more test cases

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
@github-actions
Copy link
Contributor

❌ Gradle check result for 60f438d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@LantaoJin
Copy link
Member

LantaoJin commented Sep 28, 2025

@laminelam, will all PIT search requests add an implicit _shard_doc sort tiebreaker field with this PR?

Example 1:

GET /_search
{
  "size": 10000,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  },
  "pit": {
    "id":  "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
    "keep_alive": "100m"
  },
  "sort": [ 
    {"@timestamp": {"order": "asc"}}
  ]
}

will sort by @timestamp (asc) + _shard_doc(asc). returns:

{
  "pit_id" : "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
  "took" : 17,
  "timed_out" : false,
  "_shards" : ...,
  "hits" : {
    "total" : ...,
    "max_score" : null,
    "hits" : [
      ...
      {
        "_index" : "my-index-000001",
        "_id" : "FaslK3QBySSL_rrj9zM5",
        "_score" : null,
        "_source" : ...,
        "sort" : [
          "2021-05-20T05:30:04.832Z",
          4294967298
        ]
      }
    ]
  }
}

Example 2

GET /_search
{
  "size": 10000,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  },
  "pit": {
    "id":  "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
    "keep_alive": "100m"
  },
  "sort": []
}

will sort by _shard_doc(asc), returns

  "pit_id" : "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
  "took" : 17,
  "timed_out" : false,
  "_shards" : ...,
  "hits" : {
    "total" : ...,
    "max_score" : null,
    "hits" : [
      ...
      {
        "_index" : "my-index-000001",
        "_id" : "FaslK3QBySSL_rrj9zM5",
        "_score" : null,
        "_source" : ...,
        "sort" : [
          4294967298
        ]
      }
    ]
  }
}

@laminelam
Copy link
Contributor Author

laminelam commented Sep 28, 2025

@laminelam, will all PIT search requests add an implicit _shard_doc sort tiebreaker field with this PR?

Thought of doing it, actually did it and reverted it back, because I am little bit hesitant to implicitly tie a stable proven feature (PIT) to a newly introduced one (shard_doc sorting) in a single PR. I am thinking we better give the user the option to use PIT without shard_doc and at some point we can make it implicit in a separate PR. I think this is a safest path to avoid breaking anything and also make it backwards compatible.

What are your thoughts?

@LantaoJin
Copy link
Member

@laminelam, will all PIT search requests add an implicit _shard_doc sort tiebreaker field with this PR?

Thought do it, actually did it and reverted it back, because I am little bit hesitant to implicitly tie a stable proven feature (PIT) to a newly introduced one (shard_doc sorting) in a single PR. I am thinking we better give the user the option to use PIT without shard_doc and at some point we can make it implicit in a separate PR. I think this is a safest path to avoid breaking anything and also make it backwards compatible.

What are your thoughts?

Make sense, LGTM.

@gaobinlong
Copy link
Contributor

Hi Any change this going to be backported to 2.19 ?

New feature or enhancement won't be backported to 2.19, only bug fix and security issue fix PR can be backported.

@github-actions
Copy link
Contributor

❕ Gradle check result for 60f438d: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@gaobinlong gaobinlong merged commit 47aa8ee into opensearch-project:main Sep 30, 2025
33 of 36 checks passed
asimmahmood1 pushed a commit to asimmahmood1/OpenSearch that referenced this pull request Sep 30, 2025
…ect#18924)

* add a new sort tiebreaker based on shard id and docid

- Implements ShardDocSortBuilder + comparator
- TODO: Add unit + integ tests
- Registers in SearchModule

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add a new sort tiebreaker based on shard id and docid

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add a test class plus some refactoring

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add microbenchmark

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* spotless

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* spotless

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add yaml rest test cases

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* change log

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* some changes to the shard_doc yaml rest test file

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* Add PIT param to ShardDocFieldComparatorSourceIT's search requests

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add more logic for shard_doc parsing
add more test cases

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

---------

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Co-authored-by: Lamine Idjeraoui <lidjeraoui@apple.com>
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Oct 15, 2025
…ect#18924)

* add a new sort tiebreaker based on shard id and docid

- Implements ShardDocSortBuilder + comparator
- TODO: Add unit + integ tests
- Registers in SearchModule

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add a new sort tiebreaker based on shard id and docid

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add a test class plus some refactoring

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add microbenchmark

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* spotless

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* spotless

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add yaml rest test cases

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* change log

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* some changes to the shard_doc yaml rest test file

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* Add PIT param to ShardDocFieldComparatorSourceIT's search requests

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

* add more logic for shard_doc parsing
add more test cases

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

---------

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Co-authored-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for Elastic Search _shard_doc equivalent

4 participants