[Feature Request] Tie breaker for search with search_after pagination #11831

Open
Arpit-Bandejiya opened this issue Jan 10, 2024 · 2 comments
Labels
enhancement, Search:Query Capabilities

Comments

@Arpit-Bandejiya
Contributor

Arpit-Bandejiya commented Jan 10, 2024

Is your feature request related to a problem? Please describe

When sorting by a datetime field, several documents can share the same sort value:

#1 "2024-01-03 19:57:38"
#2 "2024-01-03 19:57:38"
...
#3 "2024-01-04 19:57:39"
#4 "2024-01-04 19:57:39"

This causes documents to be skipped while paginating with the search_after parameter. Using the dataset above, suppose the first 10K docs end with the #1 value: the next 10K will then start with #3, and #2 is missed.
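
The skipped-document behavior described above can be reproduced with a small in-memory model. This is only a sketch, not OpenSearch code: it assumes search_after returns documents whose sort values are strictly greater than the supplied values, and the document ids (`a` to `d`) and the helper names `page_after` and `paginate` are hypothetical.

```python
from typing import List, Optional, Tuple

Doc = Tuple[str, str]  # (hypothetical doc id, timestamp string)


def sort_key(doc: Doc, use_tie_breaker: bool) -> Tuple[str, ...]:
    """Sort by timestamp only, or by (timestamp, id) when a tie breaker is used."""
    return (doc[1], doc[0]) if use_tie_breaker else (doc[1],)


def page_after(docs: List[Doc], size: int,
               after: Optional[Tuple[str, ...]],
               use_tie_breaker: bool) -> List[Doc]:
    """Return the next page: docs whose sort key is strictly after `after`."""
    ordered = sorted(docs, key=lambda d: sort_key(d, use_tie_breaker))
    if after is not None:
        ordered = [d for d in ordered if sort_key(d, use_tie_breaker) > after]
    return ordered[:size]


def paginate(docs: List[Doc], size: int, use_tie_breaker: bool) -> List[str]:
    """Walk all pages and collect the ids that were actually returned."""
    seen: List[str] = []
    after: Optional[Tuple[str, ...]] = None
    while True:
        page = page_after(docs, size, after, use_tie_breaker)
        if not page:
            return seen
        seen.extend(doc_id for doc_id, _ in page)
        # The sort values of the last hit become the next search_after.
        after = sort_key(page[-1], use_tie_breaker)


DOCS: List[Doc] = [
    ("a", "2024-01-03 19:57:38"),
    ("b", "2024-01-03 19:57:38"),
    ("c", "2024-01-04 19:57:39"),
    ("d", "2024-01-04 19:57:39"),
]

print(paginate(DOCS, 1, use_tie_breaker=False))  # ['a', 'c']
print(paginate(DOCS, 1, use_tie_breaker=True))   # ['a', 'b', 'c', 'd']
```

With a page size of 1 and no tie breaker, the page boundary falls between two documents with the same timestamp, so `b` and `d` are never returned. Adding the id to the sort key makes every key unique and the walk complete.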

This feature is requested by other users as well: https://stackoverflow.com/questions/76042569/can-i-imitate-a-tie-breaker-field-in-opensearch-with-search-after-pagination

Describe the solution you'd like

We need to introduce a default tie_breaker_fields for PIT with search_after.

Related component

Search:Query Capabilities

Describe alternatives you've considered

No response

Additional context

No response

@Arpit-Bandejiya added the enhancement and untriaged labels Jan 10, 2024
@msfroh
Collaborator

msfroh commented Jan 10, 2024

@Arpit-Bandejiya -- Does this work if you sort by timestamp and _id, then search_after with both? That should provide a unique sort, right?

In theory, I suppose we could add _id as an implicit tie breaker.
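
A minimal sketch of what sorting by timestamp and _id and then passing both values to search_after might look like as request bodies. The field name `timestamp` and the helper names `build_search_body` and `next_search_after` are assumptions for illustration. The bodies are plain dictionaries, so no particular client library is implied.

```python
from typing import Any, Dict, List, Optional


def build_search_body(size: int,
                      search_after: Optional[List[Any]] = None,
                      timestamp_field: str = "timestamp") -> Dict[str, Any]:
    """Build a search body sorted by timestamp, with _id as the tie breaker."""
    body: Dict[str, Any] = {
        "size": size,
        "sort": [{timestamp_field: "asc"}, {"_id": "asc"}],
    }
    if search_after is not None:
        # One value per sort clause, in the same order as "sort".
        body["search_after"] = search_after
    return body


def next_search_after(hits: List[Dict[str, Any]]) -> Optional[List[Any]]:
    """Take the sort values of the last hit, or None when the page is empty."""
    return hits[-1]["sort"] if hits else None
```

The first request omits `search_after`. Each later request passes the `sort` array of the last hit from the previous page, so two documents with the same timestamp are still ordered by their id.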

@msfroh removed the untriaged label Jan 17, 2024
@bharath-techie
Contributor

Hi @msfroh,
By default, many users don't seem to index '_id' as a separate doc values field, so the '_id' values get loaded as field data, which has an impact on heap usage.
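
One possible way to avoid field data on '_id', sketched under assumptions: copy the id into a separate keyword field at index time and use that field as the tie breaker. The field name `doc_id` and both helper names are hypothetical. The sketch relies on keyword fields having doc values enabled by default.

```python
from typing import Any, Dict


def index_body_with_id_copy(id_field: str = "doc_id") -> Dict[str, Any]:
    """Index mappings with a keyword field that holds a copy of the document id."""
    return {
        "mappings": {
            "properties": {
                # keyword fields are sortable through doc values
                id_field: {"type": "keyword"},
            }
        }
    }


def with_id_copy(doc_id: str, source: Dict[str, Any],
                 id_field: str = "doc_id") -> Dict[str, Any]:
    """Return a copy of the document source with the id stored in id_field."""
    return {**source, id_field: doc_id}
```

The sort clause would then reference `doc_id` instead of `_id`. This costs extra disk space for the copied field and requires every document to be indexed with it.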

Projects
Status: Later (6 months plus)