Skip to content

feat: Make elastic TermsQuery use TermSetQuery internally#6151

Open
Darkheir wants to merge 2 commits intoquickwit-oss:mainfrom
Darkheir:feat/es_terms_use_term_set_internally
Open

feat: Make elastic TermsQuery use TermSetQuery internally#6151
Darkheir wants to merge 2 commits intoquickwit-oss:mainfrom
Darkheir:feat/es_terms_use_term_set_internally

Conversation

@Darkheir
Copy link
Contributor

Description

Rely on TermSetQuery instead of a bool should query with many criteria on the same field as it should improve performances.

How was this PR tested?

Integration tests should still pass

Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@rdettai-sk
Copy link
Collaborator

Small clarification: we don't expect performances to be always improved, but we believe mapping the TermSet behavior here (warmup of the whole field) makes more sense. Performances will improve drastically when the list is long as the number of S3 queries generated by the warmup explodes. It might be a bit worst if there are only a couple terms in the list.

Ideally we could imagine an optimisation during the warmup to have something in between warming up individual terms and the whole field.

@rdettai-sk
Copy link
Collaborator

Test 0028-fast_only_field_query.yaml fails because TermSet queries seem to not be supported on fast fields.

@Darkheir
Copy link
Contributor Author

it might be a bit worst if there are only a couple terms in the list.

Should we add some logic to use a bool query when there are only few terms and a term set once a threshold is reached ?

TermSet queries seem to not be supported on fast fields.

It seems that tantivy's TermSetQuery doesn't support it, but TermQuery has a fallback for those fields. I could still fallback to a bool query for non-indexed fast fields.

Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@trinity-1686a
Copy link
Contributor

Should we add some logic to use a bool query when there are only few terms and a term set once a threshold is reached ?

that would probably be for the better, though idk what the cutoff should be at all.

It seems that tantivy's TermSetQuery doesn't support it, but TermQuery has a fallback for those fields. I could still fallback to a bool query for non-indexed fast fields.

TermSetQuery on fastfield has the potential to always be faster than multiple TermQuery on fastfield, it would be sad to not leverage that. if you don't mind, i think i'd rather we support termsetquery over fastfield, and then merge your PR so there is no feature regression, but we still get the performance gain

Copy link
Contributor

@trinity-1686a trinity-1686a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll keep that unmerged for a few days and see if i can find the time to improve tantivy, otherwise we'll merge that, and improve tantivy+quickwit eventually

#[test]
fn test_term_set_query_with_fast_only_field_returns_bool_query() {
let mut schema_builder = Schema::builder();
schema_builder.add_u64_field("fast_field", FAST);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we keep things that way, i think a test with mixed "one fast field and one indexed field" would be nice to have
given this is a workaround and we may update tantivy so it's no longer needed, it's not really needed though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants