Skip to content

IndexOrDocValuesQuery and IndexSortSortedNumericDocValuesRangeQuery should only be counted once when computing maxClauseCount #14759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Jun 5, 2025

see #14756

The idea proposed here is that both queries are just leaf queries and should only call once the method QueryVisitor#visitLeaf. For IndexOrDocValuesQuery is a bit more complicated because the query does not contain the field that is being queried, therefore it delegates the registration of the query to the visitor to the index query.

closes #14756

…hould only be counted once when computing maxClauseCount
Copy link

github-actions bot commented Jun 5, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@iverase
Copy link
Contributor Author

iverase commented Jun 9, 2025

@romseygeek sorry for the direct ping but as you were involved in the original development of the query visitor API, do you have any opinions here? what is a leave in the query tree?

@romseygeek
Copy link
Contributor

QueryVisitor is supposed to be a query introspection API, so I wouldn't change its behaviour here. If the problem is that we want to be able to query multiple ranges without creating giant boolean queries, then I'd suggest instead looking at adding a newSetRangeQuery method to LongField, and maybe promoting LongMultiRangeBuilder from the sandbox?

@iverase
Copy link
Contributor Author

iverase commented Jun 9, 2025

My problem is that I want to use IndexOrDocValues query without changing the behaviour of the queries. This is a bit weird as the benefit of that query happens when it used on boolean queries but with the current implementation it is bogus as the behaviour change, e.g. it is not an optimisation anymore.

For what you say, then the way we compute maxClauseCount is bogus and we should not be using the QueryVisitor?

@romseygeek
Copy link
Contributor

Well it's still an optimization, but just at the cost of a bigger query footprint.

I think there are a couple of things pulling in different directions here:

  • IndexOrDocValuesQuery trades off in-memory complexity for better query-time performance
  • We have the too many clauses check to push people towards using queries that use less memory, eg TermInSetQuery instead of very large disjunctions of TermQuery

So what you're running into there is the limit of that trade-off. I don't think anything is wrong here, I think this is telling us that if you want to do a disjunction over ranges then past a certain point just building a boolean query is going to be very inefficient and you should use a different type of query. Maybe one that we don't actually provide yet!

@iverase
Copy link
Contributor Author

iverase commented Jun 9, 2025

I think the current behaviour is trappy, I would expect the test I post on the issue to pass. I agree though that building those big boolean queries is not optimal, but that's what users do sometimes.

@iverase
Copy link
Contributor Author

iverase commented Jun 9, 2025

Maybe we should remove the concept of maxClauseCount as we are not counting those but something else, query complexity?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IndexOrDocValuesQuery is counted twice when computing maxClauseCount
2 participants