Remove implicit determinization from WildcardQuery by drempapis · Pull Request #15961 · apache/lucene

drempapis · 2026-04-15T07:07:13Z

This change continues the determinization cleanup started for regexp queries (#15939) by applying the same model to wildcard queries.

Previously, wildcard automata were implicitly determinized up front. After removing that implicit determinization, some code paths (especially query visiting/highlighting and equality checks) still assumed a DFA-only representation.

QueryVisitor.consumeTermsMatching() accepted only ByteRunAutomaton (DFA). After removing implicit wildcard determinization, many valid query automata are NFA-backed. Forcing them back into ByteRunAutomaton at visit time would reintroduce determinization cost and defeat the goal of this work.

This PR adds QueryVisitor.consumeTermsMatchingRunnable(), allowing visitor consumers (e.g. highlighter) to work with either DFA or NFA-backed execution via ByteRunnable, while keeping the existing consumeTermsMatching() API for compatibility.
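As a rough illustration of why accepting an interface rather than a concrete DFA class matters for visitors, here is a minimal, self-contained sketch. It does not use Lucene: `SketchByteRunnable`, `SketchTermCollector`, and `VisitorSketchDemo` are hypothetical stand-ins, and the suffix matcher simply stands in for any DFA- or NFA-backed implementation. The sketch assumes the visitor only asks the supplier for a matcher when it is interested in the field, so nothing is built (or determinized) for fields it ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a byte-level matcher interface; not the Lucene type. */
interface SketchByteRunnable {
  boolean run(byte[] bytes, int offset, int length);
}

/** Hypothetical visitor that collects matchers for a single field of interest. */
class SketchTermCollector {
  private final String acceptedField;
  final List<SketchByteRunnable> matchers = new ArrayList<>();

  SketchTermCollector(String acceptedField) {
    this.acceptedField = acceptedField;
  }

  /** The supplier is invoked only for the accepted field, so unused matchers are never built. */
  void consumeTermsMatching(String label, String field, Supplier<SketchByteRunnable> matcher) {
    if (acceptedField.equals(field)) {
      matchers.add(matcher.get());
    }
  }

  /** Returns true if any collected matcher accepts the UTF-8 bytes of the term. */
  boolean anyMatch(String term) {
    byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
    for (SketchByteRunnable m : matchers) {
      if (m.run(utf8, 0, utf8.length)) {
        return true;
      }
    }
    return false;
  }
}

class VisitorSketchDemo {
  /** Matches terms ending in ".txt"; the caller never sees how the matcher is implemented. */
  static SketchByteRunnable suffixTxt() {
    byte[] suffix = ".txt".getBytes(StandardCharsets.UTF_8);
    return (bytes, offset, length) -> {
      if (length < suffix.length) {
        return false;
      }
      int start = offset + length - suffix.length;
      for (int i = 0; i < suffix.length; i++) {
        if (bytes[start + i] != suffix[i]) {
          return false;
        }
      }
      return true;
    };
  }

  public static void main(String[] args) {
    SketchTermCollector collector = new SketchTermCollector("body");
    collector.consumeTermsMatching("*.txt", "body", VisitorSketchDemo::suffixTxt);
    // This supplier is never invoked because the collector ignores the "title" field.
    collector.consumeTermsMatching("*.txt", "title", () -> {
      throw new IllegalStateException("should not be built");
    });
    System.out.println(collector.anyMatch("notes.txt")); // prints true
    System.out.println(collector.anyMatch("notes.pdf")); // prints false
  }
}
```

Because the collector depends only on the interface, the choice between a determinized and a non-determinized matcher stays with the query that supplies it.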

drempapis · 2026-04-15T07:10:00Z

@rmuir can you please have a look?

romseygeek · 2026-04-15T09:13:48Z

        Query query, String field, Supplier<ByteRunAutomaton> automaton) {
      runAutomata.add(LabelledCharArrayMatcher.wrap(query.toString(), automaton.get()));
    }
+


I think we can just replace the Supplier<ByteRunAutomaton> with a Supplier<ByteRunnable> in the signature of consumeTermsMatching? It's a minimal change, replacing an implementation with an interface, and keeps the API surface on QueryVisitor low.

There's a further question around whether this should take something as complicated as a ByteRunnable in the first place, or should instead take a Predicate<BytesRef>, but that's a bigger change and one that we can keep separate.

drempapis · Apr 16, 2026

That's a good point. I updated the consumeTermsMatching signature to take a Supplier<ByteRunnable>.

Regarding the second part, I agree that using a Predicate<BytesRef> should be handled in a separate PR.

rmuir · 2026-04-15T10:58:33Z

    int result = 1;
-    result = prime * result + ((runAutomaton == null) ? 0 : runAutomaton.hashCode());
-    result = prime * result + ((nfaRunAutomaton == null) ? 0 : nfaRunAutomaton.hashCode());
+    result = prime * result + normalAutomatonHashCode();


Why did this code change?

drempapis · Apr 15, 2026

Without that change, I get errors like

    java.lang.AssertionError: expected: org.apache.lucene.queries.intervals.IntervalQuery<f:MultiTerm(*.txt)> but was: org.apache.lucene.queries.intervals.IntervalQuery<f:MultiTerm(*.txt)>
        at __randomizedtesting.SeedInfo.seed([CA4FA9AC9103B4C8:8C2346BE162DAD34]:0)
        at junit@4.13.2/org.junit.Assert.fail(Assert.java:89)
        at junit@4.13.2/org.junit.Assert.failNotEquals(Assert.java:835)
        at junit@4.13.2/org.junit.Assert.assertEquals(Assert.java:120)
        at junit@4.13.2/org.junit.Assert.assertEquals(Assert.java:146)
        at org.apache.lucene.queries.intervals.TestIntervalQuery.testEquality(TestIntervalQuery.java:485)

in these tests:

- org.apache.lucene.queries.intervals.TestIntervalQuery.testEquality (:lucene:queries)
- org.apache.lucene.queries.intervals.TestIntervals.testEquality (:lucene:queries)
- org.apache.lucene.search.TestWildcardQuery.testEquals (:lucene:core)

CompiledAutomaton.equals() was updated so two NFA-backed NORMAL automata are considered equal when their automaton graphs are the same, even if they are different Java objects. That means hashCode() also had to change.

Previously, NFA hash code came from nfaRunAutomaton.hashCode(). So two objects that are now equals() could still produce different hash codes, which breaks the Java equals/hashCode contract.

The current update keeps the existing DFA behavior (runAutomaton.hashCode()), and for NFA computes a structural hash from the underlying automaton graph using AutomatonStructuralComparator.structuralAutomatonHashCode().
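To make the contract issue concrete, here is a minimal sketch of structural equals/hashCode on a toy automaton, with a wrapper that delegates to it. None of this is Lucene code: `SketchAutomaton`, `SketchNfaRunner`, and `EqualityContractDemo` are hypothetical names, and the flat transition array is an assumption made only to keep the example short. The point it tries to show is that two separately built but identical graphs compare equal and hash alike, so hash-based collections treat them as one entry.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified automaton graph; not Lucene's Automaton. */
final class SketchAutomaton {
  private final int[] transitions; // flattened (source, label, dest) triples
  private final boolean[] accept; // accept[i] is true if state i is accepting

  SketchAutomaton(int[] transitions, boolean[] accept) {
    this.transitions = transitions.clone();
    this.accept = accept.clone();
  }

  /** Structural equality: same transitions and same accept flags, in the same order. */
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SketchAutomaton)) {
      return false;
    }
    SketchAutomaton other = (SketchAutomaton) o;
    return Arrays.equals(transitions, other.transitions) && Arrays.equals(accept, other.accept);
  }

  /** Hash derived from the same fields that equals() compares. */
  @Override
  public int hashCode() {
    return 31 * Arrays.hashCode(transitions) + Arrays.hashCode(accept);
  }
}

/** Hypothetical runner that delegates equality and hashing to the wrapped graph. */
final class SketchNfaRunner {
  private final SketchAutomaton automaton;

  SketchNfaRunner(SketchAutomaton automaton) {
    this.automaton = automaton;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof SketchNfaRunner && automaton.equals(((SketchNfaRunner) o).automaton);
  }

  @Override
  public int hashCode() {
    return automaton.hashCode();
  }
}

class EqualityContractDemo {
  /** Builds a fresh runner over the same two-state graph each time it is called. */
  static SketchNfaRunner build() {
    int[] transitions = {0, 'a', 1};
    boolean[] accept = {false, true};
    return new SketchNfaRunner(new SketchAutomaton(transitions, accept));
  }

  public static void main(String[] args) {
    SketchNfaRunner first = build();
    SketchNfaRunner second = build();
    Set<SketchNfaRunner> set = new HashSet<>();
    set.add(first);
    set.add(second);
    System.out.println(first == second); // prints false: distinct objects
    System.out.println(first.equals(second)); // prints true: same graph
    System.out.println(first.hashCode() == second.hashCode()); // prints true
    System.out.println(set.size()); // prints 1
  }
}
```

If `hashCode()` were left at the identity default while `equals()` compared graphs, the two runners would usually land in different hash buckets and the set would end up holding both of them.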

rmuir · Apr 15, 2026

Yes, but I am trying to avoid adding a whole new automaton class/Java file. I can see where the old logic may have bugs, but maybe we can fix them and avoid that?

drempapis · Apr 16, 2026

I've simplified the code by removing the new standalone comparator class and keeping the fix inside the existing automaton classes. This update:

- Keeps CompiledAutomaton equality/hash behavior aligned with the original shape.
- Adds structural equals/hashCode to Automaton.
- Implements NFARunAutomaton.equals/hashCode to delegate to its wrapped Automaton.

Is there another way to fix the equality-contract issue properly?

rmuir · Apr 16, 2026

It's good. I like how the new hashcode/equals are defined... as "same automaton".

There were historical problems here around hashcode/equals being defined as "accepting same language"... which led to trouble.
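The distinction between "same automaton" and "accepting same language" can be illustrated with a toy NFA. This is only a sketch under stated assumptions: `ToyNfa` and `SameAutomatonDemo` are hypothetical and unrelated to Lucene's classes, and state 0 is assumed to be the initial state. The two automata below accept exactly the same strings, yet a structural comparison treats them as different, which is cheap to compute; deciding language equality would typically require determinization or a similarly expensive comparison.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy NFA: transitions are (source, label, dest) triples; state 0 is initial. */
final class ToyNfa {
  private final int[][] transitions;
  private final Set<Integer> acceptStates;

  ToyNfa(int[][] transitions, Set<Integer> acceptStates) {
    this.transitions = transitions;
    this.acceptStates = acceptStates;
  }

  /** Simulates the NFA by tracking the set of states reachable after each character. */
  boolean accepts(String input) {
    Set<Integer> current = new HashSet<>(Arrays.asList(0));
    for (char c : input.toCharArray()) {
      Set<Integer> next = new HashSet<>();
      for (int[] t : transitions) {
        if (current.contains(t[0]) && t[1] == c) {
          next.add(t[2]);
        }
      }
      current = next;
    }
    for (int state : current) {
      if (acceptStates.contains(state)) {
        return true;
      }
    }
    return false;
  }

  /** "Same automaton": identical transition lists and identical accept sets. */
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ToyNfa)) {
      return false;
    }
    ToyNfa other = (ToyNfa) o;
    return Arrays.deepEquals(transitions, other.transitions)
        && acceptStates.equals(other.acceptStates);
  }

  @Override
  public int hashCode() {
    return 31 * Arrays.deepHashCode(transitions) + acceptStates.hashCode();
  }
}

class SameAutomatonDemo {
  /** Accepts exactly "a" with one transition. */
  static ToyNfa single() {
    return new ToyNfa(new int[][] {{0, 'a', 1}}, new HashSet<>(Arrays.asList(1)));
  }

  /** Accepts exactly "a" too, but through two redundant transitions. */
  static ToyNfa redundant() {
    return new ToyNfa(
        new int[][] {{0, 'a', 1}, {0, 'a', 2}}, new HashSet<>(Arrays.asList(1, 2)));
  }

  public static void main(String[] args) {
    ToyNfa a = single();
    ToyNfa b = redundant();
    System.out.println(a.accepts("a") && b.accepts("a")); // prints true
    System.out.println(a.accepts("aa") || b.accepts("aa")); // prints false
    System.out.println(a.equals(single())); // prints true: same graph
    System.out.println(a.equals(b)); // prints false: same language, different graph
  }
}
```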

rmuir · 2026-04-16T13:47:26Z

I'm a fan of this change: I feel like the hashcode/equals is the way it should have always worked!

I'll give it a few days for more feedback. Thank you for doing this work @drempapis

drempapis added 2 commits April 15, 2026 09:51

Removed WildcardQuery constructor overloads that exposed determinization controls

9a2574c

revert assertion

8b8aaae

github-actions bot added module:core/search module:highlighter module:benchmark module:queryparser module:queries labels Apr 15, 2026

Merge branch 'main' into wildcard-query-remove-det

a2b4c54

github-actions bot added this to the 11.0.0 milestone Apr 15, 2026

romseygeek reviewed Apr 15, 2026

update changes.txt

aca5c4b

rmuir reviewed Apr 15, 2026

drempapis added 4 commits April 15, 2026 14:47

Merge branch 'main' into wildcard-query-remove-det

fc196fd

Simplify code structure related to NFAAutomaton

52beaab

Update consumeTermsMatching signature using Supplier<ByteRunnable>

dfe9f58

remove redundant code

2f49f21

rmuir approved these changes Apr 16, 2026

rmuir requested review from dweiss and mikemccand April 16, 2026 13:46

drempapis wants to merge 8 commits into apache:main from drempapis:wildcard-query-remove-det

3 participants
