Closed
Description
This feature is largely about building good approximation queries on the ngram index, so that fewer documents need to be verified against an automaton built from the regex.
Lucene's RegExp.toStringTree() method gives a good template for walking a parsed regex query's logic. Rather than building a string, a similar walk can build an approximation BooleanQuery on the 3gram index. This logic will have to strike a balance between:
- Being selective enough to efficiently narrow the set of documents considered, and
- Not being so restrictive that it introduces false negatives (skipping docs that should match).
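
To illustrate the trade-off, here is a minimal sketch in Python rather than Lucene code. The AST classes (`Lit`, `Concat`, `Alt`, `Star`) and the tuple-based query shape are hypothetical stand-ins for Lucene's parsed regex nodes and BooleanQuery clauses. The rule it follows is an assumption about one reasonable approach: whenever a sub-expression cannot guarantee a 3gram, fall back to "match all" instead of guessing, so the approximation may return false positives but not false negatives.

```python
from dataclasses import dataclass

# Hypothetical mini regex AST, standing in for a parsed regex tree.
@dataclass
class Lit:
    text: str       # literal string

@dataclass
class Concat:
    parts: list     # sub-expressions matched one after another

@dataclass
class Alt:
    options: list   # alternation: a|b|c

@dataclass
class Star:
    inner: object   # zero or more repetitions

MATCH_ALL = ("ALL",)  # no constraint: every document is a candidate


def trigrams(s):
    """Return the set of 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def approx(node):
    """Build a nested ("AND" | "OR", clauses) query over 3grams."""
    if isinstance(node, Lit):
        grams = sorted(trigrams(node.text))
        # A literal shorter than 3 chars yields no grams, so it cannot filter.
        return ("AND", grams) if grams else MATCH_ALL
    if isinstance(node, Concat):
        # Every part must occur, so AND the parts that carry a constraint.
        clauses = [q for q in map(approx, node.parts) if q != MATCH_ALL]
        return ("AND", clauses) if clauses else MATCH_ALL
    if isinstance(node, Alt):
        clauses = [approx(o) for o in node.options]
        # One unconstrained branch makes the whole alternation unconstrained.
        return MATCH_ALL if MATCH_ALL in clauses else ("OR", clauses)
    if isinstance(node, Star):
        # Zero repetitions are allowed, so nothing inside is required.
        return MATCH_ALL
    raise TypeError(f"unsupported node: {node!r}")


def matches(query, doc_grams):
    """Evaluate an approximation query against a document's 3gram set."""
    if query == MATCH_ALL:
        return True
    if isinstance(query, str):
        return query in doc_grams
    op, clauses = query
    results = (matches(c, doc_grams) for c in clauses)
    return all(results) if op == "AND" else any(results)


# Tree for the regex foo(bar|baz)x*qux
tree = Concat([Lit("foo"), Alt([Lit("bar"), Lit("baz")]), Star(Lit("x")), Lit("qux")])
query = approx(tree)
```

In this sketch a document such as "qux bar foo" passes the approximation even though the regex does not match it, which is why the automaton verification step is still needed. A document such as "fooqux" is excluded early because it contains neither "bar" nor "baz". The sketch does not merge adjacent literals across node boundaries, which gives up some selectivity but stays on the safe side.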