[BUG] Poor performance of regex queries #5097
Comments
@Jon-AtAWS Do you have any suggestions of how we could express the order of filtering we'd like in this query to let it happen both ways?
So, let's say this is part of a larger query, like (p-code, excuse the lack of syntax)
And, let's say I know that the time range and field filter will reduce the match set to a few hundred docs, vs. evaluating the cost of the regex as part of the query planning. Just brainstorming...
We can also work on how Lucene approximates the cost of regex queries, to increase their weight by an order of magnitude or two.
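To make the ordering idea above concrete, here is a toy sketch in plain Python. It is not Lucene or OpenSearch code, and the documents, field names (`ts`, `service`, `message`), and pattern are all made up for illustration. It only shows the effect being asked for: when the time range and field filter run first, the regex is evaluated on the few survivors rather than on every document.

```python
import re
from datetime import datetime

# Hypothetical documents; field names and values are assumptions for illustration.
DOCS = [
    {"ts": datetime(2022, 11, 1, 10), "service": "auth", "message": "login failed for user-17"},
    {"ts": datetime(2022, 11, 1, 11), "service": "auth", "message": "login ok for user-42"},
    {"ts": datetime(2022, 11, 2, 9), "service": "billing", "message": "invoice failed for user-17"},
    {"ts": datetime(2022, 10, 1, 9), "service": "auth", "message": "login failed for user-99"},
]


def filter_then_regex(docs, start, end, service, pattern):
    """Apply the cheap filters first, then run the regex only on the survivors."""
    compiled = re.compile(pattern)
    regex_evaluations = 0
    hits = []
    for doc in docs:
        # Cheap filter 1: half-open time range [start, end).
        if not (start <= doc["ts"] < end):
            continue
        # Cheap filter 2: exact match on a field.
        if doc["service"] != service:
            continue
        # Expensive step: only reached by documents that passed both filters.
        regex_evaluations += 1
        if compiled.search(doc["message"]):
            hits.append(doc)
    return hits, regex_evaluations


hits, evaluated = filter_then_regex(
    DOCS,
    start=datetime(2022, 11, 1),
    end=datetime(2022, 11, 2),
    service="auth",
    pattern=r"failed for user-\d+",
)
print(f"regex evaluated on {evaluated} of {len(DOCS)} docs, {len(hits)} hit(s)")
```

A real engine evaluates regex clauses against the term dictionary rather than document by document, so this is only an analogy for why the order of evaluation matters.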
This may benefit from #7057. Alternatively, it might have already improved in newer versions of Lucene, since (IIRC) it no longer precomputes a bitset of all matching docs across all fields.
The release of search pipelines offers one way to address this problem. We should evaluate the cost of splitting out filters like the above and managing pipeline flow vs. pushing them all into a single query and having Lucene optimize.
We're also adding support for the wildcard field type to make these queries better (if you plan your mappings around wildcard matching): #5639
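As a rough sketch of what planning mappings around wildcard matching could look like, the request bodies below are written as Python dicts. This assumes an OpenSearch version that ships the `wildcard` field type; the field name `message` and the pattern are hypothetical.

```python
import json

# Hypothetical index mapping: "message" is an assumed field name,
# mapped with the wildcard field type instead of keyword/text.
index_body = {
    "mappings": {
        "properties": {
            "message": {"type": "wildcard"},
        }
    }
}

# Hypothetical wildcard query against that field.
search_body = {
    "query": {
        "wildcard": {
            "message": {"value": "*failed for user-*"},
        }
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```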
Describe the bug
The following query:
Is inefficient and runs for many seconds, especially when there are many fields in the index. When there are also many indices in the query, this can take down the cluster.
Expected behavior
This is one clause of a larger query, but because of the way queries are processed, there's no way to force the filtering to occur first. In this case, we can't use a post-filter, because we need accurate values in the aggregations.
We'd like to reduce the overall cost of running the query.
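For readers unfamiliar with why a post-filter does not help here, the sketch below builds two request bodies as Python dicts. The field names, values, pattern, and aggregation are hypothetical; the actual query from this issue is not reproduced. The point is only where the `regexp` clause sits relative to `aggs`.

```python
import json

# Hypothetical clause, filters, and aggregation; not the query from this issue.
regex_clause = {"regexp": {"message": {"value": "failed for user-[0-9]+"}}}
cheap_filters = [
    {"range": {"@timestamp": {"gte": "now-1h", "lt": "now"}}},
    {"term": {"service": "auth"}},
]
aggs = {"by_host": {"terms": {"field": "host"}}}

# Variant A: the regex is part of the main query, so the aggregation
# only counts documents that also match the regex.
in_query = {
    "query": {"bool": {"filter": cheap_filters + [regex_clause]}},
    "aggs": aggs,
}

# Variant B: the regex is a post_filter. It narrows the returned hits only;
# aggregations are computed on the main query's results, so the buckets
# also count documents the regex would have rejected.
as_post_filter = {
    "query": {"bool": {"filter": cheap_filters}},
    "post_filter": regex_clause,
    "aggs": aggs,
}

print(json.dumps(in_query, indent=2))
print(json.dumps(as_post_filter, indent=2))
```

Variant B is cheaper for the regex but changes the aggregation results, which is why it is ruled out above.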