-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add a cluster setting to disallow expensive queries #51385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-search (:Search/Search) |
bb294f8
to
3a219ca
Compare
Add a new cluster setting `search.disallow_slow_queries` which by default is `false`. If set to `true` then certain queries (prefix, fuzzy, regexp and wildcard) that have usually slow performance cannot be executed and an exception is thrown. Closes: elastic#29050
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new setting looks good, although I am still not decided whether we should add more granularity to the detection of slow queries. One possibility would be to have some static limits, like number of characters in the prefix query, wildcard with leading wildcards, ... Another possibility could be to hook into the rewrite of multi-terms to limit the expansion to few terms but the first option might be enough for most of the cases.
Should we also add the exception for script
and script_score
queries ? We could also try to restrict them only if they are the only required clause in the query. This would allow users to continue to use script
but only if they have another required clause based on inverted lists.
I'd also like @jpountz and @romseygeek to take a look to validate the approach and the limitations that we want to enforce in this new setting.
@@ -223,6 +226,8 @@ public IndexService( | |||
this.globalCheckpointTask = new AsyncGlobalCheckpointTask(this); | |||
this.retentionLeaseSyncTask = new AsyncRetentionLeaseSyncTask(this); | |||
updateFsyncTaskIfNecessary(); | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove change
docs/reference/query-dsl.asciidoc
Outdated
<<query-dsl-regexp-query,`regexp`>> and <<query-dsl-bool-query,`wildcard`>> , | ||
that are usually slow performance can affect the cluster performance. | ||
The execution of such queries can be prevented by setting the value of the `search.disallow_slow_queries` | ||
setting to `true` (defaults tp `false`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/tp/to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general suggestion: can we use positive logic instead of double negative? ie have the setting be allow-slow-queries
with a default of true
.
Putting aside @rjernst's thoughts about the double negative, I have some general concerns about the name. The problem that I have is that this setting does not forbid slow queries (i.e., queries that from a user-facing perspective are taking a long period of time, which then get aborted if this setting is set) but rather queries that are "expensive" to execute, because they scale poorly. Perhaps the naming could reflect that, like |
I agree with both suggestions regarding the negative naming and the |
Currently I have added integration tests for each one of the disallowed queries. For the Script/ScriptScore it's not possible to include them in the yml test since the required ScriptPlugins are not loaded for those tests, so instead I added a test method in the corresponding java After some discussion I had with @rjernst, he suggested to just have one integ test, since the path of updating/passing the setting down to the QueryShardContext is the same for all the queries. @jimczi @jpountz @romseygeek What's your opinion? |
@elasticmachine run elasticsearch-ci/2 |
@rjernst @jasontedor Regarding the negative name and semantics, I have to add that the idea is as a next step to have more fine-grain control on the disallowed queries, so something like |
I think allow/disallow should really be a boolean. If we want the user to control the exact queries allowed, it should be a different setting. I also think an inclusive setting, while more verbose, is much simpler for a user to understand and less error prone in the future. If the list setting has negative semantics, new queries which we deem expensive may automatically be included on upgrade when the user already decided which expensive queries to allow. It is also more direct for a user or support to look at an inclusive setting and know what is allowed, instead of needing to consult code or documentation on what the default list of expensive queries are and then mentally remove the values of this setting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts about this PR:
- Since I initially opened the issue that this PR addresses, the company seems to have embraced slow operations rather than prevented them (see e.g. searchable snapshots and the async search API) so I wonder whether this is still something we want to do. cc @giladgal
- Regardless I think there is a theme around slow queries and we need to provide users with a better experience:
- Make multi-term queries honor the
timeout
setting and task cancellation by leveragingExitableDirectoryReader
. - Better warn users when they run slow queries that have faster alternatives, such as running prefix queries on a text field that doesn't have
index_prefixes
enabled. There aren't many examples right now, but we expect several ones to come, e.g. when we enable querying numeric fields that have doc values but are not indexed, when we add scripted fields, or when thewildcard_keyword
field comes out.
- Make multi-term queries honor the
docs/reference/query-dsl.asciidoc
Outdated
<<query-dsl-regexp-query,`regexp`>> and <<query-dsl-bool-query,`wildcard`>> , | ||
that are usually slow performance can affect the cluster performance. | ||
The execution of such queries can be prevented by setting the value of the `search.disallow_slow_queries` | ||
setting to `true` (defaults to `false`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to expand a bit more here on what qualifies as a slow query. This will help users understand why some queries are protected by this setting while other queries are not. And this will also help make a decision whether a query qualifies as slow or not as we add more queries in the future:
- Queries that need to do linear scans to identify matches:
- script queries
- Queries that have a high up-front cost:
- fuzzy queries
- prefix queries without
index_prefixes
- wildcard queries
- range queries on keyword fields
- join queries
- queries on 6.x geo shapes
- Queries that may have a high per-document cost
- percolate queries
I think the PR still makes sense. Giving users the tools to run resource-draining slow queries can make it easier for users to get themselves into troubles, which only makes such a setting more important as a safety mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments, apart from that the change looks good to me.
docs/reference/query-dsl.asciidoc
Outdated
Those queries can be categorised as follows: | ||
* Queries that need to do linear scans to identify matches: | ||
** <<query-dsl-script-query, `script queries`>> | ||
** <<query-dsl-script-score-query, `script score queries`>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting this one to be fine since it finds matches using another query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a suggestion by @jimczi to also include those, since I guess the custom score calculation could decrease performance ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how this can be true with a complex score function. Let's move it to the Queries that may have a high per-document cost
section, since it's about scoring, not matching?
final AnalysisRegistry analysisRegistry, | ||
final EngineFactory engineFactory, | ||
final Map<String, IndexStorePlugin.DirectoryFactory> directoryFactories) { | ||
this(indexSettings, analysisRegistry, engineFactory, directoryFactories, () -> true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we have tests call the other constructor instead and pass ()->true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's easy to do.
scriptService, xContentRegistry, namedWriteableRegistry, client, searcher, nowInMillis, indexNameMatcher, | ||
new Index(RemoteClusterAware.buildRemoteIndexName(clusterAlias, indexSettings.getIndex().getName()), | ||
indexSettings.getIndex().getUUID()), isAllowExpensiveQueries); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid duplicating constructors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are ~20 usages of this in tests, so no big deal, I can remove the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit around naming, but LGTM otherwise
@@ -192,6 +197,10 @@ public BitSetProducer bitsetFilter(Query filter) { | |||
return bitsetFilterCache.getBitSetProducer(filter); | |||
} | |||
|
|||
public boolean isAllowExpensiveQueries() { | |||
return isAllowExpensiveQueries.getAsBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be allowExpensiveQueries()
? Otherwise it reads very strangely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of documentation changes to make it read slightly more naturally.
docs/reference/query-dsl.asciidoc
Outdated
|
||
[[query-dsl-allow-expensive-queries]] | ||
Allow expensive queries:: | ||
Execution of certain types of queries have usually slow performance, which can affect the cluster performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certain types of queries will generally execute slowly due to the way they are implemented, which can affect the stability of your cluster.
[[prefix-query-allow-expensive-queries]] | ||
===== Allow expensive queries | ||
Prefix queries will not be executed if <<query-dsl-allow-expensive-queries, `search.allow_expensive_queries`>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, if <<index-prefixes, index_prefixes
>> are enabled, an optimised query is built which is not considered slow, and will be executed in spite of this setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment about the docs, LGTM otherwise.
docs/reference/query-dsl.asciidoc
Outdated
Those queries can be categorised as follows: | ||
* Queries that need to do linear scans to identify matches: | ||
** <<query-dsl-script-query, `script queries`>> | ||
** <<query-dsl-script-score-query, `script score queries`>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how this can be true with a complex score function. Let's move it to the Queries that may have a high per-document cost
section, since it's about scoring, not matching?
Add a new cluster setting `search.allow_expensive_queries` which by default is `true`. If set to `false`, certain queries that have usually slow performance cannot be executed and an error message is returned. - Queries that need to do linear scans to identify matches: - Script queries - Queries that have a high up-front cost: - Fuzzy queries - Regexp queries - Prefix queries (without index_prefixes enabled - Wildcard queries - Range queries on text and keyword fields - Joining queries - HasParent queries - HasChild queries - ParentId queries - Nested queries - Queries on deprecated 6.x geo shapes (using PrefixTree implementation) - Queries that may have a high per-document cost: - Script score queries - Percolate queries Closes: #29050 (cherry picked from commit a8b39ed)
cluster.get_settings: | ||
flat_settings: true | ||
|
||
- match: {search.allow_expensive_queries: null} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heya @matriv I think this assertion is problematic, it works with the java runner due to the semantics of HashMap.get , but it may fail with test runners written in other languages. I believe that the setting is not returned, is it? I was chatting to @karmi about this and the proper way to do this null check would be is_false: search.allow_expensive_queries . Would you mind please changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, this fails in the Go client runner, what works is is_false: { search.allow_expensive_queries: null }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does is_false accept field values like match? I thought the right syntax would be is_false: search.allow_expensive_queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_false: search.allow_expensive_queries
works with Java.
is_false: { search.allow_expensive_queries: null }
doesn't and also seems semantically wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, sorry — is_false: search.allow_expensive_queries
is correct.
heya, we were looking with @nik9000 at failing queries against runtime fields (under development in a feature branch) when expensive queries are disallowed. Looking at how the check is performed, we noticed that it is currently executed on each shard, which translates to each shard returning the same error (which then get de-duplicated in the coordinating node). Shall we move these checks to the coordinating node? Ideally, expensive queries would be rejected straight-away before being sent to the shards as part of the query phase. |
@javana Jim reminded me of the reason we didn't do it in the first place: Currently the mapping is resolved on every shard and for example: a prefix query on a field with some options enabled can be turned into a simple term query so with the proposed approach we would return an error on queries that we accept today (because of the rewrite simplification) when the option is enabled. Maybe, in the future we could enhance this behaviour by resolving the mapping the co-ordinating node, possibly using |
Add a new cluster setting
search.allow_expensive_queries
which bydefault is
true
. If set tofalse
, certain queries that haveusually slow performance cannot be executed and an error message
is returned.
Closes: #29050