Add request parameter to block index auto creation #34737

vladimirdolzhenko · 2018-10-23T10:18:18Z

Node setting action.auto_create_index allows to create index implicitly on bulk/reindex/update requests. Some cases require that index to be created in advance while it is still handy to have it turned on for the rest cases.

For that purposes it is worth to have an option to disable automatic index creation on a request level.

This PR adds request parameter auto_create_index to index/update/delete requests, to disable automatic index creation it has to be set to false if it is enabled by action.auto_create_index.
Field auto_create_index is added to action and meta data of bulk request.

Note: parameter auto_create_index can never be set to true

elasticmachine · 2018-10-23T10:18:20Z

Pinging @elastic/es-core-infra

vladimirdolzhenko · 2018-10-23T10:18:58Z

Note: the name of parameter (disable_auto_create_index) is still under discussion.

vladimirdolzhenko · 2018-10-23T10:20:10Z

Relates to #34649: for the new custom exception message I include the index name into exception message.

…create_index_param

jasontedor · 2018-10-23T15:38:53Z

@vladimirdolzhenko Would you explain the motivation for proposing this change? I think that we have seen enhancements requests for this functionality before and turned them down (I will search for it later).

vladimirdolzhenko · 2018-10-23T15:44:03Z

@jasontedor It has been asked by @bleskes

jasontedor · 2018-10-23T15:52:39Z

@vladimirdolzhenko Okay, can you articulate what the need is?

bleskes · 2018-10-24T12:17:17Z

Current shippers like beats default to bootstrap their connections by adding an index templates and relying on automatic index creation to create indices based on those templates. This was introduced in order to address problems where people forgot to run the setup command and started indexing, only to later find out that their data is no good. This default behavior works well and have solved the problem for which it is created, but there still two issues on that front:

If beats doesn't ship data to ES (i.e., it ships to Kafka or logstash), people still need to bootstrap ES manually and they forget.
Some people rely on external provisioning systems to configure ES and disable beats' auto setup features. This sometimes goes wrong and they end up with data in ES they need to clean up.

The ability to make sure that a request never creates an index (i.e., I only want this data to be indexed, if possible) will help beats address these issues when people use ILM (where an index needs to be explicitly created with a write alias).

Next to the above, such a flag may simplify using index templates in use cases where a single index is used for storage (like kibana and security). We current add a template for the purpose of protecting against people deleting an existing index and an inflight indexing request from creating a new one without a mapping. This flag would allow to fail the request and fall back to explicit index creation.

Last, and this is still very much in the air, there is chance the configuring the templates/data environment will move away from the edge shippers to some other central point, which strengthens the problems mentioned above. This too far out to effect current decisions but just put this on the horizon.

bleskes · 2018-10-24T12:17:31Z

/cc @tsg

clintongormley · 2018-10-29T09:34:39Z

I think that we have seen enhancements requests for this functionality before and turned them down (I will search for it later).

We discussed it briefly in #23685. I have since come to the conclusion that this is the only clean way to avoid the race condition around putting the index template and indexing the first document. It also has the beneficial side effect of reducing the need for templates (in favour of direct index creation).

tomcallahan · 2018-11-05T14:54:36Z

Where do stand on this one? @vladimirdolzhenko @bleskes @jasontedor

jaymode · 2018-11-05T15:43:46Z

This change would also benefit a cleanup in security. Today we try to be smart and make sure the index exists with the correct mappings before issuing a write but this is not foolproof so we also have to handle some error cases such as an IndexNotFoundException. Ideally we would like to assume everything is fine, issue the write, and then handle failures. If it is an INFE then we could create the index with the correct mappings.

vladimirdolzhenko · 2018-11-05T16:28:38Z

@bleskes could you please have a look into code if we can discuss and choose the name of parameter later ?

jasontedor · 2018-11-07T13:08:54Z

Thank you for articulating the use-case @bleskes. I am leaning towards this but I have a clarifying question.

If beats doesn't ship data to ES (i.e., it ships to Kafka or logstash), people still need to bootstrap ES manually and they forget.

I think one thing missing for me in this case is how does the request flag to disable auto-index creation get set? It seems to me that it would require a new parameter on the Elasticsearch output plugin in Logstash that requires the user to manually set this?

Some people rely on external provisioning systems to configure ES and disable beats' auto setup features. This sometimes goes wrong and they end up with data in ES they need to clean up.

This case is clear. If Beats auto-setup is disabled (setup.template.enabled set to false), then Beats will send the request flag to disable auto-index creation on each request, and can log when it receives an index not-found exception caused by this flag.

tsg · 2018-11-07T14:10:06Z

If beats doesn't ship data to ES (i.e., it ships to Kafka or logstash), people still need to bootstrap ES manually and they forget.

I think one thing missing for me in this case is how does the request flag to disable auto-index creation get set? It seems to me that it would require a new parameter on the Elasticsearch output plugin in Logstash that requires the user to manually set this?

Logstash has an internal mechanism (called @metadata) to pass information from the input to the output. So the beats-input can set a flag in @metadata for each event/bulk, and the Elasticsearch output can set the option automatically if it sees this flag in @metadata. So setting this flag could be the default when getting data from the beats-input, or the Beat can set it (it's possible to send @metadata from the Beat as well, over lumberjack) if there are any reasons to believe that that bootstrap didn't happen.

Pinging @jsvd and @urso because this is their domain now.

vladimirdolzhenko · 2018-11-07T14:31:10Z

We've discussed in another channel and agreed to the name of parameter:
?auto_create_index=false, can never be set to true

urso · 2018-11-07T14:49:15Z

One can disable all bootstrapping in beats and users do so in favour of running <beat> setup separately. We can easily add a setting or use setup.template.enabled to send this flag when publishing to Logstash.

The flag is valid for all events in a bulk request, but we send @metadata per event to Logstash. A bulk request from Logstash can potentially contain multiple events from different beats or other inputs, some having the flag set and others not. I don't think we want to have a per-bulk-setting being based on potential conflicting per event settings in Logstash.

Actually when sending Beats events directly or indirectly via Logstash, it would make sense to always enable this flag. But again, this might surprise users mixing Beats + other input types.

bleskes · 2018-11-07T15:10:08Z

A bulk request from Logstash can potentially contain multiple events from different beats or other inputs, some having the flag set and others not. I don't think we want to have a per-bulk-setting being based on potential conflicting per event settings in Logstash.

@urso thanks for this insight and I fully agree. This can be solved both on the ES side (making this a per index request item parameter - which is something we really want to avoid to reduce the scope of the issue) and on the Logstash side (by making sure request never mix values of the relevant meta fields). Are you the right person to see what it would mean on the logstash output side?

tsg · 2018-11-07T15:51:48Z

If not @urso, then @jsvd is :)

jsvd · 2018-11-07T15:58:01Z

In logstash, mixing data from multiple sources in the same bulk request is a typical optimization we suggest to users: if possible use a single ES output and rely as much as possible on metadata to place events into different indices, figure out the document_id, document type (in pre ES 6.x era).

on the Logstash side (by making sure request never mix values of the relevant meta fields).

What we advise users when you can't mix the events in the same bulk request is then to use multiple ES outputs, but here we want the multiplexing to be done seamlessly.
Right now the only auto slicing we perform on the batch of events received from inputs+filters is by size to avoid 413 from ES. To solve this issue we'd have to perform other slices before the size slice.
Do we foresee other bulk level criteria for which we'd like to pre-slice the bulk request or only this disable_auto_create_index parameter?

…create_index_param

bleskes · 2018-11-09T11:55:15Z

@jsvd Thanks for explaining. Understood. I think that's quite a change on the Logstash side so we explored options on the ES side. The suggestion it to add an auto_create_index flag to each DocWriteRequest (the interface Index, Delete and Update request shares). This will allow you to keep current semantics and specify the flag on each bulk line. There is a down side though - the size of the request will grow with 27 bytes per line (unless you use smile which I suspect will compress repetition, or use other compression). Is that OK?

jsvd · 2018-11-12T13:58:08Z

Thanks @bleskes that's great. Compression for requests is available on the ES output plugin but disabled by default.

To support this we can add a new configuration option called auto_create_index that either takes a boolean or the name of a field. For each event we either retrieve the value of the field in the event or just set it to true/false.

The suggestion it to add an auto_create_index flag to each DocWriteRequest (the interface Index, Delete and Update request shares).

So the create action will raise an error if auto_create_index is set?

vladimirdolzhenko · 2018-11-15T17:03:21Z

@jsvd create is IndexRequest on our side, so it has to handle auto_create_index in the same way

vladimirdolzhenko · 2018-11-16T13:20:33Z

we've discussed with @jsvd in another channel for the case when bulk request have requests for the same index but with different auto_create_index values :

ES does check and create missing indices before doing any operations from bulk request and bulk requests are executing concurrently: it is reasonably to eliminate that kind of inaccuracy. ~~The most straight-forward is to reject entire bulk request as malformed~~

updated: It falls into default behaviour if there is at least one request with no value for auto_create_index field for the same index. I.e. to avoid index creation all requests for the same index have to have "auto_create_index": false.

…dex` per in action meta data of bulk request

…create_index_param

elasticmachine · 2020-02-26T04:21:40Z

Pinging @elastic/es-core-features (:Core/Features/Indices APIs)

martijnvg · 2020-03-19T15:33:00Z

Closing this PR, because we think that there is better approach to achieve the same behaviour and because the PR is stale (we should have dealt with this earlier, but it fell though many cracks).

We discussed in a different context about overriding the action.auto_create_index feature for specific namespaces by adding a new updatable cluster setting (action.manual_create_index), which would accept a list of patterns. If an index name matches with one of these patterns then it would not auto-create an index (even if action.auto_create_index would be setup to create the index). We think that this approach is less intrusive than what is build here and at the same time flexible enough.

disable_auto_create_index parameter could block index auto creation

5a7d5d8

vladimirdolzhenko added >enhancement :Data Management/Indices APIs APIs to create and manage indices and templates labels Oct 23, 2018

vladimirdolzhenko self-assigned this Oct 23, 2018

Merge remote-tracking branch 'remotes/origin/master' into allow_auto_…

ef7c00f

…create_index_param

vladimirdolzhenko force-pushed the allow_auto_create_index_param branch from ffc6f0b to ef7c00f Compare October 23, 2018 14:37

Vladimir Dolzhenko added 2 commits October 25, 2018 12:16

Merge branch 'origin/master' into allow_auto_create_index_param

2f5578f

do request parameter check after node settings check

686daf3

vladimirdolzhenko requested a review from bleskes October 29, 2018 09:58

Merge branch 'origin/master' into allow_auto_create_index_param

ebf7285

Merge remote-tracking branch 'remotes/origin/master' into allow_auto_…

221d95e

…create_index_param

Vladimir Dolzhenko added 4 commits November 8, 2018 20:15

use ?auto_create_index=false, it could not be set to true

3249d2f

fix rest-api-specs

c40a6b9

use autoCreateIndexIfPermitted on java level

633ec90

use autoCreateIndexIfPermitted on java level

9f44892

revert back reindex related changes

d5e63e4

Vladimir Dolzhenko added 6 commits November 19, 2018 13:31

add DocWriteRequest#isAutoCreateIndexIfPermitted; add `auto_create_in…

33b70ea

…dex` per in action meta data of bulk request

Merge remote-tracking branch 'remotes/origin/master' into allow_auto_…

4deaf54

…create_index_param

fix bulk doc: there is no ?auto_create_index=false

35fc505

minor changes

d9e4239

index has to be created to demonstrate ?auto_create_index=false

d5e2b35

Merge remote-tracking branch 'remotes/origin/master' into allow_auto_…

43f13fe

…create_index_param

vladimirdolzhenko requested review from bleskes and removed request for bleskes November 20, 2018 12:54

jaymode mentioned this pull request Nov 29, 2018

Replace SecurityIndexManager with special client #36078

Open

michaelbaamonde added the team-discuss label Feb 28, 2019

rjernst added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Data Management/Indices APIs APIs to create and manage indices and templates labels Feb 26, 2020

martijnvg removed the team-discuss label Mar 19, 2020

martijnvg closed this Mar 19, 2020

Add request parameter to block index auto creation #34737

Add request parameter to block index auto creation #34737

Uh oh!

Conversation

vladimirdolzhenko commented Oct 23, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticmachine commented Oct 23, 2018

Uh oh!

vladimirdolzhenko commented Oct 23, 2018

Uh oh!

vladimirdolzhenko commented Oct 23, 2018

Uh oh!

jasontedor commented Oct 23, 2018

Uh oh!

vladimirdolzhenko commented Oct 23, 2018

Uh oh!

jasontedor commented Oct 23, 2018

Uh oh!

bleskes commented Oct 24, 2018

Uh oh!

bleskes commented Oct 24, 2018

Uh oh!

clintongormley commented Oct 29, 2018

Uh oh!

tomcallahan commented Nov 5, 2018

Uh oh!

jaymode commented Nov 5, 2018

Uh oh!

vladimirdolzhenko commented Nov 5, 2018

Uh oh!

jasontedor commented Nov 7, 2018

Uh oh!

tsg commented Nov 7, 2018

Uh oh!

vladimirdolzhenko commented Nov 7, 2018

Uh oh!

urso commented Nov 7, 2018

Uh oh!

bleskes commented Nov 7, 2018

Uh oh!

tsg commented Nov 7, 2018

Uh oh!

jsvd commented Nov 7, 2018

Uh oh!

bleskes commented Nov 9, 2018

Uh oh!

jsvd commented Nov 12, 2018

Uh oh!

vladimirdolzhenko commented Nov 15, 2018

Uh oh!

vladimirdolzhenko commented Nov 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticmachine commented Feb 26, 2020

Uh oh!

martijnvg commented Mar 19, 2020

Uh oh!

Uh oh!

vladimirdolzhenko commented Oct 23, 2018 •

edited

Loading

vladimirdolzhenko commented Nov 16, 2018 •

edited

Loading