Commit 7b4111e

[docs] Resolving failed Kibana upgrade migrations (#80999) (#89295)
* Resolving failed Kibana upgrade migrations
* Move warning against rolling upgrades into upgrade-standard and call out stopping all instances in specific upgrade steps
* Add preventing migration failures section
* Add incompatible xpack.tasks.index: .tasks setting to preventing migration failures
* Fix link

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
1 parent 00d0541 commit 7b4111e

3 files changed

+116
-40
lines changed

docs/setup/upgrade.asciidoc

Lines changed: 8 additions & 13 deletions
@@ -4,29 +4,24 @@
44
Depending on the {kib} version you're upgrading from, the upgrade process to 7.0
55
varies.
66

7-
NOTE: {kib} upgrades automatically when starting a new version, as described in
8-
<<upgrade-migrations, this document>>.
9-
Although you do not need to manually back up {kib} before upgrading, we recommend
10-
that you have a backup on hand. You can use
11-
<<snapshot-repositories, Snapshot and Restore>> to back up {kib}
12-
data by targeting `.kibana*` indices. If you are using the Reporting plugin,
13-
you can also target `.reporting*` indices.
14-
157
[float]
168
[[upgrade-before-you-begin]]
179
=== Before you begin
1810

11+
WARNING: {kib} automatically runs upgrade migrations when required. To roll back to an earlier version in case of an upgrade failure, you **must** have a backup snapshot available. Use <<snapshot-repositories, Snapshot and Restore>> to back up {kib} data by targeting the `.kibana*` indices. For more information see <<upgrade-migrations, upgrade migrations>>.
12+
1913
Before you upgrade {kib}:
2014

2115
* Consult the <<breaking-changes,breaking changes>>.
16+
* Back up your data with <<snapshot-repositories, Snapshot and Restore>>. To roll back to an earlier version, you **must** have a snapshot of the `.kibana*` indices.
17+
* Although not a requirement for rollbacks, we recommend taking a snapshot of all {kib} indices created by the plugins you use such as the `.reporting*` indices created by the reporting plugin.
2218
* Before you upgrade production servers, test the upgrades in a dev environment.
23-
* Back up your data with {es} {ref}/modules-snapshots.html[snapshots].
24-
To roll back to an earlier version, you **must** have a backup of your data.
19+
* See <<preventing-migration-failures, preventing migration failures>> for common reasons upgrades fail and how to prevent these.
2520
* If you are using custom plugins, check that a compatible version is
2621
available.
27-
* Shut down all {kib} nodes. Running more than one {kib} version against the
28-
same Elasticseach index is unsupported. If you upgrade while older {kib} nodes are
29-
running, the upgrade can fail.
22+
* Shut down all {kib} instances. Running more than one {kib} version against
23+
the same Elasticsearch index is unsupported. Upgrading while older {kib}
24+
instances are running can cause data loss or upgrade failures.
3025

3126
To identify the changes you need to make to upgrade, and to enable you to
3227
perform an Elasticsearch rolling upgrade with no downtime, you must upgrade to
Lines changed: 96 additions & 23 deletions
@@ -1,54 +1,127 @@
11
[[upgrade-migrations]]
2-
=== Migrate saved objects
2+
=== Upgrade migrations
33

4-
Every time {kib} is upgraded it checks to see if all saved objects, such as dashboards, visualizations, and index patterns, are compatible with the new version. If any objects need to be updated, then the automatic saved object migration process is kicked off.
4+
Every time {kib} is upgraded it checks to see if all saved objects, such as dashboards, visualizations, and index patterns, are compatible with the new version. If any saved objects need to be updated, then the automatic saved object migration process is kicked off.
55

66
NOTE: 6.7 includes an https://www.elastic.co/guide/en/kibana/6.7/upgrade-assistant.html[Upgrade Assistant]
77
to help you prepare for your upgrade to 7.0. To access the assistant, go to *Management > 7.0 Upgrade Assistant*.
88

9+
WARNING: The following instructions assume {kib} is using the default index names. If the `kibana.index` or `xpack.tasks.index` configuration settings were changed, these instructions will have to be adapted accordingly.
10+
911
[float]
1012
[[upgrade-migrations-process]]
11-
==== How the process works
13+
==== Background
1214

13-
Saved objects are stored in an index named `.kibana_N`, where `N` is a number that increments over time as {kib} is upgraded. The index alias `.kibana` points to the latest up-to-date index for a given install.
15+
Saved objects are stored in two indices:
1416

15-
NOTE: Prior to 6.5.0, saved objects were stored directly in an index named `.kibana`, so the first time you upgrade to {kib} version 6.5+, {kib} will migrate into `.kibana_1` and set `.kibana` up as an index alias.
17+
* `.kibana_N`, or if set, the `kibana.index` configuration setting
18+
* `.kibana_task_manager_N`, or if set, the `xpack.tasks.index` configuration setting
19+
20+
For each of these indices, `N` is a number that increments every time {kib} runs an upgrade migration on that index. The index aliases `.kibana` and `.kibana_task_manager` point to the most up-to-date index.
1621

1722
While {kib} is starting up and before serving any HTTP traffic, it checks to see if any internal mapping changes or data transformations for existing saved objects are required.
1823

19-
When changes are necessary, a new incremental `.kibana_N` index is created with updated mappings, then the saved objects are loaded in batches from the existing index, transformed to whatever extent necessary, and added to this new index.
24+
When changes are necessary, a new migration is started. To ensure that only one {kib} instance performs the migration, each instance will attempt to obtain a migration lock by creating a new `.kibana_N+1` index. The instance that succeeds in creating the index will then read batches of documents from the existing index, migrate them, and write them to the new index. Once the objects are migrated, the lock is released by pointing the `.kibana` index alias to the new upgraded `.kibana_N+1` index.
25+
26+
Instances that failed to acquire a lock will log `Another Kibana instance appears to be migrating the index. Waiting for that migration to complete`. The instance will then wait until `.kibana` points to an upgraded index before starting up and serving HTTP traffic.
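The locking behaviour described above can be pictured with a small in-memory model. This is only an illustrative sketch, not {kib}'s implementation: `ToyCluster`, `try_acquire_lock`, `release_lock`, and `next_index` are hypothetical names, and the "cluster" is a plain Python object rather than {es}.

```python
class ToyCluster:
    """In-memory stand-in for index and alias state (illustration only)."""

    def __init__(self, alias, current_index):
        self.indices = {current_index}
        self.aliases = {alias: current_index}

    def try_acquire_lock(self, target_index):
        # Creating the index is the lock: only the first creator succeeds.
        if target_index in self.indices:
            return False
        self.indices.add(target_index)
        return True

    def release_lock(self, alias, target_index):
        # Pointing the alias at the new index signals a finished migration.
        self.aliases[alias] = target_index


def next_index(current_index):
    # ".kibana_4" -> ".kibana_5"
    prefix, _, number = current_index.rpartition("_")
    return f"{prefix}_{int(number) + 1}"
```

In this model, a second call to `try_acquire_lock` for the same target returns `False`, which corresponds to an instance that has to wait for the alias to move.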
2027

21-
Once the objects are migrated, the `.kibana` index alias is updated to point to the new index, and {kib} finishes starting up and serving HTTP traffic.
28+
NOTE: Prior to 6.5.0, saved objects were stored directly in an index named `.kibana`. After upgrading to version 6.5+, {kib} will migrate this index into `.kibana_N` and set `.kibana` up as an index alias. +
29+
Prior to 7.4.0, task manager tasks were stored directly in an index named `.kibana_task_manager`. After upgrading to version 7.4+, {kib} will migrate this index into `.kibana_task_manager_N` and set `.kibana_task_manager` up as an index alias.
2230

2331
[float]
24-
[[upgrade-migrations-old-indices]]
25-
==== Handling old `.kibana` indices
32+
[[preventing-migration-failures]]
33+
==== Preventing migration failures
34+
This section highlights common causes of {kib} upgrade failures and how to prevent them.
35+
36+
[float]
37+
===== Corrupt saved objects
38+
We highly recommend testing your {kib} upgrade in a development cluster to discover and remedy problems caused by corrupt documents, especially when there are custom integrations creating saved objects in your environment. Saved objects that were corrupted through manual editing or integrations will cause migration failures with a log message like `Failed to transform document. Transform: index-pattern:7.0.0\n Doc: {...}` or `Unable to migrate the corrupt Saved Object document ...`. Corrupt documents will have to be fixed or deleted before an upgrade migration can succeed.
39+
40+
[float]
41+
===== User-defined index templates that cause new `.kibana*` indices to have incompatible settings or mappings
42+
Matching index templates which specify `settings.refresh_interval` or `mappings` are known to interfere with {kib} upgrades.
43+
44+
Prevention: narrow down the index patterns of any user-defined index templates to ensure that these won't apply to new `.kibana*` indices.
2645

27-
After migrations have run, there will be multiple {kib} indices in {es}: (`.kibana_1`, `.kibana_2`, etc). {kib} only uses the index that the `.kibana` alias points to. The other {kib} indices can be safely deleted, but are left around as a matter of historical record, and to facilitate rolling {kib} back to a previous version.
46+
Note: {kib} < 6.5 creates its own index template called `kibana_index_template:.kibana` and index pattern `.kibana`. This index template will not interfere and does not need to be changed or removed.
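One way to spot candidate templates is to test each template's index patterns against the names of the indices a migration would create. The sketch below assumes you have already collected template names and their patterns into a dictionary; `pattern_matches` and `interfering_templates` are hypothetical helpers, and only the `*` wildcard is handled. It flags templates worth reviewing and does not inspect their settings or mappings.

```python
import re


def pattern_matches(pattern, index_name):
    # Treat "*" as "any run of characters"; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None


def interfering_templates(templates,
                          sample_indices=(".kibana_1", ".kibana_task_manager_1")):
    """Return sorted names of templates whose patterns match a sample index.

    templates: dict mapping template name -> list of index patterns.
    """
    return sorted(
        name
        for name, patterns in templates.items()
        if any(pattern_matches(p, idx) for p in patterns for idx in sample_indices)
    )
```

A template whose only pattern is exactly `.kibana` is not flagged here, because that pattern does not match `.kibana_1`.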
47+
48+
[float]
49+
===== An unhealthy {es} cluster
50+
Problems with your {es} cluster can prevent {kib} upgrades from succeeding. Ensure that your cluster has:
51+
52+
* enough free disk space, at least twice the amount of storage taken up by the `.kibana` and `.kibana_task_manager` indices
53+
* sufficient heap size
54+
* a "green" cluster status
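Two of the checks above can be expressed as simple arithmetic once the numbers are known. The following sketch assumes you have gathered the cluster status, free disk space, and index sizes yourself; `upgrade_precheck` is a hypothetical helper, and heap size is deliberately left out because "sufficient" depends on the deployment.

```python
def upgrade_precheck(cluster_status, free_bytes, kibana_index_bytes):
    """Return a list of problems; an empty list means both checks passed.

    kibana_index_bytes: dict mapping index name -> storage size in bytes.
    """
    problems = []
    if cluster_status != "green":
        problems.append(f"cluster status is {cluster_status}, expected green")
    # The guidance above asks for at least twice the current storage.
    required = 2 * sum(kibana_index_bytes.values())
    if free_bytes < required:
        problems.append(f"need {required} free bytes, have {free_bytes}")
    return problems
```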
55+
56+
[float]
57+
===== Running different versions of {kib} connected to the same {es} index
58+
Kibana does not support rolling upgrades. Stop all {kib} instances before starting a newer version to prevent upgrade failures and data loss.
59+
60+
[float]
61+
===== Incompatible `xpack.tasks.index` configuration setting
62+
For {kib} < 7.5.1, if the task manager index is set to `.tasks` with the configuration setting `xpack.tasks.index: ".tasks"`, upgrade migrations will fail. {kib} 7.5.1 and later prevents this by refusing to start with an incompatible configuration setting.
2863

2964
[float]
30-
[[upgrade-migrations-errors]]
31-
==== Handling errors during saved object migrations
65+
[[resolve-migrations-failures]]
66+
==== Resolving migration failures
3267

33-
If {kib} terminates unexpectedly while migrating a saved object index, some additional work may be required in order to get {kib} to re-attempt the migration.
68+
If {kib} terminates unexpectedly while migrating a saved object index, manual intervention is required before {kib} will attempt to perform the migration again. Follow the advice in <<preventing-migration-failures, preventing migration failures>> before retrying an upgrade migration.
3469

35-
For example, if the `.kibana` alias is pointing to `.kibana_4`, and there is a `.kibana_5` index in {es}, the `.kibana_5` index will need to be deleted. {kib} will never attempt to overwrite an existing index.
70+
As mentioned above, {kib} will create a migration lock for each index that requires a migration by creating a new `.kibana_N+1` index. For example: if the `.kibana_task_manager` alias is pointing to `.kibana_task_manager_5` then the first {kib} that succeeds in creating `.kibana_task_manager_6` will obtain the lock to start migrations.
71+
72+
However, if the instance that obtained the lock fails to migrate the index, all other {kib} instances will be blocked from performing this migration. This includes the instance that originally obtained the lock: it will be blocked from retrying the migration even when restarted.
3673

3774
[float]
38-
[[upgrade-migrations-multiple-instances]]
39-
==== Support for multiple {kib} instances
75+
===== Retry a migration by restoring a backup snapshot:
76+
77+
1. Before proceeding ensure that you have a recent and successful backup snapshot of all `.kibana*` indices.
78+
2. Shut down all {kib} instances to be 100% sure that there are no instances currently performing a migration.
79+
3. Delete all saved object indices with `DELETE /.kibana*`
80+
4. Restore the `.kibana*` indices and their aliases from the backup snapshot. See {es} {ref}/modules-snapshots.html[snapshots].
81+
5. Start up all {kib} instances to retry the upgrade migration.
4082

41-
If you're running multiple {kib} instances for a single index behind a load balancer, it's important that you stop all instances before upgrading, so you do not have multiple different versions of {kib} trying to perform saved object migrations.
83+
[float]
84+
===== (Not recommended) Retry a migration without a backup snapshot:
4285

43-
The first instance that triggers saved object migrations will run the entire process. Any other instances started up while a migration is running will log a message and then wait until saved object migration has completed before they start serving HTTP traffic.
86+
1. Shut down all {kib} instances to be 100% sure that there are no instances currently performing a migration.
87+
2. Identify any migration locks by comparing the output of `GET /_cat/aliases` and `GET /_cat/indices`. If e.g. `.kibana` is pointing to `.kibana_4` and there is a `.kibana_5` index, the `.kibana_5` index will act like a migration lock blocking further attempts. Be sure to check both the `.kibana` and `.kibana_task_manager` aliases and their indices.
88+
3. Remove any migration locks e.g. `DELETE /.kibana_5`.
89+
4. Start up all {kib} instances.
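The comparison in step 2 can be sketched as a small function. It assumes the output of `GET /_cat/aliases` and `GET /_cat/indices` has already been parsed into a dictionary and a list; `find_migration_locks` is a hypothetical helper, and it assumes the default `<alias>_N` naming.

```python
import re


def find_migration_locks(aliases, indices):
    """Return indices numbered higher than the one their alias points to.

    aliases: dict mapping alias name -> index it currently points to.
    indices: iterable of all index names in the cluster.
    """
    locks = []
    for alias, current in aliases.items():
        numbered = re.compile(re.escape(alias) + r"_(\d+)")
        current_match = numbered.fullmatch(current)
        if current_match is None:
            continue  # alias does not follow the <alias>_N convention
        current_n = int(current_match.group(1))
        for name in indices:
            match = numbered.fullmatch(name)
            if match and int(match.group(1)) > current_n:
                locks.append(name)
    return sorted(locks)
```

Because the pattern must match the whole name, `.kibana_task_manager_2` is never mistaken for a numbered `.kibana` index. Review the result against the logs before deleting anything.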
4490

4591
[float]
4692
[[upgrade-migrations-rolling-back]]
4793
==== Rolling back to a previous version of {kib}
4894

49-
When rolling {kib} back to a previous version, point the `.kibana` alias to
50-
the appropriate {kib} index. When you have the previous version running again,
51-
delete the more recent `.kibana_N` index or indices so that future upgrades are
52-
based on the current {kib} index. You must restart {kib} to re-trigger the migration.
95+
If you've followed the advice in <<preventing-migration-failures, preventing migration failures>> and <<resolve-migrations-failures, resolving migration failures>> and {kib} is still not able to upgrade successfully, you might choose to roll back {kib} until you're able to identify the root cause.
96+
97+
WARNING: Before rolling back {kib}, ensure that the version you wish to roll back to is compatible with your {es} cluster. If the version you're rolling back to is not compatible, you will also have to roll back {es}. +
98+
Any changes made after an upgrade will be lost when rolling back to a previous version.
99+
100+
In order to roll back after a failed upgrade migration, the saved object indices might also have to be rolled back to be compatible with the previous {kib} version.
101+
102+
[float]
103+
===== Rollback by restoring a backup snapshot:
104+
105+
1. Before proceeding ensure that you have a recent and successful backup snapshot of all `.kibana*` indices.
106+
2. Shut down all {kib} instances to be 100% sure that there are no instances currently performing a migration.
107+
3. Delete all saved object indices with `DELETE /.kibana*`
108+
4. Restore the `.kibana*` indices and their aliases from the backup snapshot. See {es} {ref}/modules-snapshots.html[snapshots].
109+
5. Start up all {kib} instances on the older version you wish to roll back to.
110+
111+
[float]
112+
===== (Not recommended) Rollback without a backup snapshot:
113+
114+
WARNING: {kib} does not run a migration for every saved object index on every upgrade. A {kib} version upgrade might run no migrations, migrate only the `.kibana` index, migrate only the `.kibana_task_manager` index, or migrate both. Carefully read the logs to ensure that you're only deleting indices created by a later version of {kib} to avoid data loss.
115+
116+
1. Shut down all {kib} instances to be 100% sure that there are no {kib} instances currently performing a migration.
117+
2. Create a backup snapshot of the `.kibana*` indices.
118+
3. Use the logs from the upgraded instances to identify which indices {kib} attempted to upgrade. The server logs will contain an entry like `[savedobjects-service] Creating index .kibana_4.` and/or `[savedobjects-service] Creating index .kibana_task_manager_2.` If no indices were created after upgrading {kib}, no further action is required to perform a rollback; skip ahead to step (5). If you're running multiple {kib} instances, be sure to inspect all instances' logs.
119+
4. Delete each of the indices identified in step (3), e.g. `DELETE /.kibana_task_manager_2`.
120+
5. Inspect the output of `GET /_cat/aliases`. If the `.kibana` or `.kibana_task_manager` alias is missing, it will have to be created manually. Find the latest index from the output of `GET /_cat/indices` and create the missing alias to point to the latest index. E.g. if the `.kibana` alias is missing and the latest index is `.kibana_3`, create a new alias with `POST /.kibana_3/_aliases/.kibana`.
121+
6. Start up {kib} on the older version you wish to roll back to.
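Step 5 amounts to picking the highest-numbered index for each missing alias. The sketch below assumes the alias and index listings have already been parsed; `latest_index` and `missing_alias_requests` are hypothetical helpers that only build the request lines for you to review and do not contact {es}.

```python
import re


def latest_index(alias, indices):
    """Return the highest-numbered `<alias>_N` index, or None if none exist."""
    numbered = re.compile(re.escape(alias) + r"_(\d+)")
    candidates = []
    for name in indices:
        match = numbered.fullmatch(name)
        if match:
            # Compare numerically so that _10 sorts after _2.
            candidates.append((int(match.group(1)), name))
    return max(candidates)[1] if candidates else None


def missing_alias_requests(aliases, indices,
                           expected=(".kibana", ".kibana_task_manager")):
    """Build one alias-creation request line per missing expected alias."""
    requests = []
    for alias in expected:
        if alias in aliases:
            continue
        target = latest_index(alias, indices)
        if target is not None:
            requests.append(f"POST /{target}/_aliases/{alias}")
    return requests
```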
122+
123+
[float]
124+
[[upgrade-migrations-old-indices]]
125+
==== Handling old `.kibana_N` indices
53126

54-
WARNING: Rolling back to a previous {kib} version can result in saved object data loss if you had successfully upgraded and made changes to saved objects before rolling back.
127+
After migrations have completed, there will be multiple {kib} indices in {es}: (`.kibana_1`, `.kibana_2`, etc). {kib} only uses the index that the `.kibana` alias points to. The other {kib} indices can be safely deleted, but are left around as a matter of historical record, and to facilitate rolling {kib} back to a previous version.

docs/setup/upgrade/upgrade-standard.asciidoc

Lines changed: 12 additions & 4 deletions
@@ -12,11 +12,20 @@ If you've saved and/or exported objects in {kib} that rely on the
1212
necessary remediation steps as per those instructions.
1313
===========================================
1414

15+
[float]
16+
==== Upgrading multiple {kib} instances
17+
18+
WARNING: Kibana does not support rolling upgrades. If you're running multiple {kib} instances, all instances should be stopped before upgrading.
19+
20+
Different versions of {kib} running against the same {es} index, such as during a rolling upgrade, can cause upgrade migration failures and data loss. This is because acknowledged writes from the older instances could be written into the _old_ index while the migration is in progress. To prevent this from happening, ensure that all old {kib} instances are shut down before starting up instances on a newer version.
21+
22+
The first instance that triggers saved object migrations will run the entire process. Any other instances started up while a migration is running will log a message and then wait until saved object migrations have completed before they start serving HTTP traffic.
23+
1524
[float]
1625
==== Upgrade using a `deb` or `rpm` package
1726

1827
. Stop the existing {kib} process using the appropriate command for your
19-
system.
28+
system. If you have multiple {kib} instances connecting to the same {es} cluster, ensure that all instances are stopped before proceeding to the next step to avoid data loss.
2029
. Use `rpm` or `dpkg` to install the new package. All files should be placed in
2130
their proper locations and config files should not be overwritten.
2231
+
@@ -43,8 +52,7 @@ otherwise {kib} will fail to start.
4352
don't overwrite the `config` or `data` directories. +
4453
+
4554
--
46-
IMPORTANT: If you use {monitor-features}, you must re-use the data directory when you
47-
upgrade {kib}. Otherwise, the {kib} instance is assigned a new persistent UUID
55+
IMPORTANT: If you use {monitor-features}, you must re-use the data directory when you upgrade {kib}. Otherwise, the {kib} instance is assigned a new persistent UUID
4856
and becomes a new instance in the monitoring data.
4957

5058
--
@@ -57,5 +65,5 @@ and becomes a new instance in the monitoring data.
5765
. Install the appropriate versions of all your plugins for your new
5866
installation using the `kibana-plugin` script. Check out the
5967
<<kibana-plugins,plugins>> documentation for more information.
60-
. Stop the old {kib} process.
68+
. Stop the old {kib} process. If you have multiple {kib} instances connecting to the same {es} cluster, ensure that all instances are stopped before proceeding to the next step to avoid data loss.
6169
. Start the new {kib} process.
