Elasticsearch fails to start with error: "Failed to find metadata for index" on every restart #47276

Closed
@redbaron4

Elasticsearch version: 7.3.2

Plugins installed: []

JVM version (java -version): 1.8.0

OS version: Centos-7.4

We have been running an Elasticsearch cluster consisting of 5 nodes for quite some time now. After upgrading to v7, we have noticed that our nodes often refuse to start, failing with an error of the form nested: IOException[failed to find metadata for existing index XXX].

The first time I encountered this error, I searched the discuss board and found a thread that talks of stronger startup checks enforced by ES 7.x and points to the data directory getting corrupted due to external factors. Thinking it might be the same problem, I duly took the node offline and ran a disk check, which reported no errors. So I deleted the data directory, started the node, and that was that.

However, the next time I did a rolling upgrade of my cluster, a different node failed with a similar error (the index name was different). I followed the same emergency procedure (delete the data directory and restart the node) and the cluster was fixed.

Now, after every rolling upgrade, I seem to run into this error on at least one of my nodes. The index name always points to a closed index. The error occurs only on restart (never while Elasticsearch is running).

I find it hard to believe that all 5 of my nodes have a disk problem because:

  • I have run fsck every time this error has occurred, and no errors have been reported.
  • Elasticsearch runs without a problem for days on end (a disk error or another program corrupting the data would cause a running Elasticsearch to crash, as happened on one of my nodes about a year back).

Yesterday we had a power issue at the data center, which led to all nodes getting power cycled. Upon restart, 4 out of 5 nodes failed to start with the same error. The index name was different on each of the 4 nodes (the indexes in question were all "closed"). I had no option but to delete all data on those 4 nodes, thus losing about 80% of our Elasticsearch data.

The errors seen were

[2019-09-30T10:36:58,205][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [esnode3] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: IOException[failed to find metadata for existing index ssl-2019.09.20 [location: GmslGWkHTLGQowmMHFut7A, generation: 11]];
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-7.3.2.jar:7.3.2]
        at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.3.2.jar:7.3.2]
Caused by: org.elasticsearch.ElasticsearchException: failed to bind service
        at org.elasticsearch.node.Node.<init>(Node.java:617) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:258) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.3.2.jar:7.3.2]
        ... 6 more
Caused by: java.io.IOException: failed to find metadata for existing index ssl-2019.09.20 [location: GmslGWkHTLGQowmMHFut7A, generation: 11]
        at org.elasticsearch.gateway.MetaStateService.loadFullState(MetaStateService.java:99) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.gateway.GatewayMetaState.upgradeMetaData(GatewayMetaState.java:141) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:95) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:492) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:258) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.3.2.jar:7.3.2]
        ... 6 more
[2019-09-30T10:36:58,210][INFO ][o.e.x.m.p.NativeController] [esnode3] Native controller process has stopped - no new native processes can be started

and

[2019-09-30T10:39:59,737][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [esnode2] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: IOException[failed to find metadata for existing index dns-2019.09.22 [location: ZMenLry9Qxe5-2-XNrWj2A, generation: 15]];
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-7.3.2.jar:7.3.2]
        at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.3.2.jar:7.3.2]
Caused by: org.elasticsearch.ElasticsearchException: failed to bind service
        at org.elasticsearch.node.Node.<init>(Node.java:617) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:258) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.3.2.jar:7.3.2]
        ... 6 more
Caused by: java.io.IOException: failed to find metadata for existing index dns-2019.09.22 [location: ZMenLry9Qxe5-2-XNrWj2A, generation: 15]
        at org.elasticsearch.gateway.MetaStateService.loadFullState(MetaStateService.java:99) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.gateway.GatewayMetaState.upgradeMetaData(GatewayMetaState.java:141) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:95) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:492) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:258) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.3.2.jar:7.3.2]
        ... 6 more
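
For anyone comparing these failures across nodes, here is a minimal sketch that pulls the index name, location, and generation out of log text in the format shown above. The function name `find_missing_index_metadata` and the regular expression are my own assumptions based only on the two log excerpts in this report; they are not part of Elasticsearch, and other versions may word the message differently.

```python
import re

# Assumed pattern, derived only from the IOException lines quoted in this report.
MISSING_METADATA = re.compile(
    r"failed to find metadata for existing index (?P<index>\S+) "
    r"\[location: (?P<location>[^,\]]+), generation: (?P<generation>\d+)\]"
)


def find_missing_index_metadata(log_text):
    """Return unique (index, location, generation) tuples in first-seen order."""
    found = []
    for match in MISSING_METADATA.finditer(log_text):
        entry = (
            match.group("index"),
            match.group("location"),
            int(match.group("generation")),
        )
        # The same message is repeated in the nested exception chain; keep one copy.
        if entry not in found:
            found.append(entry)
    return found


# Two "Caused by" lines copied from the logs above.
SAMPLE_LOG = """\
Caused by: java.io.IOException: failed to find metadata for existing index ssl-2019.09.20 [location: GmslGWkHTLGQowmMHFut7A, generation: 11]
Caused by: java.io.IOException: failed to find metadata for existing index dns-2019.09.22 [location: ZMenLry9Qxe5-2-XNrWj2A, generation: 15]
"""

if __name__ == "__main__":
    for index, location, generation in find_missing_index_metadata(SAMPLE_LOG):
        print(index, location, generation)
```

Running it over each node's log gives a quick list of affected indexes that can then be checked against the set of closed indexes.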

Is it possible that the metadata of closed indexes is not being persisted properly, leading to issues at restart? Can this be mitigated somehow (maybe by rolling back to less strict consistency checks)?
