Description
elasticsearch-shard appears to be the tool for removing corrupted metadata.
This has happened to us several times after updating past 7.0.0.
Issue: the shard's directory structure and files are either deleted or never created, and elasticsearch-shard (remove-corrupted-data) cannot remove the shard from the metadata.
Steps:
Recreate the directory structure (elasticsearch-shard errors out with "directory must exist" if it does not).
Run elasticsearch-shard (it hits a NullPointerException, because only the directories exist).
/usr/share/elasticsearch/bin $ ./elasticsearch-shard remove-corrupted-data --index dce_rpc-2019.08.28 --shard-id 24 -d /data/nsm/elasticsearch/nodes/0/indices/TUa5c332RFGKmM6yZSK-Rw/0/index
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
WARNING: Elasticsearch MUST be stopped before running this tool.
Please make a complete backup of your index before using this tool.
Exception in thread "main" java.lang.NullPointerException
at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.findAndProcessShardPath(RemoveCorruptedShardDataCommand.java:152)
at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.execute(RemoveCorruptedShardDataCommand.java:282)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:77)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
at org.elasticsearch.cli.Command.main(Command.java:90)
at org.elasticsearch.index.shard.ShardToolCli.main(ShardToolCli.java:35)
/usr/share/elasticsearch/bin $ ls /data/nsm/elasticsearch/nodes/0/indices/TUa5c332RFGKmM6yZSK-Rw/0/
index _state translog
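The state shown by the ls output above (directories present, nothing inside them) can be checked before running the tool. Below is a minimal sketch in Python, assuming the shard directory layout seen in this report (index, _state, translog). The helper name inspect_shard_dir is hypothetical and not part of Elasticsearch; it only reports which of those subdirectories are missing, empty, or populated.

```python
from pathlib import Path

# Subdirectories seen under the shard directory in this report.
# This layout is an assumption taken from the `ls` output above.
EXPECTED_SUBDIRS = ("index", "_state", "translog")


def inspect_shard_dir(shard_dir):
    """Return {subdir: "missing" | "empty" | "populated"} for a shard directory."""
    root = Path(shard_dir)
    report = {}
    for name in EXPECTED_SUBDIRS:
        sub = root / name
        if not sub.is_dir():
            report[name] = "missing"
        elif any(sub.iterdir()):
            report[name] = "populated"
        else:
            report[name] = "empty"
    return report


if __name__ == "__main__":
    import sys

    # Usage: python inspect_shard.py <path to the shard directory>
    for name, status in inspect_shard_dir(sys.argv[1]).items():
        print(f"{name}: {status}")
```

If every entry comes back as "empty", only the directory skeleton exists, which matches the situation in which the tool failed here.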
What I'd expect:
Kill the shard without having to rm -rf the entire node and rely on replicas.
Hopefully there's an error in my steps.
Elasticsearch 7.3.0 (no plugins)
Oracle Linux 7.6
Network drives (vSAN) for Elasticsearch storage (though this also happens on physical boxes with Docker containers)