
elasticsearch-shard remove-corrupted-data doesn't work on missing metadata #47435

Closed
@ct0br0

elasticsearch-shard appears to be the tool for removing corrupted metadata.
The problem below has happened to us several times since upgrading past 7.0.0.

Issue: the shard's directory structure and files are either deleted or were never created, and elasticsearch-shard (remove-corrupted-data) cannot remove the shard from the metadata.

Steps:
1. Recreate the directory structure (elasticsearch-shard errors out with "directory must exist" if it is missing).
2. Run elasticsearch-shard (it hits a NullPointerException, because only the directories exist, with no files in them).
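
For reference, step 1 can be sketched as a small shell function. This is only an illustration: the function name `recreate_shard_dirs` is hypothetical, and the three subdirectory names are taken from the `ls` output in this report, not from any documented on-disk layout.

```shell
# Hypothetical helper: recreate the empty directory skeleton of a shard.
# The subdirectory names (index, _state, translog) are assumed from the
# `ls` output in this report.
recreate_shard_dirs() {
  shard_dir="$1"
  mkdir -p "$shard_dir/index" "$shard_dir/_state" "$shard_dir/translog"
}

# Example, using the path from this report (Elasticsearch must be stopped):
# recreate_shard_dirs /data/nsm/elasticsearch/nodes/0/indices/TUa5c332RFGKmM6yZSK-Rw/0
```

This creates directories only. No shard state or segment files are written, which is the situation in which step 2 fails.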

/usr/share/elasticsearch/bin $ ./elasticsearch-shard remove-corrupted-data --index dce_rpc-2019.08.28 --shard-id 24 -d /data/nsm/elasticsearch/nodes/0/indices/TUa5c332RFGKmM6yZSK-Rw/0/index
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2

WARNING: Elasticsearch MUST be stopped before running this tool.

Please make a complete backup of your index before using this tool.


Exception in thread "main" java.lang.NullPointerException
    at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.findAndProcessShardPath(RemoveCorruptedShardDataCommand.java:152)
    at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.execute(RemoveCorruptedShardDataCommand.java:282)
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
    at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:77)
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
    at org.elasticsearch.cli.Command.main(Command.java:90)
    at org.elasticsearch.index.shard.ShardToolCli.main(ShardToolCli.java:35)

/usr/share/elasticsearch/bin $ ls /data/nsm/elasticsearch/nodes/0/indices/TUa5c332RFGKmM6yZSK-Rw/0/
index _state translog
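
To confirm that a shard directory really contains directories only, a check along these lines could be run before invoking the tool. It is a rough sketch: `shard_has_files` is a hypothetical name, and it only tests for the presence of any regular file, not for valid shard metadata.

```shell
# Hypothetical check: succeed (exit status 0) if at least one regular file
# exists anywhere under the given shard directory, fail otherwise.
shard_has_files() {
  [ -n "$(find "$1" -type f 2>/dev/null | head -n 1)" ]
}

# Example, using the path from this report:
# shard_has_files /data/nsm/elasticsearch/nodes/0/indices/TUa5c332RFGKmM6yZSK-Rw/0 \
#   || echo "shard directory contains no files"
```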

What I'd expect:
The tool kills the shard, so that I don't have to rm -rf the entire node and rely on replicas.

Hopefully there's just an error in my steps.

Environment:
Elasticsearch 7.3.0 (no plugins)
Oracle Linux 7.6
Network drives (vSAN) for Elasticsearch storage (though this also happens on physical boxes running Docker containers)
