Data nodes failing to restart #239

Open
gcyre opened this issue Nov 5, 2018 · 0 comments
gcyre commented Nov 5, 2018

Hello,

I've followed the instructions to set up an ES cluster in our test k8s cluster and have run into a couple of issues with the data nodes.

The first issue is with creating the two data nodes: I'm not able to run both data pods on a single k8s worker node. In my es-data.yaml I've changed the storage to use a PVC on an external SAN (relevant snippet below the stack trace). When the second pod comes up, there's this error in its logs:

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/data/data/myesdb]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.3.2.jar:6.3.2]
	at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.3.2.jar:6.3.2]
Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/data/data/myesdb]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
	at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:243) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.node.Node.<init>(Node.java:270) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.2.jar:6.3.2]
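
For reference, the storage change I made in es-data.yaml is roughly the following (the claim name is a placeholder for ours). Since es-data is a Deployment, both data replicas end up mounting the same claim at /data:

```yaml
# Relevant excerpt from my es-data.yaml (data node Deployment).
# The stock manifest uses an emptyDir for /data; I swapped it for a PVC
# backed by our external SAN.
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: es-data-san   # placeholder name; ReadWriteMany volume on the SAN
```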

The second issue is when a pod dies and gets restarted by Kubernetes. When the pod comes back up, there are errors about leftover lock files:

[2018-11-05T21:01:32,426][WARN ][o.e.i.e.Engine           ] [es-data-6c487c6b77-nfqv9] [logstash-2018.10.28][2] could not lock IndexWriter
org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /data/data/nodes/0/indices/3V6MF-EXTiOJ-yO93SVX8g/2/index/write.lock
	at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:130) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:104) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:948) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:1939) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:1930) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:191) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:157) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:2160) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:2142) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1349) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1304) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:420) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1575) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$5(IndexShard.java:2028) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:626) [elasticsearch-6.3.2.jar:6.3.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

Should I be creating the data pods as a StatefulSet instead of a Deployment (rough sketch of what I mean below)?
Am I missing something in the configuration?
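
What I had in mind is roughly this; env vars, probes and resources are left out, and the names/storage class are just placeholders for our SAN-backed setup:

```yaml
# Rough sketch of es-data as a StatefulSet with one PVC per pod via
# volumeClaimTemplates, instead of a Deployment sharing a single PVC.
# Everything except the volume handling is trimmed down / placeholder.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-data
spec:
  serviceName: elasticsearch-data
  replicas: 2
  selector:
    matchLabels:
      component: elasticsearch
      role: data
  template:
    metadata:
      labels:
        component: elasticsearch
        role: data
    spec:
      containers:
      - name: es-data
        image: quay.io/pires/docker-elasticsearch-kubernetes:6.3.2  # same image/tag I'm running now
        volumeMounts:
        - name: storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: san-storage   # placeholder for our SAN storage class
      resources:
        requests:
          storage: 100Gi
```

That way each data pod would get its own /data volume and its own node lock, rather than all of them contending for the same directory.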

Any help would be appreciated.

thanks
Garry
