Description
Hiya Shay
It turns out that the issue I was having earlier with NFS was a red herring. Here is what seems to be happening.
My process:
- I'm reindexing old_index into new_index
- I read 5,000 docs from the old index, then create each one in the new index
- If there is an error, I delete new_index
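A minimal sketch of the reindex loop above, including the delete-on-error cleanup that turned out to be the trigger. The three callables (`fetch_batch`, `create_doc`, `delete_new_index`) are hypothetical stand-ins for whatever client calls the real script makes; only the 5,000-doc batch size comes from the post.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 5000  # batch size described in the post


def reindex(
    fetch_batch: Callable[[int, int], List[Dict]],  # hypothetical: read docs from old_index
    create_doc: Callable[[Dict], None],             # hypothetical: create one doc in new_index
    delete_new_index: Callable[[], None],           # hypothetical: delete new_index
    batch_size: int = BATCH_SIZE,
) -> int:
    """Copy docs batch by batch; on ANY error, delete the new index and re-raise.

    Returns the number of documents copied.
    """
    copied = 0
    offset = 0
    try:
        while True:
            batch = fetch_batch(offset, batch_size)
            if not batch:
                break  # old index exhausted
            for doc in batch:
                create_doc(doc)
                copied += 1
            offset += len(batch)
    except Exception:
        # Cleanup step: note that it fires even for a transient search error.
        delete_new_index()
        raise
    return copied
```

Because the `except` clause catches every error, a temporary failure on a busy cluster is treated exactly like a fatal one and wipes the new index.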
So:
- The cluster gets busy, and a search for the next 5,000 docs fails with this error:
  select failed: No child processes
- This was triggering the cleanup in my script, which deleted the index.
- It appears that one node has deleted the index while another node is still trying to write snapshot info for the (now deleted) index, which results in these errors:
[14:48:09,948][WARN ][index.gateway ] [Nameless One][ia_object_1270046679][0] Failed to snapshot on close
org.elasticsearch.index.gateway.IndexShardGatewaySnapshotFailedException: [ia_object_1270046679][0] Failed to append snapshot translog into [/opt/elasticsearch/data/iAnnounce/ia_object_1270046679/0/translog/translog-3]
at org.elasticsearch.index.gateway.fs.FsIndexShardGateway.snapshot(FsIndexShardGateway.java:199)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.snapshot(IndexShardGatewayService.java:154)
at org.elasticsearch.index.engine.robin.RobinEngine.snapshot(RobinEngine.java:350)
at org.elasticsearch.index.shard.service.InternalIndexShard.snapshot(InternalIndexShard.java:369)
at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshot(IndexShardGatewayService.java:150)
at org.elasticsearch.index.gateway.IndexShardGatewayService.close(IndexShardGatewayService.java:176)
at org.elasticsearch.index.service.InternalIndexService.deleteShard(InternalIndexService.java:244)
at org.elasticsearch.index.service.InternalIndexService.close(InternalIndexService.java:159)
at org.elasticsearch.indices.InternalIndicesService.deleteIndex(InternalIndicesService.java:208)
at org.elasticsearch.indices.InternalIndicesService.deleteIndex(InternalIndicesService.java:185)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.google.inject.internal.ConstructionContext$DelegatingInvocationHandler.invoke(ConstructionContext.java:108)
at $Proxy19.deleteIndex(Unknown Source)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:178)
at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:193)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: /opt/elasticsearch/data/iAnnounce/ia_object_1270046679/0/translog/translog-3 (Stale NFS file handle)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at org.elasticsearch.index.gateway.fs.FsIndexShardGateway.snapshot(FsIndexShardGateway.java:184)
... 20 more
Now I'm catching the "select failed: No child processes" errors, sleeping for a few seconds, then retrying, and everything is working well.
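A rough sketch of that catch, sleep, and retry workaround. It assumes the transient failure can be recognised by its message text alone; the attempt count (3) and delay (3 seconds) are guesses standing in for "a few seconds", and `with_retry` is a hypothetical helper, not part of any client library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: the transient failure is identified only by its message text.
TRANSIENT_MESSAGE = "select failed: No child processes"


def with_retry(
    operation: Callable[[], T],
    attempts: int = 3,          # assumed retry budget
    delay_seconds: float = 3.0,  # assumed "few seconds" pause
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying only when it fails with the transient error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Re-raise non-transient errors, or the transient one once retries
            # are exhausted, so genuine failures still reach the cleanup code.
            if TRANSIENT_MESSAGE not in str(exc) or attempt == attempts:
                raise
            # Transient error: back off, then try the same call again.
            sleep(delay_seconds)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Wrapping only the search call, e.g. `with_retry(lambda: search_next_batch())` with a hypothetical `search_next_batch`, means the delete-the-index cleanup runs only for errors that survive the retries.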
ta
clint