
GCS repository snapshot fails intermittently on some shards "Failed to check if blob exists" java.io.IOException: insufficient data written  #26636

Closed
@hoffoo


Elasticsearch version (bin/elasticsearch --version): 5.5.1

Plugins installed: [repository-gcs discovery-gce]

JVM version (java -version): 1.8.0_131

OS version (uname -a if on a Unix-like system): Linux XXX 4.10.0-27-generic #30~16.04.2-Ubuntu SMP Thu Jun 29 16:07:46 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Creating a snapshot fails intermittently on some shards. Retrying with a new snapshot works. For me it seems to fail on about 10% of shards: testing with 51 shards, 4 failed in the last test, 2 when I retried, and 0 on the third try.

The exception is IndexShardSnapshotFailedException[BlobStoreException[Failed to check if blob [__79.part4] exists]; nested: SocketTimeoutException[Read timed out];]; nested: BlobStoreException[Failed to check if blob [__79.part4] exists]; nested: SocketTimeoutException[Read timed out];

The repository bucket uses GCS cold storage.

I see that there are further options I can give the plugin, mainly http.connect_timeout and http.read_timeout, but I'm not sure whether they are relevant to the exception below: java.io.IOException: insufficient data written.
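For reference, a repository definition that sets those two options could be built as in the sketch below. It only constructs the request body for `PUT /_snapshot/<repo_name>`. The helper name `gcs_repository_body` and the timeout values are illustrative assumptions, not recommendations, and nothing here establishes that these timeouts affect the upload path that raises "insufficient data written".

```python
import json


def gcs_repository_body(bucket, connect_timeout="20s", read_timeout="60s", compress=True):
    """Build the JSON body for registering a GCS snapshot repository.

    Hypothetical helper: the timeout values are placeholders chosen for
    illustration only.
    """
    return {
        "type": "gcs",
        "settings": {
            "bucket": bucket,
            "compress": compress,
            # Timeout options mentioned in the report.
            "http.connect_timeout": connect_timeout,
            "http.read_timeout": read_timeout,
        },
    }


# Print the body that would be sent when registering the repository.
print(json.dumps(gcs_repository_body("XXXX"), indent=2))
```

The resulting body has the same shape as the settings listed under "Steps to reproduce", with the two timeout keys added.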

I wouldn't mind this failing if I could detect it and retry. Could I do this by deleting the snapshot and recreating it? If I understand correctly, the shards that were backed up successfully would not be deleted if I did this. Is that right?
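A minimal sketch of the detect-and-retry idea, assuming the snapshot info returned by the snapshot API carries a `state` field, a `shards` object with a `failed` count, and a `failures` list. The callables `create_snapshot` and `delete_snapshot` are hypothetical stand-ins for whatever client issues the real requests, so no HTTP calls appear here.

```python
def failed_shards(snapshot_info):
    """List (index, shard_id, reason) for each shard failure in a snapshot entry."""
    return [
        (f.get("index"), f.get("shard_id"), f.get("reason"))
        for f in snapshot_info.get("failures", [])
    ]


def needs_retry(snapshot_info):
    """True when the snapshot did not complete cleanly on every shard."""
    state = snapshot_info.get("state")
    failed = snapshot_info.get("shards", {}).get("failed", 0)
    return state in ("PARTIAL", "FAILED") or failed > 0


def snapshot_with_retry(create_snapshot, delete_snapshot, base_name, max_attempts=3):
    """Create a snapshot, deleting and recreating it while shards fail.

    create_snapshot(name) must block until the snapshot finishes and return
    its info dict; delete_snapshot(name) removes it. Both are assumptions
    supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        name = f"{base_name}-{attempt}"
        info = create_snapshot(name)
        if not needs_retry(info):
            return name, attempt
        # Report which shards failed before discarding this attempt.
        for index, shard_id, reason in failed_shards(info):
            print(f"attempt {attempt}: [{index}][{shard_id}] failed: {reason}")
        delete_snapshot(name)
    raise RuntimeError(f"snapshot still incomplete after {max_attempts} attempts")
```

Each attempt uses a fresh snapshot name so a failed attempt can be removed without touching earlier successful ones. The sketch does not answer whether deleting a partial snapshot discards shard data that was already uploaded.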

Steps to reproduce:

  1. Create a snapshot in a GCS repository with these settings: {"gcs":{"type":"gcs","settings":{"bucket":"XXXX","compress":"true"}}}

Provide logs (if relevant):

```
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to perform snapshot (index files)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1377) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:972) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:382) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:88) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:335) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.1.jar:5.5.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.io.IOException: insufficient data written
	at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3540) ~[?:?]
	at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81) ~[?:?]
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972) ~[?:?]
	at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545) ~[?:?]
	at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:417) ~[?:?]
	at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) ~[?:?]
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427) ~[?:?]
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) ~[?:?]
```
