[BUG] Snapshot restoration fails due to CorruptIndexException #5337
Comments
Hello! Do you think the data in the snapshots is recoverable, or did I lose the data?
Is there a way to recover the data stored in the snapshot?
🆙
Sorry it took so long to reply here. It does look like a data corruption problem, but let's not exclude bugs. I think it needs narrowing down. I would try a few things:
Hello @dblock! Sorry for the delay.
So the issue is that these snapshots are already corrupted when they are created. I would keep narrowing down what causes this, but I'm not sure what to suggest next. Maybe @andrross will have an idea? Would it be an option to upgrade to the latest 1.x, or even 2.x, and see whether the problem persists?
Checksums are verified during allocation and snapshot restore, so the corruption could already be on disk: the next snapshot then succeeds, but the snapshot is corrupted as well. We had an issue where an old snapshot was recoverable but the newer ones were all corrupted, and we could not replicate it either. We have yet to figure this one out, as it is rare and hard to reproduce. We tried this doc, but it is not the same issue; maybe it can help with yours though, @PaulLesur:
Describe the bug
Indices cannot be restored because of a 1-byte difference in the snapshot.
To Reproduce
Expected behavior
The snapshot should be restored properly.
Host/Environment (please complete the following information):
Here is the information about our cluster:
Additional context
We use OpenSearch to store our logs. Our OpenSearch cluster runs on Kubernetes using the official Docker image.
We have a cronjob that snapshots indices every day to an Azure Blob Storage container and deletes indices older than 5 days. When we want to access old log data, we restore the snapshots, but some of the indices in these snapshots fail to restore.
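For readers who want to picture the retention step, here is a minimal sketch of how such a cronjob might pick the indices to delete. It is not the actual job from this report: the function name `indices_to_delete`, the `keep_days` parameter, and the assumption that every index name ends in a `YYYYMMDD` date suffix (as in `backend-api-20220608`) are all hypothetical.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(names, today, keep_days=5):
    """Return the index names whose date suffix is older than keep_days.

    Assumption (hypothetical): each name ends in "-YYYYMMDD".
    Names that do not match that pattern are left alone.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        suffix = name.rsplit("-", 1)[-1]
        try:
            day = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # not a dated index, never delete it automatically
        if day < cutoff:
            expired.append(name)
    return expired


names = ["backend-api-20220608", "backend-api-20220612", "backend-api-20220601"]
expired = indices_to_delete(names, today=date(2022, 6, 14))
print(expired)
```

Since the local copy of an index is gone once it passes the retention window, the snapshot is the only remaining copy, which is why a snapshot that was written corrupted matters so much here.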
This issue is random and affects roughly one snapshot in 20.
For example, an index named backend-api-20220608:
The index health is red:
The logs in OpenSearch:
The error seems to be here:
writtenLength=3300529277 expectedLength=3300529276
What could cause this difference? Is it possible to restore the index without the corrupted data? Is our data lost, or is there a way to recover it?
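To make the size of the mismatch explicit, the two values from the log fragment above can be compared with a small helper. This is only an illustrative sketch: `length_mismatch` is a hypothetical name, and it parses nothing beyond the `writtenLength=` and `expectedLength=` fields quoted in this report.

```python
import re


def length_mismatch(message: str) -> int:
    """Return writtenLength minus expectedLength parsed from a log fragment."""
    fields = dict(re.findall(r"(writtenLength|expectedLength)=(\d+)", message))
    return int(fields["writtenLength"]) - int(fields["expectedLength"])


fragment = "writtenLength=3300529277 expectedLength=3300529276"
delta = length_mismatch(fragment)
print(delta)
```

A positive result means more bytes were written during the restore than the snapshot metadata expected; for the values quoted here the difference is a single byte.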