
SLM metadata records enormous string describing last failure #71325

Open
@DaveCTurner

Description


Elasticsearch version (bin/elasticsearch --version): Reported in 7.11.2, but still present in master.

Plugins installed: Cloud

JVM version (java -version): N/A

OS version (uname -a if on a Unix-like system): Cloud

Description of the problem including expected versus actual behavior:

A user reported a multi-megabyte response from the Get snapshot lifecycle policy API, which they found surprising. Most of the content was a detailed report of every single shard failure, each complete with its stack trace. All of these failures are attached as suppressed exceptions here ...

snapInfo.shardFailures().forEach(failure -> e.addSuppressed(failure.getCause()));

... and converted to a string here:

newPolicyMetadata.setLastFailure(new SnapshotInvocationRecord(snapshotName, timestamp, exceptionToString()));
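To illustrate why these two snippets together produce such a large string, here is a minimal, hypothetical sketch in plain Java. It is not Elasticsearch code. It attaches one suppressed exception per failed shard and then renders the top-level exception, which emits every suppressed exception along with its own stack trace. The class and method names are invented for this sketch. `printStackTrace` stands in for whatever rendering produces the JSON `stack_trace` field shown in the logs below.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/**
 * Hypothetical illustration (not Elasticsearch code): attaches one suppressed
 * exception per failed shard and measures the length of the rendered text.
 */
final class SuppressedGrowthDemo {

    // Builds a top-level failure with one suppressed cause per failed shard.
    static Exception buildFailure(int failedShards) {
        Exception top = new RuntimeException(
            "failed to create snapshot successfully, " + failedShards + " shards failed");
        for (int shard = 0; shard < failedShards; shard++) {
            // Each shard failure carries its own message and stack trace.
            top.addSuppressed(new IllegalStateException("shard [" + shard + "] snapshot failed"));
        }
        return top;
    }

    // Renders the exception, including every suppressed exception, as text.
    static String render(Exception e) {
        StringWriter out = new StringWriter();
        e.printStackTrace(new PrintWriter(out));
        return out.toString();
    }

    public static void main(String[] args) {
        for (int shards : new int[] {1, 10, 100, 1000}) {
            System.out.println(shards + " failed shards -> "
                + render(buildFailure(shards)).length() + " chars");
        }
    }
}
```

When you run `main`, the rendered length grows roughly linearly with the number of failed shards. With the >800 failed shards in the report, and real stack traces that are much deeper than the ones in this sketch, the result reaches megabytes.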

I'm not sure all this detail is useful, and it certainly seems bad to keep something so large in the cluster state. Can we trim this down somehow?
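One possible way to bound the recorded failure is sketched below under stated assumptions. The first option caps how many shard failures are attached in full and summarises the remainder. The second option truncates the final string. The class name and the limits `MAX_SUPPRESSED` and `MAX_DETAILS_CHARS` are hypothetical and do not correspond to existing Elasticsearch settings or APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the Elasticsearch implementation): two ways the
 * recorded failure could be bounded before it is stored in the cluster state.
 */
final class BoundedFailureSketch {

    // Hypothetical cap on how many shard failures are attached in full.
    static final int MAX_SUPPRESSED = 10;
    // Hypothetical cap on the stored details string, in characters.
    static final int MAX_DETAILS_CHARS = 4096;

    // Option 1: attach only the first MAX_SUPPRESSED causes and count the rest in the message.
    static Exception buildBoundedFailure(String message, List<? extends Exception> shardCauses) {
        int omitted = Math.max(0, shardCauses.size() - MAX_SUPPRESSED);
        String summary = omitted == 0
            ? message
            : message + " [" + omitted + " further shard failures omitted]";
        Exception top = new RuntimeException(summary);
        shardCauses.stream().limit(MAX_SUPPRESSED).forEach(top::addSuppressed);
        return top;
    }

    // Option 2: truncate an already-rendered string to MAX_DETAILS_CHARS, noting how much was cut.
    static String truncate(String details) {
        if (details == null || details.length() <= MAX_DETAILS_CHARS) {
            return details;
        }
        int cut = details.length() - MAX_DETAILS_CHARS;
        return details.substring(0, MAX_DETAILS_CHARS) + "... [truncated " + cut + " chars]";
    }

    public static void main(String[] args) {
        List<Exception> causes = new ArrayList<>();
        for (int shard = 0; shard < 800; shard++) {
            causes.add(new IllegalStateException("shard [" + shard + "] snapshot failed"));
        }
        Exception bounded = buildBoundedFailure("failed to create snapshot successfully", causes);
        System.out.println(bounded.getMessage());
        System.out.println(bounded.getSuppressed().length + " suppressed exceptions kept");
    }
}
```

Capping before rendering keeps the stored value well-formed. Truncating afterwards is simpler, but it would cut the JSON-encoded `details` string partway through its structure, so any consumer that parses that string would need to tolerate the damage.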

Provide logs (if relevant):

"details": "{\"type\":\"snapshot_exception\",\"reason\":\"[found-snapshots:cloud-snapshot-2021.04.01-REDACTED] failed to create snapshot successfully, REDACTED(>800) out of REDACTED(>800) total shards failed\",\"stack_trace\":\"SnapshotException[[found-snapshots:cloud-snapshot-2021.04.01-REDACTED] failed to create snapshot successfully, REDACTED(>800) out of REDACTED(>800) total shards failed]\\n\\tat org.elasticsearch.xpack.slm.SnapshotLifecycleTask$1.onResponse(SnapshotLifecycleTask.java:111)\\n\\tat org.elasticsearch.xpack.slm.SnapshotLifecycleTask$1.onResponse(SnapshotLifecycleTask.java:93)\\n\\tat org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)\\n\\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)\\n\\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:77)\\n\\tat org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)\\n\\tat org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:143)\\n\\tat org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:76)\\n\\tat org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:216)\\n\\tat org.elasticsearch.snapshots.SnapshotsService.completeListenersIgnoringException(SnapshotsService.java:2681)\\n\\tat org.elasticsearch.snapshots.SnapshotsService.lambda$finalizeSnapshotEntry$34(SnapshotsService.java:1577)\\n\\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:117)\\n\\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$finalizeSnapshot$37(BlobStoreRepository.java:1130)\\n\\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:117)\\n\\tat org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)\\n\\tat 
org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)\\n\\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)\\n\\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)\\n\\tat java.base/java.lang.Thread.run(Thread.java:832)\\n\\tSuppressed: [REDACTED/REDACTED][[REDACTED][0]] IndexShardSnapshotFailedException[NoSuchFileException[Blob object [snapshots/REDACTED/indices/REDACTED/0/index-REDACTED] not found: 404 Not Found\\nGET https://storage.googleapis.com/download/storage/v1/b/REDACTED/o/snapshots%REDACTED%2Findices%REDACTED%2F0%2Findex-REDACTED?alt=media\\nNo such object: REDACTED/snapshots/REDACTED/indices/REDACTED/0/index-REDACTED]]\\n\\t\\tat org.elasticsearch.snapshots.SnapshotShardFailure.<init>(SnapshotShardFailure.java:66)\\n\\t\\tat org.elasticsearch.snapshots.SnapshotShardFailure.<init>(SnapshotShardFailure.java:54)\\n\\t\\tat org.elasticsearch.snapshots.SnapshotsService.finalizeSnapshotEntry(SnapshotsService.java:1524)\\n\\t\\tat org.elasticsearch.snapshots.SnapshotsService.access$2100(SnapshotsService.java:115)\\n\\t\\tat org.elasticsearch.snapshots.SnapshotsService$7.onResponse(SnapshotsService.java:1472)\\n\\t\\tat org.elasticsearch.snapshots.SnapshotsService$7.onResponse(SnapshotsService.java:1469)\\n\\t\\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository.doGetRepositoryData(BlobStoreRepository.java:1463)\\n\\t\\t... 6 more\\n\\tSuppressed: [REDACTED/REDACTED][[REDACTED][0]] IndexShardSnapshotFailedException[NoSuchFileException... [many MBs of the same snipped]
