Snapshot repository registration api call causing "out of memory" errors #10344

Closed
@nicktgr15

We are running an Elasticsearch cluster with 27 nodes and we create about 200 new indices per day.
We keep data for the last 5 days, so in total we have about 1000 indices. Every day we take a snapshot of yesterday's 200 indices to S3 (so, no incremental backups).

After updating to 1.4.2 we noticed that registering a snapshot repository causes the master node to run out of heap space and the cluster to become unresponsive. The screenshot below shows the heap space usage on the master node after a PUT request to register a snapshot repository.

[Screenshot: jmx-master (heap space usage on the master node)]
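
For illustration, a registration request of the kind described above might be built as in the sketch below. It only constructs the PUT request against the `_snapshot` endpoint and does not send it. The host, repository name, bucket and region are hypothetical placeholders, and the exact settings accepted by the S3 repository type depend on the plugin version in use.

```python
import json
import urllib.request


def build_register_request(host, repo_name, bucket, region):
    """Build (but do not send) a PUT request registering an S3 snapshot repository."""
    # Assumed request body: repository type "s3" plus bucket and region settings.
    body = {"type": "s3", "settings": {"bucket": bucket, "region": region}}
    return urllib.request.Request(
        url=f"{host}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values, for illustration only.
req = build_register_request(
    "http://localhost:9200", "my_s3_repo", "my-bucket", "eu-west-1"
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually issue the call; that step is left out here because it is the request that triggered the heap growth described in this report.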

After inspecting a heap dump taken from the master node we realised that it is trying to list the contents of our S3 repository. We currently keep all previous snapshots in that repository, which makes listing everything infeasible in terms of time and resources. The Memory Analyser screenshot below shows that 51% of the heap space (800 MB) is occupied by a Map holding 5,000,000 entries, with S3 locations as keys and PlainBlobMetadata objects as values.

[Screenshot: memoryanalyser_tree]
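
As a rough sanity check on the heap dump figures above, the sketch below works out the average footprint per map entry and the total heap size they imply. It assumes that "800 MB" means 800 MiB and that the 51% share is exact; both are assumptions, so the results are order-of-magnitude estimates only.

```python
# Figures taken from the heap dump description above.
MAP_SIZE_MB = 800          # size of the Map, assumed to be MiB
ENTRY_COUNT = 5_000_000    # number of entries in the Map
HEAP_SHARE = 0.51          # fraction of the heap occupied by the Map

# Average bytes retained per entry (key, value and map overhead together).
bytes_per_entry = MAP_SIZE_MB * 1024 * 1024 / ENTRY_COUNT

# Total heap size implied by the Map taking 51% of it.
implied_heap_mb = MAP_SIZE_MB / HEAP_SHARE

print(f"about {bytes_per_entry:.0f} bytes per entry")
print(f"about {implied_heap_mb:.0f} MB of total heap")
```

Under these assumptions each listed blob costs on the order of a couple of hundred bytes, so memory use grows linearly with the number of objects in the repository.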

We recently updated to 1.4.2 (from 1.1.2) and we don't think we saw similar behaviour (i.e. listing the S3 repository contents) in 1.1.2. Is this an issue that needs further investigation on your side, or is it the way we are currently using the snapshot service that is causing the problem?

Any suggestions/feedback would be welcome.

Thank you
