
A Node Joining a Cluster with a Large State Receives the Full Uncompressed State in a ValidateJoinRequest #83204

Closed
@original-brownbear

Description

The ValidateJoinRequest contains the cluster state uncompressed.
This causes problems once the cluster state reaches a certain size. First, it requires a massive amount of memory, even after #82608. Second, reading the full state directly on the transport thread (unlike the publication handler, which deserializes on the GENERIC thread pool) is too slow.
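The hand-off pattern the publication handler relies on (cheap work on the network thread, expensive deserialization elsewhere) can be sketched in plain Java. This is a minimal illustration, not Elasticsearch code: the class `OffloadSketch`, the `deserializeState` stand-in, and the two JDK executors named "transport" and "generic" are hypothetical substitutes for the real transport thread and GENERIC pool.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadSketch {
    // Hypothetical stand-in for the expensive cluster-state deserialization; it also
    // reports which thread did the work so the hand-off is visible.
    static String deserializeState(byte[] raw) {
        return Thread.currentThread().getName() + " parsed " + new String(raw, StandardCharsets.UTF_8);
    }

    // Called on the transport thread: keep only the raw bytes and hand the expensive
    // parsing to the generic pool, so the transport thread returns immediately.
    public static CompletableFuture<String> onMessage(byte[] raw, ExecutorService generic) {
        return CompletableFuture.supplyAsync(() -> deserializeState(raw), generic);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService transport = Executors.newSingleThreadExecutor(r -> new Thread(r, "transport"));
        ExecutorService generic = Executors.newFixedThreadPool(4, r -> new Thread(r, "generic"));
        try {
            byte[] raw = "cluster-state".getBytes(StandardCharsets.UTF_8);
            CompletableFuture<String> parsed = transport.submit(() -> onMessage(raw, generic)).get();
            System.out.println(parsed.get()); // prints "generic parsed cluster-state"
        } finally {
            transport.shutdown();
            generic.shutdown();
        }
    }
}
```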

For a cluster with 40k indices using Beats mappings and an admittedly large number of data streams, this is what happens:

[2022-01-27T11:35:37,960][WARN ][o.e.t.InboundHandler     ] [elasticsearch-2] handling request [InboundMessage{Header{554386564}{8.1.0}{1239565}{true}{false}{false}{false}{internal:cluster/coordination/join/validate}}] took [7208ms] which is above the warn threshold of [5000ms]

We receive and deserialize a 500MB+ message on the transport thread.

This becomes troublesome because of the heap required just to buffer the message on a fresh master node that might otherwise be capable of handling a cluster state of this size (the state is smaller on heap thanks to settings and mappings deduplication).
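Why the same state can be much smaller on heap than on the wire can be illustrated with a toy interning scheme: every index serializes its own copy of identical settings, but the reader keeps a single shared instance. This sketch rests on assumptions (settings modeled as `Map<String, String>`, hypothetical class `DedupSketch`) and is not the deduplication Elasticsearch actually implements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // One canonical instance per distinct settings map, keyed by value equality.
    private final Map<Map<String, String>, Map<String, String>> canonical = new HashMap<>();

    // Returns a shared instance equal to `settings`, storing an immutable copy the
    // first time a given value is seen.
    public Map<String, String> dedup(Map<String, String> settings) {
        Map<String, String> existing = canonical.get(settings);
        if (existing != null) {
            return existing;
        }
        Map<String, String> copy = Map.copyOf(settings);
        canonical.put(copy, copy);
        return copy;
    }

    public int distinctInstances() {
        return canonical.size();
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        Map<String, String> beatsSettings = Map.of("index.codec", "best_compression");
        List<Map<String, String>> perIndex = new ArrayList<>();
        for (int i = 0; i < 40_000; i++) {
            // On the wire each index carries its own copy; simulate that with a fresh map.
            perIndex.add(dedup.dedup(new HashMap<>(beatsSettings)));
        }
        System.out.println(dedup.distinctInstances());               // prints 1
        System.out.println(perIndex.get(0) == perIndex.get(39_999)); // prints true
    }
}
```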

The slowness on the transport thread can mostly be blamed on the time it takes to read index settings.

This relates to #80493 and to settings deduplication in general. Ideally, we should find a better way of deduplicating the settings to make the message smaller. Until then, a reasonable solution might be to simply compress the state in the message, read it as plain bytes, and then deserialize it on GENERIC, as we do in the publication handler.
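A minimal sketch of the compression half of that proposal, assuming the JDK's DEFLATE streams (`java.util.zip`) as the codec; the class name `CompressedStateSketch` and the synthetic payload are hypothetical, and the real state serialization is not shown. Only the compressed `byte[]` would be buffered on the transport thread; `decompress` plus the actual deserialization is the part that would move to GENERIC.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressedStateSketch {
    // Sending side: compress the already-serialized state into an opaque byte[].
    public static byte[] compress(byte[] serializedState) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the DeflaterOutputStream finishes the DEFLATE stream into `out`.
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(out)) {
            deflate.write(serializedState);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Receiving side: the transport thread only keeps the compressed bytes; this call
    // (followed by the real deserialization) is what would run on GENERIC.
    public static byte[] decompress(byte[] compressed) {
        try (InflaterInputStream inflate =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return inflate.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive payload, loosely mimicking many near-identical index settings.
        byte[] raw = "{\"index.codec\":\"best_compression\"}".repeat(40_000)
            .getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(raw);
        System.out.println("raw=" + raw.length + " compressed=" + compressed.length);
        System.out.println(Arrays.equals(raw, decompress(compressed))); // prints true
    }
}
```

A payload this repetitive compresses extremely well; how much a real cluster state shrinks depends on its contents.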

An additional issue is that the master (sending) node has to serialize this message in full, which can put a problematic amount of strain on it.

Metadata

Labels

:Distributed Coordination/Cluster Coordination: Cluster formation and cluster state publication, including cluster membership and fault detection.
>bug
Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.