
Reduce number of writes needed for metadata updates #48701

Closed
@DaveCTurner

Description


Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates touch only a few of these files, but some must write them all. If a node holds a large number of indices then its disks may not be fast enough to complete a full metadata update before it times out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding.

We plan to change the format of on-disk metadata to reduce the number of writes needed during metadata updates. One option is a monolithic file containing the complete metadata, but this is inefficient in the common case that the metadata is mostly unchanged. Another option is to keep an append-only log of changes, but such a log must be compacted, which introduces considerable complexity. However, we already have access to a very good storage mechanism with the right properties: Lucene! We will use a dedicated Lucene index on each master-eligible node and replace each individual file with a document in this index. Most metadata updates will then need only a few writes, and Lucene's background merging will take care of compaction.
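The write-amplification argument can be sketched with a toy model. This is illustrative only, not the actual implementation: a plain dict stands in for the Lucene index, and the document ids and contents are made up.

```python
# Toy model of the per-document metadata store described above.
# A dict stands in for the dedicated Lucene index; we count document
# writes to contrast with rewriting a monolithic metadata file.

class MetadataStore:
    """Stores each piece of cluster metadata as a separate keyed document."""

    def __init__(self):
        self.docs = {}   # doc id -> metadata content (stand-in for Lucene docs)
        self.writes = 0  # number of document writes performed

    def apply_update(self, changed):
        """Write only the documents whose content actually changed."""
        for doc_id, content in changed.items():
            if self.docs.get(doc_id) != content:
                self.docs[doc_id] = content
                self.writes += 1

store = MetadataStore()

# Initial metadata: global metadata, the manifest, and 1000 index documents.
initial = {"global": "g0", "manifest": "m0"}
initial.update({f"index-{i}": "v0" for i in range(1000)})
store.apply_update(initial)

# A typical update touches only a couple of documents...
store.apply_update({"index-42": "v1", "manifest": "m1"})

# ...so only two additional writes happen, rather than rewriting all 1002
# pieces of metadata (or one very large monolithic file).
assert store.writes == 1002 + 2
```

In the real change the dict would be a Lucene index, so deleted-then-rewritten documents are cleaned up by Lucene's background merging rather than by an explicit compaction step.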

On master-ineligible nodes we can keep the existing format and still reduce the writes required, because we can make better use of the fact that master-ineligible nodes only write committed metadata and therefore the version numbers on disk are trustworthy. It may also be possible to entirely avoid writing index metadata during cluster state application, deferring it until later.
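The version-number optimisation amounts to skipping any index whose on-disk version already matches the incoming committed state. A minimal sketch, with illustrative names rather than Elasticsearch's actual API:

```python
# Hypothetical sketch of version-based write skipping on a master-ineligible
# node. Because such nodes only persist committed metadata, the stored
# version number can be trusted: an index whose version has not advanced
# does not need its metadata file rewritten.

def writes_needed(on_disk_versions, incoming_versions):
    """Return the names of indices whose metadata must actually be rewritten."""
    return [
        name
        for name, version in incoming_versions.items()
        if on_disk_versions.get(name, -1) < version
    ]

on_disk = {"index-a": 5, "index-b": 3, "index-c": 7}
incoming = {"index-a": 5, "index-b": 4, "index-c": 7}

# Only index-b's version advanced, so only its file needs a write.
assert writes_needed(on_disk, incoming) == ["index-b"]
```

On a master-eligible node this comparison would be unsafe, since uncommitted metadata may have been written with a version number that never ends up committed.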

