Reduce duplication when writing

We replicate writes to N (typically 3) ingesters to reduce data loss from random failures of machine or process. However this means in stable operation each metric is written to store three times, along with index entries for each label (at least in the DynamoDB back-end).  Say 10 index entries per chunk, for a total of 33 writes.  Nearly one third of these writes are pointless and, worse, they slow down queries because all those entries get re-fetched and finally deduplicated.

One alternative idea:
When an ingester is about to write a chunk, query the index to see if an overlapping chunk already exists. If so, subtract the overlapping part and merge the remainder back into the chunks being accumulated.  This would replace the 33 writes with 11 writes and 2 reads.

This idea doesn't require any particular consistency from the store - if it under-reports we'll just store some pointless chunks as before.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reduce duplication when writing #607

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Reduce duplication when writing #607

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions