Skip to content

Reduce duplication when writing #607

Closed
@bboreham

Description

@bboreham

We replicate writes to N (typically 3) ingesters to reduce data loss from random failures of machine or process. However this means in stable operation each metric is written to store three times, along with index entries for each label (at least in the DynamoDB back-end). Say 10 index entries per chunk, for a total of 33 writes. Nearly one third of these writes are pointless and, worse, they slow down queries because all those entries get re-fetched and finally deduplicated.

One alternative idea:
When an ingester is about to write a chunk, query the index to see if an overlapping chunk already exists. If so, subtract the overlapping part and merge the remainder back into the chunks being accumulated. This would replace the 33 writes with 11 writes and 2 reads.

This idea doesn't require any particular consistency from the store - if it under-reports we'll just store some pointless chunks as before.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions