
Feat/tsdb/close stale tsdb #1958


Closed

Conversation

thorfour
Contributor

@thorfour thorfour commented Jan 6, 2020

What this PR does: Closes and removes TSDBs that haven't been written to in a given time period.

Which issue(s) this PR fixes: This cleans up the disk and memory footprint of open TSDBs that are no longer written to.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@thorfour thorfour force-pushed the feat/tsdb/close-stale-tsdb branch 2 times, most recently from 4c6bcd2 to 55ef86c on January 6, 2020 19:18
Contributor

@pracucci pracucci left a comment


I see the use case for this feature (which I agree with), but I think the current implementation may lead to data loss (see comment). I've also left a few minor comments to help improve the readability of the code.

@thorfour thorfour force-pushed the feat/tsdb/close-stale-tsdb branch 5 times, most recently from 57394c5 to 365f01c on January 8, 2020 17:29
@thorfour thorfour mentioned this pull request Jan 10, 2020
@thorfour thorfour force-pushed the feat/tsdb/close-stale-tsdb branch from 365f01c to b7c9b29 on January 14, 2020 14:47
Contributor

@pracucci pracucci left a comment


Good job @thorfour! I believe this feature is nice to have, but at the same time the implementation is still a bit complex (intrinsically complex).

As mentioned offline, one option to simplify it may be to change the meaning of the config option a bit. What if we let the user set how long the TSDB head should be empty before the TSDB is released? It would not allow releasing a TSDB before the block range period + threshold, BUT it would remove the need for snapshotting entirely, because if the head is empty there's nothing to snapshot, and the code would be simpler.
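
For illustration, a minimal sketch of what that check might look like on the Cortex side, assuming a hypothetical userTSDB wrapper and a hypothetical closeEmptyHeadAfter option (none of these names come from this PR):

```go
package ingester

import (
	"time"

	"github.com/prometheus/prometheus/tsdb"
)

// userTSDB is a hypothetical per-tenant wrapper used only for this sketch.
type userTSDB struct {
	db               *tsdb.DB
	lastHeadNonEmpty time.Time
}

// shouldClose reports whether the TSDB head has been empty for longer
// than the configured threshold, per the suggestion above.
func (u *userTSDB) shouldClose(now time.Time, closeEmptyHeadAfter time.Duration) bool {
	if u.db.Head().NumSeries() > 0 {
		// The head still holds data: keep the TSDB open and remember
		// when we last saw it non-empty.
		u.lastHeadNonEmpty = now
		return false
	}
	// The head has been empty for the whole threshold period: its data
	// has already been compacted into blocks, so there is nothing to
	// snapshot and the TSDB can be released.
	return now.Sub(u.lastHeadNonEmpty) > closeEmptyHeadAfter
}
```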

Signed-off-by: Thor <thansen@digitalocean.com>
@thorfour thorfour force-pushed the feat/tsdb/close-stale-tsdb branch from b7c9b29 to 7897cb6 on January 15, 2020 20:05
@codesome
Contributor

Related to this: prometheus/prometheus#6637

@pracucci
Contributor

I've just merged PR #1982. Please be aware that we should now decrease the memUsers metric whenever a TSDB is removed.
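
A tiny illustrative sketch of that bookkeeping (the tsdbMetrics struct and method below are hypothetical; memUsers stands in for the gauge added in #1982):

```go
package ingester

import "github.com/prometheus/client_golang/prometheus"

// tsdbMetrics is a hypothetical holder for the ingester's TSDB metrics.
type tsdbMetrics struct {
	memUsers prometheus.Gauge // number of open per-tenant TSDBs
}

// onTSDBRemoved must be called on every code path that closes and
// removes a per-tenant TSDB, so the gauge stays in sync.
func (m *tsdbMetrics) onTSDBRemoved() {
	m.memUsers.Dec()
}
```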

@codesome
Contributor

@thorfour: I just discussed with @pracucci offline about the TSDB not compacting the last remaining data from the Head after ingestion stops, and a possible solution in Cortex (see prometheus/prometheus#6637 for why it won't be built into TSDB).

Here is what we have come up with as a solution (for a TSDB which has not received any samples for the past X duration):

  1. Reject appends on the TSDB (blocked on the Cortex side)
  2. Snapshot the Head (I will add this to TSDB if this flow makes sense)
  3. Move the new block to the original data dir
  4. Truncate the Head (to remove data from memory) and reload the blocks
  5. Wait until the shipper has shipped all blocks to the storage (including the new one)
  6. Close the TSDB
  7. Delete the data from disk

WDYT?
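
A rough sketch of how these seven steps could be strung together on the Cortex side is shown below. Every name here (idleTSDB, blockAppends, waitUntilShipped, ...) is purely illustrative; only the ordering of the steps comes from the proposal above.

```go
package ingester

import (
	"context"
	"os"

	"github.com/prometheus/prometheus/tsdb"
)

// idleTSDB is a hypothetical view of the per-tenant state this flow needs;
// the methods mirror the numbered steps in the proposal above.
type idleTSDB interface {
	blockAppends()                          // 1. reject appends on the Cortex side
	snapshotHeadToDataDir() error           // 2-3. cut a block from the head into the data dir
	truncateHeadAndReload() error           // 4. drop head data from memory, reload blocks
	waitUntilShipped(context.Context) error // 5. wait for the shipper to upload all blocks
	db() *tsdb.DB
	localDir() string
}

// closeIdleTSDB runs the steps in order; any failure aborts the flow so
// no data is deleted before it has been shipped.
func closeIdleTSDB(ctx context.Context, u idleTSDB) error {
	u.blockAppends()
	if err := u.snapshotHeadToDataDir(); err != nil {
		return err
	}
	if err := u.truncateHeadAndReload(); err != nil {
		return err
	}
	if err := u.waitUntilShipped(ctx); err != nil {
		return err
	}
	if err := u.db().Close(); err != nil { // 6. close the TSDB
		return err
	}
	return os.RemoveAll(u.localDir()) // 7. delete the data from disk
}
```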

@pracucci
Contributor

@thorfour @codesome Another option, probably easier but less efficient, would be to do it like Thanos does (see the code here): close the TSDB, re-open it in read-only mode, and call FlushWAL(). Given that having stale TSDBs shouldn't be the common case, we could trade efficiency for simplicity and follow Thanos' approach?
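
A minimal sketch of that Thanos-style path, assuming the Prometheus read-only TSDB API as it existed around the time of this discussion (the live TSDB would have to be closed before this is called):

```go
package ingester

import (
	"github.com/go-kit/kit/log"
	"github.com/prometheus/prometheus/tsdb"
)

// flushWALToBlock re-opens an already-closed TSDB directory in read-only
// mode and flushes its WAL into a block, so the shipper can upload the
// remaining head data before the directory is deleted.
func flushWALToBlock(dir string, logger log.Logger) error {
	ro, err := tsdb.OpenDBReadOnly(dir, logger)
	if err != nil {
		return err
	}
	defer ro.Close()

	// FlushWAL replays the WAL and writes its contents out as a block in dir.
	return ro.FlushWAL(dir)
}
```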

@codesome
Contributor

codesome commented Jan 16, 2020

In a case where a user has stopped sending data but is still querying, it would lead to some downtime (EDIT: not downtime, but gaps) for the user, depending on how long the WAL replay + flush takes. (Maybe we can take care of that if we actually see it happening often in the real world.)

@pracucci
Contributor

In a case where a user has stopped sending data but is still querying, it would lead to some downtime (EDIT: not downtime, but gaps) for the user, depending on how long the WAL replay + flush takes. (Maybe we can take care of that if we actually see it happening often in the real world.)

True, but rethinking it, maybe it's an edge case we can accept as long as we document it. Moreover, it should be possible to disable all this logic by setting the option to 0 (and maybe it would be a good thing to have it disabled by default?).

@thorfour
Contributor Author

@thorfour: I just discussed with @pracucci offline about the TSDB not compacting the last remaining data from the Head after ingestion stops, and a possible solution in Cortex (see prometheus/prometheus#6637 for why it won't be built into TSDB).

Here is what we have come up with as a solution (for a TSDB which has not received any samples for the past X duration):

  1. Reject appends on the TSDB (blocked on the Cortex side)
  2. Snapshot the Head (I will add this to TSDB if this flow makes sense)
  3. Move the new block to the original data dir
  4. Truncate the Head (to remove data from memory) and reload the blocks
  5. Wait until the shipper has shipped all blocks to the storage (including the new one)
  6. Close the TSDB
  7. Delete the data from disk

WDYT?

This is similar to what I originally had implemented. I believe it works, but it's unfortunately complex code, especially in the failure cases. We also have to go back to using a timestamp on writes to determine how long it has been since a TSDB was last written to, which is unfortunate.

@thorfour
Contributor Author

@codesome what if we added an optional parameter to the Close function in TSDB or maybe a new function like CloseWithCompaction that cuts a new block from the head while closing?

@codesome
Contributor

@thorfour Maybe we can do the following, which would not require this hack in place for upstream Prometheus: refactor db.Compact() into CompactHead() and CompactBlocks() (with Compact() calling these 2 methods). With that, you could call CompactHead() in Cortex and then close the TSDB. Would that work for you?

@pracucci
Contributor

Maybe we can do the following, which would not require this hack in place for upstream Prometheus: refactor db.Compact() into CompactHead() and CompactBlocks() (with Compact() calling these 2 methods). With that, you could call CompactHead() in Cortex and then close the TSDB. Would that work for you?

@codesome As long as CompactHead() checks db.head.compactable(), it will not work. The Compact() function is fine for us, but we would need a way to bypass db.head.compactable(): is there anything else that comes to mind that we may need in order to compact the entire head into a block?

@codesome
Contributor

codesome commented Feb 13, 2020

As long as CompactHead() checks db.head.compactable(), it will not work.

Yes, I am aware of that. The plan is to keep that check in Compact(), while CompactHead() would take a rangeHead (which I will expose) and compact it without any checks.

@pracucci
Contributor

As long as CompactHead() checks db.head.compactable(), it will not work.

Yes, I am aware of that. The plan is to keep that check in Compact(), while CompactHead() would take a rangeHead (which I will expose) and compact it without any checks.

Right. Then I think your proposed solution may work.

@thorfour
Contributor Author

@codesome as long as it bypasses the head compactable check, I think that solution would work!

@codesome
Contributor

PR for breaking the Compact() method into parts: prometheus/prometheus#6820
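
For reference, a small sketch of how Cortex could use the API from that PR, building a RangeHead covering the whole head and compacting it without the compactable() check (exact signatures may differ between versions; the function name here is illustrative):

```go
package ingester

import "github.com/prometheus/prometheus/tsdb"

// compactEntireHead cuts the full in-memory head into a block using the
// CompactHead()/RangeHead split discussed above, regardless of whether
// the head would normally be considered compactable.
func compactEntireHead(db *tsdb.DB) error {
	h := db.Head()
	rh := tsdb.NewRangeHead(h, h.MinTime(), h.MaxTime())
	return db.CompactHead(rh)
}
```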

@stale

stale bot commented Apr 14, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 14, 2020
@thorfour
Contributor Author

While this feature is still needed, we need Thanos to vendor the latest Prometheus changes mentioned above, and then we need to vendor Thanos. Closing until these things are ready.

@thorfour thorfour closed this Apr 14, 2020