-
Notifications
You must be signed in to change notification settings - Fork 820
Feat/tsdb/close stale tsdb #1958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4c6bcd2
to
55ef86c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the use case for this feature (which I agree on), but I think the current implementation may lead to data loss (see comment). I've also left few minor comments to help improving the readability of code.
57394c5
to
365f01c
Compare
365f01c
to
b7c9b29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job @thorfour ! I believe this feature is a nice to have but, at the same time, the implementation is still a bit complex (intrinsically complex).
As mentioned offline, an option to simplify it may be changing a bit the meaning of the config option. What if we allow to set for how long the TSDB head should be empty before the TSDB is released? It would not allow to release a TSDB before the block range period + threshold, BUT will remove the snapshotting at all because if the head is empty there's nothing to snapshotting and the code would simplify.
Signed-off-by: Thor <thansen@digitalocean.com>
b7c9b29
to
7897cb6
Compare
Related to this: prometheus/prometheus#6637 |
I've just merged the PR #1982. Please be aware we should now decrease the |
@thorfour: I just discussed with @pracucci offline about TSDB not compacting the last remaining data from Head after ingestion stops and possible solution in cortex (See prometheus/prometheus#6637 as to why it won't be inbuilt in TSDB). Here is what we have come up with as a solution (for TSDB which did not receive any sample for past X duration):
WDYT? |
@thorfour @codesome Another option, probably easier but less efficient, would be done like Thanos (see code here): close TSDB, open it in read only, call |
In a case where user has stopped sending data but is still querying, it would lead to some downtime (EDIT: not downtime, but gaps) for the user depending on how long the WAL replay+flush takes. (Maybe we can take care of that if we actually see that happening often in real-world) |
True, but re-thinking it maybe it's an edge case we could accept as far as we document it. Moreover all this logic should be able to disable setting the option to |
This is similar to what I originally had implemented. I believe it works, but it's unfortunately complex code especially in the failure cases. We also have to go back to using a timestamp on writes to determine how long since a TSDB was last written which is unfortunate. |
@codesome what if we added an optional parameter to the |
@thorfour Maybe we can do the following which would not require this hack in place for upstream Prometheus: Refactor the |
@codesome As far as |
Yes I am aware of that. The plan is to have that check in |
Right. Then I think your proposed solution may work. |
@codesome as long is it bypasses the head compatible check I think that solution would work! |
PR for breaking the |
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions. |
While this feature is still needed, we need Thanos to vendor the latest Prometheus changes mentioned above, and then we need to vendor Thanos. Closing until these things are ready. |
What this PR does: Closes and removes TSDB's that haven't been written to in a given time period.
Which issue(s) this PR fixes: This cleans up disk and memory footprint for open TSDB's that no longer are written to.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]