Avoid indefinite checkpointing (#2955)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
codesome and gouthamve authored Aug 4, 2020
1 parent badc146 commit e2f8663
Showing 2 changed files with 11 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -110,6 +110,7 @@
* [BUGFIX] Fixed `Missing chunks and index config causing silent failure` Absence of chunks and index from schema config is not validated. #2732
* [BUGFIX] Fix panic caused by KVs from boltdb being used beyond their life. #2971
* [BUGFIX] Experimental TSDB: `/api/v1/series`, `/api/v1/labels` and `/api/v1/label/{name}/values` only query the TSDB head regardless of the configured `-experimental.blocks-storage.tsdb.retention-period`. #2974
+* [BUGFIX] Ingester: Avoid indefinite checkpointing in case of surge in number of series. #2955

## 1.2.0 / 2020-07-01

10 changes: 10 additions & 0 deletions pkg/ingester/wal.go
@@ -337,6 +337,7 @@ func (w *walWrapper) performCheckpoint(immediate bool) (err error) {
totalSize := 0
ticker := time.NewTicker(perSeriesDuration)
defer ticker.Stop()
+start := time.Now()
for userID, state := range us {
for pair := range state.fpToSeries.iter() {
state.fpLocker.Lock(pair.fp)
@@ -361,6 +362,15 @@ func (w *walWrapper) performCheckpoint(immediate bool) (err error) {
}

if !immediate {
+if time.Since(start) > 2*w.cfg.CheckpointDuration {
+// This could indicate a surge in the number of series, and continuing with
+// the old ticker estimate can, in the worst case, make checkpointing run indefinitely
+// and the disk run out of space. Re-adjusting the ticker might not solve the problem
+// as there can be another surge. Hence, let's finish this checkpoint immediately.
+immediate = true
+continue
+}
+
select {
case <-ticker.C:
case <-w.quit: // When we're trying to shutdown, finish the checkpoint as fast as possible.

