
feat: Blooms retention #12258

Merged: 17 commits merged into main from salvacorts/bloom-retention on Mar 28, 2024

Conversation

@salvacorts (Contributor) commented on Mar 19, 2024

What this PR does / why we need it:

This PR adds retention to bloom blocks and metas. The retention is applied by only one compactor (the one that owns the smallest token).

Compaction retention works as follows (a rough Go sketch follows the list):

  1. Check if the compactor owns the smallest token in the ring. If so, this compactor should run retention.
  2. Start with day = today - 1 day.
  3. Get the bloom client for that day.
  4. List the tenants in the day table.
  5. If there are no tenants, we are done (it means a previous retention run already removed all tenants for this day, so nothing older remains either).
  6. For each tenant:
    6.1. Get the max retention across all of its streams.
    6.2. If day < (now - max retention), delete all content for the tenant in the day table.
  7. day--; goto 3.
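
A rough Go sketch of this loop, where the function fields (ownsSmallestToken, tenantsForDay, maxRetention, deleteTenantDay) are illustrative placeholders for the actual ring, limits, and object-store APIs, not the code in this PR:

```go
package main

import "time"

// Sketch only: these function fields stand in for the real ring, per-tenant
// limits, and bloom store APIs used by the bloom compactor.
type retention struct {
	ownsSmallestToken func() bool
	tenantsForDay     func(day time.Time) ([]string, error)
	maxRetention      func(tenant string) time.Duration
	deleteTenantDay   func(tenant string, day time.Time) error
}

func (r retention) apply(now time.Time, maxLookbackDays int) error {
	// Only the compactor owning the smallest token in the ring runs retention.
	if !r.ownsSmallestToken() {
		return nil
	}
	for lookback := 1; lookback <= maxLookbackDays; lookback++ {
		day := now.AddDate(0, 0, -lookback)
		tenants, err := r.tenantsForDay(day)
		if err != nil {
			return err
		}
		// No tenants for this day: a previous run already cleaned everything older.
		if len(tenants) == 0 {
			return nil
		}
		for _, tenant := range tenants {
			// Delete the whole day table for the tenant once its max retention has passed.
			if day.Before(now.Add(-r.maxRetention(tenant))) {
				if err := r.deleteTenantDay(tenant, day); err != nil {
					return err
				}
			}
		}
	}
	return nil
}
```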

We try to run retention up to once a day but:

  • On a ring topology change, two compactors might run retention simultaneously (if both see that they own the smallest token).
  • We store the last day retention was applied in memory; if the compactor restarts, this is lost and retention may run again.

We add the following configs:

bloom_compactor:
  retention:
    enabled: false
    max_lookback_days: 365

Special notes for your reviewer:
The PR looks big, but it's mostly tests.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add the add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@salvacorts changed the title from "Salvacorts/bloom retention" to "feat: Blooms retention" on Mar 19, 2024
The github-actions bot added the type/docs label on Mar 19, 2024
@salvacorts marked this pull request as ready for review on March 21, 2024 15:58
@salvacorts requested a review from a team as a code owner on March 21, 2024 15:58
@owen-d (Member) left a comment:

An unintended consequence here is that if a tenant has any retention configs larger than the range the compactor checks for, it'll never delete blooms for that tenant. I think it makes more sense to delete all tenants once they hit max_lookback_days regardless of their configs.

Comment on lines 198 to 199
// 1second -> 5 years, 10 buckets
Buckets: prometheus.DefBuckets,
Member:

The default buckets are 0.005->10s

@salvacorts (Contributor, Author) replied on Mar 22, 2024:

That comment is misleading; I forgot to remove it after copy-pasting. This metric tracks the time needed to apply retention, so the default buckets should work.
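
For context, a minimal sketch of how such a duration histogram could be declared with the default buckets; the metric name and help text below are made up for illustration, not taken from the PR:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Illustrative only: tracks how long a retention pass takes, using the
// client_golang default buckets (0.005s .. 10s).
var retentionTime = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "bloom_compactor_retention_time_seconds",
	Help:    "Time spent applying blooms retention.",
	Buckets: prometheus.DefBuckets,
})
```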

Comment on lines 33 to 34
Tenant(tenant, table string) Location
ParseTenantKey(loc Location) (string, error)
Member:

Maybe rename this to TenantPrefix and ParseTenantFromKey?

@salvacorts (Contributor, Author):

Done

@@ -127,6 +128,51 @@ func (b *bloomStoreEntry) FetchBlocks(ctx context.Context, refs []BlockRef, _ ..
return b.fetcher.FetchBlocks(ctx, refs)
}

func (b *bloomStoreEntry) TenantFilesForInterval(ctx context.Context, interval Interval) (map[string][]string, error) {
Member:

nit: not needed now, but making this not need to buffer the entire file list would be good.

@salvacorts (Contributor, Author):

I added a TODO inside the function to add pooling if this becomes a problem.

@salvacorts (Contributor, Author):

I refactored the implementation to reuse the slices from the listed objects.


startDay := storageconfig.NewDayTime(today.Add(-smallestRetention))
endDay := storageconfig.NewDayTime(0)
if r.cfg.MaxLookbackDays > 0 {
Member:

I don't think this should be allowed to be zero to prevent trying to iterate every day since 1970.

@salvacorts (Contributor, Author):

Agree. Done.
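
A possible shape for that validation; the RetentionConfig struct and its field names are assumptions for illustration rather than the exact types in the PR:

```go
package main

import "errors"

// Hypothetical config struct mirroring the bloom_compactor.retention block.
type RetentionConfig struct {
	Enabled         bool `yaml:"enabled"`
	MaxLookbackDays int  `yaml:"max_lookback_days"`
}

// Validate rejects a zero or negative lookback so retention can never
// iterate day by day all the way back to 1970.
func (cfg *RetentionConfig) Validate() error {
	if cfg.Enabled && cfg.MaxLookbackDays <= 0 {
		return errors.New("bloom_compactor.retention.max_lookback_days must be greater than 0")
	}
	return nil
}
```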

break
}

for tenant, objectKeys := range tenants {
Member:

nit: it's probably better to pass limits to TenantFilesForInterval so it doesn't need to allocate space for files that won't be removed

@salvacorts (Contributor, Author):

I ended up doing something slightly different. The TenantFilesForInterval now takes a filter function. We use that filter function to filter out tenants whose retention hasn't expired yet.

}

if _, ok := tenants[tenant]; !ok {
tenants[tenant] = make([]string, 0, 100)
tenants[tenant] = nil // Initialize tenant with empty slice
@salvacorts (Contributor, Author):

We do this because the function returns all tenants regardless of the filter; the filter only filters out files. This way we know which tenants exist for a given day while still skipping the files whose retention has not expired yet.
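
A simplified sketch of a listing function that takes such a filter, assuming object keys of the form "<tenant>/<file>"; the real method works against the bloom store's key resolver and Location types, so the names below are illustrative only:

```go
package main

import "strings"

// keyFilter decides which object keys to keep for a tenant.
type keyFilter func(tenant, key string) bool

func tenantFilesForInterval(objectKeys []string, keep keyFilter) map[string][]string {
	tenants := make(map[string][]string)
	for _, key := range objectKeys {
		tenant := strings.SplitN(key, "/", 2)[0]
		if _, ok := tenants[tenant]; !ok {
			// Record every tenant, even if all of its files end up filtered out.
			tenants[tenant] = nil
		}
		if keep(tenant, key) {
			tenants[tenant] = append(tenants[tenant], key)
		}
	}
	return tenants
}
```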

@salvacorts (Contributor, Author) replied:

> An unintended consequence here is that if a tenant has any retention configs larger than the range the compactor checks for, it'll never delete blooms for that tenant.

The goal of max_lookback_days is to prevent us from iterating all the way back to 1970. Still, retention stops earlier than the lookback if a given day table has no tenants, meaning a previous retention iteration already removed all tenants, so we can assume there are no files for any tenant beyond that day.

> I think it makes more sense to delete all tenants once they hit max_lookback_days regardless of their configs.

I see two problems with that:

  • We'd keep storing blooms for tenants whose retention expired before max_lookback_days.
  • If a tenant's retention is set beyond max_lookback_days, we'd delete its blooms earlier than desired.

I'd stick with using max_lookback_days as a global limit to make sure we don't iterate for too long. To be clear, under normal circumstances we shouldn't reach max_lookback_days at all.

@salvacorts salvacorts requested a review from owen-d March 25, 2024 08:19
@owen-d (Member) left a comment:

Let's add a metric which fires when a tenant's retention is longer than the lookback the compactor checks so we can alert on it.

Left a nit, but giving you an approval so you can merge when fixed
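
One possible shape for such a metric, sketched with an assumed name rather than whatever eventually landed:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical alerting metric: incremented whenever a tenant's configured
// retention exceeds the compactor's retention max lookback window.
var retentionExceedsLookback = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "bloom_compactor_retention_exceeds_lookback_total",
	Help: "Times a tenant's retention was longer than the retention max lookback.",
})
```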

}

// findSmallestRetention returns the smallest retention period across all tenants.
// It also returns a boolean indicating if there is any retention period set at all
Member:

This is somewhat confusing. It returns the smallest retention, but skips zero (disabled). It also does not return a bool like it suggests. Maybe smallestEnabledRetention?

@salvacorts (Contributor, Author):

Changed, thanks.
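
For illustration, a minimal sketch of what the renamed helper could look like, assuming per-tenant retentions arrive as a plain map (the real code reads them from the per-tenant limits):

```go
package main

import "time"

// smallestEnabledRetention returns the smallest non-zero retention across
// tenants; a zero return value means no tenant has retention enabled.
func smallestEnabledRetention(retentions map[string]time.Duration) time.Duration {
	var smallest time.Duration
	for _, r := range retentions {
		if r == 0 {
			continue // zero means retention is disabled for this tenant
		}
		if smallest == 0 || r < smallest {
			smallest = r
		}
	}
	return smallest
}
```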

@salvacorts salvacorts merged commit 86c768c into main Mar 28, 2024
11 checks passed
@salvacorts salvacorts deleted the salvacorts/bloom-retention branch March 28, 2024 10:44
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
Labels: size/XXL, type/docs