
Configurable active user series #3153

Merged: pstibrany merged 24 commits into cortexproject:master on Sep 16, 2020

Conversation

@pstibrany (Contributor) commented Sep 10, 2020

What this PR does: This PR makes tracking of active series in the blocks storage configurable. There are three parameters: how often to purge old series from memory and update the metrics, how old a series must be to be considered inactive, and a flag to enable active series tracking (disabled by default).

This makes it possible to track active series more precisely, even for series that are still in the TSDB head but haven't had any samples appended recently.

In the current state of the PR, this is only wired into the blocks engine, not the chunks engine.
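For illustration, this is how the knobs discussed here could be set on the command line. The idle-timeout flag name below is my assumption for the "how old must a series be" knob; the other two flags appear verbatim later in this thread:

-ingester.active-series-enabled=true
-ingester.active-series-update-period=1m
-ingester.active-series-idle-timeout=10m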

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci (Contributor) left a comment

Thanks for working on it! I have a few concerns:

  1. The new metric adds cognitive load for users trying to understand the differences between the chunks and blocks storage. I don't have an answer, but if it could work for the chunks storage too it would probably be easier to understand.
  2. I'm a bit scared of the lock contention in ActiveSeries (because it's an exclusive lock). I'm wondering whether, if we switch the entry timestamp to an atomic, a RWMutex could be used (most of the time we would only acquire the read lock, because we expect the series to already be in the map). A sketch of that approach follows below.
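A minimal sketch of the idea being discussed, assuming a striped map keyed by fingerprint (names here are illustrative, not the PR's exact code): the hot path takes only the read lock and bumps the timestamp atomically; the write lock is needed only when the series is missing.

package ingester

import (
	"sync"
	"time"

	"go.uber.org/atomic"
)

type seriesEntry struct {
	// Pointer, not value: entries may live in growable containers,
	// and the pointed-to integer must never move.
	nanos *atomic.Int64
}

type seriesStripe struct {
	mu     sync.RWMutex
	active map[uint64]*seriesEntry
}

func (s *seriesStripe) updateTimestamp(fp uint64, now time.Time) {
	s.mu.RLock()
	e := s.active[fp]
	s.mu.RUnlock()

	if e != nil {
		// Common case: the series is already tracked, so the read
		// lock was enough and the update is a single atomic store.
		e.nanos.Store(now.UnixNano())
		return
	}

	s.mu.Lock()
	defer s.mu.Unlock()
	if e := s.active[fp]; e != nil { // re-check: another goroutine may have added it
		e.nanos.Store(now.UnixNano())
		return
	}
	s.active[fp] = &seriesEntry{nanos: atomic.NewInt64(now.UnixNano())}
}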

@@ -103,6 +105,8 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.DurationVar(&cfg.MetadataRetainPeriod, "ingester.metadata-retain-period", 10*time.Minute, "Period at which metadata we have not seen will remain in memory before being deleted.")

f.DurationVar(&cfg.RateUpdatePeriod, "ingester.rate-update-period", 15*time.Second, "Period with which to update the per-user ingestion rates.")
f.DurationVar(&cfg.ActiveSeriesUpdatePeriod, "ingester.active-series-update-period", 5*time.Minute, "How often to update active series metrics (blocks engine only).")
Contributor:

I would default to 1m, to have more accurate tracking (by default).

@pstibrany (Contributor, Author):

> Thanks for working on it! I have a few concerns:
>
> 1. The new metric adds cognitive load for users trying to understand the differences between the chunks and blocks storage. I don't have an answer, but if it could work for the chunks storage too it would probably be easier to understand.

I agree. Let's do that. (Reason why I kept chunks out of this PR is basically to first see how well it works in practice.)

> 2. I'm a bit scared of the lock contention in ActiveSeries (because it's an exclusive lock). I'm wondering whether, if we switch the entry timestamp to an atomic, a RWMutex could be used (most of the time we would only acquire the read lock, because we expect the series to already be in the map).

I will write a benchmark to see throughput of both approaches.

@pstibrany (Contributor, Author):

Benchmark showing the improvement from using an RW lock and atomics to update timestamps:

name                                  old time/op    new time/op    delta
ActiveSeriesTest_single_series/50-4      154ns ± 2%      76ns ± 1%  -50.52%  (p=0.008 n=5+5)
ActiveSeriesTest_single_series/100-4     156ns ± 1%      76ns ± 1%  -51.25%  (p=0.008 n=5+5)
ActiveSeriesTest_single_series/500-4     158ns ± 1%      76ns ± 2%  -52.01%  (p=0.008 n=5+5)

name                                  old alloc/op   new alloc/op   delta
ActiveSeriesTest_single_series/50-4      0.00B          0.00B          ~     (all equal)
ActiveSeriesTest_single_series/100-4     0.00B          0.00B          ~     (all equal)
ActiveSeriesTest_single_series/500-4     0.00B          0.00B          ~     (all equal)

name                                  old allocs/op  new allocs/op  delta
ActiveSeriesTest_single_series/50-4       0.00           0.00          ~     (all equal)
ActiveSeriesTest_single_series/100-4      0.00           0.00          ~     (all equal)
ActiveSeriesTest_single_series/500-4      0.00           0.00          ~     (all equal)
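For context, a benchmark of this shape could look like the following (a sketch reusing the stripe type from the earlier snippet; I'm assuming the /50, /100, /500 suffixes parameterize how many series are pre-populated). The old/new columns are then typically produced by running each version with go test -bench -count=5 and comparing the outputs with benchstat.

package ingester

import (
	"strconv"
	"testing"
	"time"
)

func BenchmarkActiveSeriesTest_single_series(b *testing.B) {
	for _, n := range []int{50, 100, 500} {
		b.Run(strconv.Itoa(n), func(b *testing.B) {
			s := &seriesStripe{active: make(map[uint64]*seriesEntry, n)}
			now := time.Now()
			for j := 0; j < n; j++ { // pre-populate n tracked series
				s.updateTimestamp(uint64(j), now)
			}
			b.ResetTimer()
			for i := 0; i < b.N; i++ { // hammer a single existing series
				s.updateTimestamp(0, now)
			}
		})
	}
}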

func (s *activeSeriesStripe) updateSeriesTimestamp(now time.Time, series labels.Labels, fp model.Fingerprint, labelsCopy func(labels.Labels) labels.Labels) {
nowNanos := now.UnixNano()

e := s.findEntryForSeries(fp, series)
Contributor:

I think there's a race condition: we get a reference to an activeSeriesEntry and then update it. In the meantime, the purge could remove it, because the read lock is held only during findEntryForSeries(). However, the next remote-write push would add it back, right? If so, LGTM, but I would like your opinion too.

pstibrany (Contributor, Author):

Well spotted. Should be fixed now. (There was another problem: since the atomic integer was not a pointer but part of the entry struct, which was stored in a slice as a plain struct, it could actually move in memory when the slice grows. That's why the atomic integers are now pointers.)
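To illustrate the hazard (a hypothetical, self-contained example, not the PR's code): when append reallocates the slice's backing array, a previously taken pointer to an inline atomic field keeps pointing into the old array, and updates through it are silently lost.

package main

import (
	"fmt"

	"go.uber.org/atomic"
)

type entry struct {
	nanos atomic.Int64 // stored inline: moves when the slice is reallocated
}

func main() {
	entries := make([]entry, 0, 1)
	entries = append(entries, entry{})

	p := &entries[0].nanos // take a pointer to the inline atomic

	// Appending beyond capacity reallocates the backing array and
	// copies the elements; p still points into the old array.
	entries = append(entries, entry{})

	p.Store(42)
	fmt.Println(entries[0].nanos.Load()) // prints 0: the update was lost

	// The fix described above: make the field a *atomic.Int64, so the
	// pointed-to integer never moves when the slice grows.
}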

Contributor:

> There was another problem: since the atomic integer was not a pointer but part of the entry struct, which was stored in a slice as a plain struct, it could actually move in memory when the slice grows. That's why the atomic integers are now pointers.

Well spotted!

@gouthamve (Contributor):

> The new metric adds cognitive load for users trying to understand the differences between the chunks and blocks storage. I don't have an answer, but if it could work for the chunks storage too it would probably be easier to understand.

I'd be okay adding this to chunks store too. How much overhead will this add?

@pstibrany (Contributor, Author) commented Sep 15, 2020

> > The new metric adds cognitive load for users trying to understand the differences between the chunks and blocks storage. I don't have an answer, but if it could work for the chunks storage too it would probably be easier to understand.
>
> I'd be okay adding this to chunks store too. How much overhead will this add?

By my count, each activeSeriesEntry needs 40 bytes on its own, plus a copy of the labels. For 1M series that's at least 40 MB, and if we estimate labels at 200 bytes per series, that's an additional 200 MB.

(In the blocks storage, we try to reuse the labels copy if the series was added to the ref cache at the same time. Doing the same in chunks is not easy.)
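For intuition on the 40-byte figure, an assumed layout on 64-bit (not necessarily the PR's exact struct):

package ingester

import (
	"github.com/prometheus/prometheus/pkg/labels"
	"go.uber.org/atomic"
)

type activeSeriesEntry struct {
	lbs   labels.Labels // slice header: 24 bytes on 64-bit
	nanos *atomic.Int64 // 8-byte pointer + the 8-byte integer it points to
}

// 24 + 8 + 8 = 40 bytes per entry, before counting the label strings
// themselves (estimated above at ~200 bytes per series).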

@pracucci (Contributor) left a comment

Excellent job. I left a couple of nits. As discussed offline, I would add an enabled flag (defaulting to false) so that not every Cortex user pays the cost of this tracking when it isn't required (it is usually required by vendors' billing systems).


Commits (all signed off by Peter Štibraný <peter.stibrany@grafana.com>) include:

- Active series are exported as metric.
- Also use atomic to keep oldest timestamp.
- …s is updated when slice in the stripe map is appended. Avoid race when newly added entry is removed by purger due to timestamp being set to 0.
- …s to be in a loop).
- …nction.
@pstibrany (Contributor, Author):

> Excellent job. I left a couple of nits. As discussed offline, I would add an enabled flag (defaulting to false) so that not every Cortex user pays the cost of this tracking when it isn't required (it is usually required by vendors' billing systems).

Added the -ingester.active-series-enabled flag (defaults to false), and switched fingerprinting to xxHash 64-bit, which is faster than fnv64a (the difference is more visible for bigger labels, so the benchmark now uses bigger labels) but requires an extra small allocation:

name                         old time/op    new time/op    delta
ActiveSeries_UpdateSeries-4    1.03µs ± 6%    0.74µs ±17%  -28.25%  (p=0.008 n=5+5)

name                         old alloc/op   new alloc/op   delta
ActiveSeries_UpdateSeries-4      206B ± 0%      253B ±11%  +23.01%  (p=0.008 n=5+5)

name                         old allocs/op  new allocs/op  delta
ActiveSeries_UpdateSeries-4      2.00 ± 0%      3.00 ± 0%  +50.00%  (p=0.008 n=5+5)
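For illustration, fingerprinting a label set with xxHash64 might look like this (a sketch, assuming the cespare/xxhash package; the separator byte and buffer handling are my assumptions, and building the byte buffer is the "extra small allocation" mentioned above):

package ingester

import (
	"github.com/cespare/xxhash"
	"github.com/prometheus/prometheus/pkg/labels"
)

// fingerprint hashes a label set with xxHash64.
func fingerprint(ls labels.Labels) uint64 {
	b := make([]byte, 0, 256) // the extra allocation
	for _, l := range ls {
		b = append(b, l.Name...)
		b = append(b, 0xff) // separator byte that cannot appear in valid UTF-8
		b = append(b, l.Value...)
		b = append(b, 0xff)
	}
	return xxhash.Sum64(b)
}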

@pstibrany (Contributor, Author):

I've also made sure that if -ingester.active-series-enabled is false, the new metric doesn't appear in the exported metrics at all. The previous version of the chunks code actually exported a 0 value for each user, which is useless. Unfortunately this made the PR bigger :( (the alternative would have been to not register the metric with the metrics registry, but still create all the in-memory structures).
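A sketch of what that gating can look like (names and the "user" label are assumptions; only the metric name is taken from this PR): when the flag is off, neither the in-memory structure nor the metric is created, so nothing is exported.

package ingester

import "github.com/prometheus/client_golang/prometheus"

type activeSeriesTracking struct {
	gauge *prometheus.GaugeVec
}

func newActiveSeriesTracking(enabled bool, reg prometheus.Registerer) *activeSeriesTracking {
	if !enabled {
		// No structures, no metric: the series simply never appears
		// in the /metrics output.
		return nil
	}
	t := &activeSeriesTracking{
		gauge: prometheus.NewGaugeVec(prometheus.GaugeOpts{
			Name: "cortex_ingester_active_series",
			Help: "Number of currently active series per user.",
		}, []string{"user"}),
	}
	reg.MustRegister(t.gauge)
	return t
}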

@pracucci (Contributor) left a comment

Good job, still LGTM! Left final nits.

(Three follow-up commits, signed off by Peter Štibraný <peter.stibrany@grafana.com>.)
@pstibrany merged commit e265199 into cortexproject:master on Sep 16, 2020
@bboreham (Contributor):

> In the current state of the PR, this is only wired into the blocks engine, not the chunks engine.

Is this still true?

@pstibrany (Contributor, Author):

> > In the current state of the PR, this is only wired into the blocks engine, not the chunks engine.
>
> Is this still true?

Not anymore. Also, there is now a flag to enable this, which is disabled by default.

pstibrany pushed a commit that referenced this pull request Jun 8, 2021
* Enable active series metrics in the ingester by default

This change calculates and exports the `cortex_ingester_active_series`
metric by default. Up to this point, the metric was disabled by default
since calculating it consumes some amount of memory.

The original PR (#3153) estimated at least 40MB for 1M active series,
plus up to another 200MB depending on our luck reusing labels from the
ref cache.

We (Grafana) have been running with this setting enabled on our ingesters
for some time and the resource usage doesn't appear to be significant.
This feature appears to add between 1.2% and 1.6% of memory usage when
enabled: ~140MB out of a total of ~10GB of memory used per ingester.

The ingesters I measured this on:

* Have multiple tenants running production workloads
* Have about 1.3M active series each
* Have about a 10GB working set (as measured by `kubectl top` and exported
  k8s metrics)

Based on this and the utility of the metric itself, I'd like to enable it
by default.

Screenshots of the pprof heap output attached.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>