
Add configurable Max TTL duration for statsd input plugin entries #8509

Merged 2 commits into master on Dec 4, 2020

Conversation

ivorybilled (Contributor) commented:

The new config value establishes how long any metric can sit in the gauges/counters/timings/sets caches without being updated or received. When the Max TTL elapses, the metric stops being reported until it is received again.

closes #8348
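
Below is a minimal, hypothetical sketch of the expiry mechanic described above: a cache entry that records when it expires, and a prune pass that drops stale entries. The `cachedGauge` type, its field names, and the `pruneExpired` helper are illustrative only and do not reflect the plugin's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// cachedGauge stands in for one entry in the plugin's gauge cache.
type cachedGauge struct {
	value     float64
	expiresAt time.Time // pushed forward to now+maxTTL on every update
}

// pruneExpired drops entries whose TTL has elapsed, so they stop being
// reported until a new statsd line for that bucket arrives.
func pruneExpired(cache map[string]cachedGauge, now time.Time) {
	for key, entry := range cache {
		if now.After(entry.expiresAt) {
			delete(cache, key)
		}
	}
}

func main() {
	maxTTL := time.Second
	now := time.Now()
	cache := map[string]cachedGauge{
		"fresh": {value: 1, expiresAt: now.Add(maxTTL)},
		"stale": {value: 2, expiresAt: now.Add(-maxTTL)},
	}
	pruneExpired(cache, now)
	fmt.Println(len(cache)) // 1: only "fresh" survives
}
```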

ssoroka (Contributor) left a comment:

Nevermind, I got confused about the number of Gather calls. :D it's all good.

@ivorybilled merged commit 2187bac into master on Dec 4, 2020
@ivorybilled deleted the addMaxTTLStatsdInput branch on Dec 4, 2020
LINKIWI (Contributor) commented Dec 23, 2020:

After upgrading from Telegraf 1.16.3 to 1.17.0, I've noticed that tags from statsd emissions are being mangled together with those of unrelated metrics. I'm not pointing to this PR as the root cause of the regression, but it is the only thing that has changed in the statsd plugin since the last release. If I have some more time I'll compile a more thorough bug report with a reproducible example.

ivorybilled (Contributor, Author) commented:

Interesting. Thank you for reporting that, let me know when/if you get more info!

gnjack (Contributor) commented Jan 14, 2021:

We've also seen the same mangled tags on statsd metrics after upgrading to 1.17.0.

gnjack (Contributor) commented Jan 14, 2021:

My golang is very rusty, but I think this is the cause of the issue:

The `s.Lock()` call was moved from the start of `parseStatsdLine` to the start of `aggregate`. This shifts the lock from before a metric's tag calculation to after it.

Since the parser was recently made multithreaded, multiple goroutines may be calling the following line in `parseStatsdLine`, which is now no longer inside a lock:

```go
// Parse the name & tags from bucket
m.name, m.field, m.tags = s.parseName(m.bucket)
```

`func (s *Statsd) parseName(bucket string)` does not appear to be thread-safe. First it pulls out the tags for the metric it was invoked with:

```go
// Parse out any tags in the bucket
if len(bucketparts) > 1 {
	for _, btag := range bucketparts[1:] {
		k, v := parseKeyValue(btag)
		if k != "" {
			tags[k] = v
		}
	}
}
```

then it creates a `GraphiteParser` on the shared `Statsd` struct `s`:

```go
if p == nil || s.graphiteParser.Separator != s.MetricSeparator {
	p, err = graphite.NewGraphiteParser(s.MetricSeparator, s.Templates, nil)
	s.graphiteParser = p
}
```

and sets the default tags on that shared `GraphiteParser` to the current metric's tags:

```go
p.DefaultTags = tags
```

This means multiple goroutines can concurrently set `s.graphiteParser.DefaultTags` to different values before calling `name, tags, field, _ = s.graphiteParser.ApplyTemplate(name)`, which returns incorrect tags whenever another goroutine has modified `s.graphiteParser.DefaultTags` in between.
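
To make this concrete, here is a minimal, hypothetical sketch of one way to close the race: serialize access to the shared parser so the `DefaultTags` write and the `ApplyTemplate` read happen atomically. The `parser` and `statsd` types below are simplified stand-ins rather than telegraf's actual code, and this is only one possible fix; holding `s.Lock()` across `parseName`, or building a fresh parser per call, would work as well.

```go
package main

import (
	"fmt"
	"sync"
)

// parser is a simplified stand-in for graphite.GraphiteParser: it carries
// mutable per-call state (DefaultTags), which is what makes sharing one
// instance across goroutines unsafe.
type parser struct {
	DefaultTags map[string]string
}

func (p *parser) ApplyTemplate(name string) (string, map[string]string) {
	return name, p.DefaultTags
}

// statsd is a simplified stand-in for the Statsd struct. parseName guards
// the shared parser with a mutex so the DefaultTags write and the
// ApplyTemplate read can no longer interleave across goroutines.
type statsd struct {
	mu sync.Mutex
	p  *parser
}

func (s *statsd) parseName(bucket string, tags map[string]string) (string, map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.p.DefaultTags = tags
	return s.p.ApplyTemplate(bucket)
}

func main() {
	s := &statsd{p: &parser{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			_, tags := s.parseName("cpu.load", map[string]string{"id": fmt.Sprint(id)})
			// With the mutex each goroutine always gets its own tags back;
			// without it, another goroutine's DefaultTags could leak in.
			if tags["id"] != fmt.Sprint(id) {
				panic("tags mangled across goroutines")
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("no cross-metric tag leakage")
}
```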

arstercz pushed a commit to arstercz/telegraf that referenced this pull request on Mar 5, 2023: …fluxdata#8509)

* Adding max TTL duration for all metric caches in the statsd input plugin
* Update README.md (was missing type in readme)