telegraf panic in Exec plugin #1199
Comments
I've since been led to believe that there is a general memory corruption issue with this box, so it is likely not telegraf's fault, and I'm closing this. (Although changing telegraf's systemd handler to auto-restart the process might be a nice extra layer of protection for problems like this.)
Thanks @djahandarie, but I'm going to leave this open until I have time to investigate that line of code.
I see that this is actually panicking in the Go source code, so I'm closing this because it's definitely caused by your memory corruption issue. FWIW, Telegraf is supposed to be restarted on failure (that's what this line is for: https://github.com/influxdata/telegraf/blob/master/scripts/telegraf.service#L11).
@sparrc So, I hit this again on a different box:
This new box also happens to be an AWS EC2 box (in a completely different region). No non-EC2 boxes have had this issue so far. It could just be that this box also has some memory issue. Or there's a Xen bug. Or there's a Go bug. Or there's a telegraf bug. What's interesting is that it's the exact same line of I still don't understand how
I don't quite understand how this would happen either. I'm assuming you are running version 0.13? Re-opening for further investigation...
Yep, 0.13. Here's another version of it:
This time the first parameter of grow is an actual pointer, but it still leads to a panic. I'm going to build a 0.13 binary with -race and try running that instead to see if a data race is involved.
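For readers following along: a `bytes.Buffer` is not safe for concurrent use, so if two goroutines wrote to the same buffer without synchronization, a binary built with `go build -race` would be expected to report it. Nothing in this thread confirms that a race is the cause here. The sketch below only shows what a race-free shared buffer could look like; `safeBuffer` and `writeConcurrently` are hypothetical names for illustration, not Telegraf code.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// safeBuffer is a hypothetical wrapper (not a Telegraf type) that serializes
// access to a bytes.Buffer, which is not safe for concurrent use on its own.
type safeBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write appends p while holding the lock, so concurrent writers never run
// inside the buffer's internal grow logic at the same time.
func (s *safeBuffer) Write(p []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.buf.Write(p)
}

// Len reports the number of unread bytes while holding the lock.
func (s *safeBuffer) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.buf.Len()
}

// writeConcurrently starts n goroutines that each write chunk once and
// returns the buffer length after all of them have finished.
func writeConcurrently(n int, chunk []byte) int {
	var sb safeBuffer
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sb.Write(chunk)
		}()
	}
	wg.Wait()
	return sb.Len()
}

func main() {
	// 100 writers, 7 bytes each.
	fmt.Println(writeConcurrently(100, []byte("metric\n"))) // prints 700
}
```

Removing the mutex from `Write` and running the same program with `go run -race` is a quick way to see what a race report on a shared buffer looks like.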
I have a possibly related issue: #1424.
@djahandarie Many of the issues we have had in this bit of code have been fixed by recent Go versions. Can you let me know if you still see these issues with the latest release? I'm pretty confident it is fixed, so I'm going to close this issue, but we can reopen it if it is still occurring.
On telegraf 0.13.0.
I installed telegraf on this box earlier today, and about 30 minutes later, I got this in `telegraf.log`:

The same telegraf configuration+version pair has no issues on a dozen or so other boxes that it runs on.

This seems like a really odd panic. The documentation for `Buffer.grow` says it can panic with an `ErrTooLarge`, but this seems to be different. The line in particular is `m := b.Len()`, and for that to panic with a nil pointer dereference, I guess that means the pointer to the buffer got clobbered (and I think the first parameter being `0x0` confirms that?).

This backtrace does not seem to be providing the full story on how line `exec.go:157` leads to a buffer being grown, so I'm mostly at a loss as to what's going on. I'm happy to provide more debug info if needed.