Telegraf fails after random time period when using specific log parser #3142
Comments
Can you scrub the sensitive data and add the full error message?
Here you go, here are a few lines:
I included a change in #3155 that should mitigate this problem. If InfluxDB rejects a point as unparsable, the error will still be logged, but the write batch will treat it as a successful write. My reasoning is that such a point would normally never become parsable and would always fail, so it is better to drop the point than to get stuck. This change will be in 1.4.0.
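The mitigation described above can be sketched roughly as follows. This is an illustrative Python sketch, not Telegraf's actual Go code from #3155: the names `WriteResult`, `is_unparsable_rejection`, and `flush_batch` are hypothetical, and the check assumes the rejection looks like the error quoted in this thread (status 400 with "unable to parse" in the body, where 204 means success).

```python
from dataclasses import dataclass


@dataclass
class WriteResult:
    """Hypothetical stand-in for the HTTP response from an InfluxDB write."""
    status_code: int
    body: str = ""


def is_unparsable_rejection(result: WriteResult) -> bool:
    # Assumption: an unparsable point shows up as a 400 whose body
    # contains "unable to parse", as in the error quoted in this thread.
    return result.status_code == 400 and "unable to parse" in result.body


def flush_batch(batch, write, log) -> bool:
    """Return True if the batch can be discarded, False if it should be retried."""
    result = write(batch)
    if result.status_code == 204:
        return True  # normal success
    if is_unparsable_rejection(result):
        # The point would never become parsable, so log it and move on
        # instead of retrying the same batch forever.
        log(f"dropping unparsable point(s): {result.body}")
        return True
    return False  # other errors (for example, server unreachable) are retried
```

With this shape, a single bad point costs one log line rather than blocking every later write behind a batch that can never succeed.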
I guess I'm confused about why it even thinks it's not parsable. It logs many rows of a similar type just fine, and then suddenly it starts failing. Let me provide some more details. I decided to comment out the sidekiq log parser, since I was having problems with it, until you guys were able to get a fix out. But this morning I realized that it's now failing with the same error for a different log parser. I wasn't seeing these fail in the logs before. It's almost as if it's always the "last" log parser that fails. This log parser's conf looks like this:
Here's a sample record (and also the very record that corresponds with the error that comes later in this post):
I don't understand the Thoughts? Is this now a separate issue for InfluxDB instead?
Telegraf is parsing the lines fine, but they are being rejected by InfluxDB due to an escaping issue. Check the InfluxDB issue that is linked above for more details.
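For context, the thread does not say which character triggered the rejection, so the following is only a general illustration of the kind of escaping InfluxDB line protocol expects for string field values (backslashes and double quotes must be escaped). The helper names are hypothetical, and the field key is assumed to contain no characters that need escaping.

```python
def escape_string_field(value: str) -> str:
    # Escape backslashes first, so the backslashes added for quotes
    # are not themselves doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def format_string_field(key: str, value: str) -> str:
    # Assumption: `key` is a simple identifier needing no escaping.
    return f'{key}="{escape_string_field(value)}"'
```

A log message that contains a quote or ends in a backslash is the sort of input that parses fine on the Telegraf side but can produce an invalid line if it is written out without this step.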
@birengoodco I think I've fixed the issue, can you test it out with 1.4.0-rc2?
Sweet!!! Thanks @danielnelson! I will give it a whirl tomorrow morning!! Appreciate the fast turnaround!
Bug report
Per daniel's request from here, I'm opening this bug.
Telegraf keeps failing after some random amount of time and has to be restarted.
What seems to happen is that the log fills up with a bunch of “unable to parse” errors due to “bad timestamp”, but that makes no sense, since the data looks good in the log message and, on top of that, I’m seeing the data actually end up in Influx. So I'm not sure why I’m getting those. Then, after a few minutes, it just dies altogether and nothing makes it into Influx anymore. I get the error:
E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster
I can restart Telegraf and it works fine for a while until it dies again. In my pool of servers it doesn’t seem very consistent (all of the servers are running the same Telegraf config file… the only difference being the hostname that’s set). Any ideas?
I wasn’t having any problems until I introduced this log parser…
I’m also seeing this error on some other servers:
E! InfluxDB Output Error: Response Error: Status Code [400], expected [204], [partial write: unable to parse 'sidekiq_log
I’m leaving out the rest of the error, as it contains details I'd have to scrub.
Relevant telegraf.conf:
System info:
Telegraf Version: 1.3.5
OS: Ubuntu 14.04.5