Input to read and parse a file every interval #3883

Closed
mjf opened this issue Mar 13, 2018 · 7 comments · Fixed by #4332
Labels: feature request, new plugin

@mjf

mjf commented Mar 13, 2018

Hello.

I would like to propose a new input plugin to read and parse (Grok) a file every interval, preferably called [[inputs.parser]] (or [[inputs.reader]] in case #3479 is planned to be done one day).
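
To make the request concrete, here is a minimal Python sketch of the idea: read a whole file on each call and turn every matching line into named fields. A plain regular expression with named groups stands in for a Grok pattern. The function name `parse_file`, the pattern, and the `<name> <value>` line format are assumptions made for illustration; none of this is Telegraf code.

```python
import re
from pathlib import Path

# Hypothetical line format: "<name> <integer value>", e.g. "MemFree 1024".
# The named groups play the role that Grok captures would play.
LINE_PATTERN = re.compile(r"^(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_file(path):
    """Read the whole file and return one dict per line that matches."""
    records = []
    for line in Path(path).read_text().splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:  # lines that do not match the pattern are skipped
            fields = match.groupdict()
            fields["value"] = int(fields["value"])
            records.append(fields)
    return records
```

Calling `parse_file` once per collection interval always starts from the beginning of the file, which is the behaviour asked for here, as opposed to following the file like tail does.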


@danielnelson added the feature request label Mar 14, 2018
@dirkdevriendt

We're also stuck on a seemingly simple use case where, AFAICT, we bump into a (sometimes surprising) shortcoming in the existing tail and logparser inputs.

We have logfiles that have the date in their name and data in influx formatted lines.
[[inputs.tail]] with files = ["/data/*.log"] does not work because it does not pick up newly created files
[[inputs.logparser]] does not work because it does not support data_format = "influx" (and we do not know up front which metrics will be in there)

A solution would be if:

  • tail had a setting that would re-evaluate the list of files at every interval
  • logparser could read influx formatted files (assuming it is able to load new files in flight)

In addition to that, a setting to not only recreate the file list but also parse every file from the beginning at every interval would also fulfill the OP's request.

Note that this may also prevent some of the confusion around the issues #1829 #2141 #2847 #3492, where I suspect that the cause could for instance be that inotify keeps references to files that get truncated, but "loses" the ones to files that get deleted and recreated.
An option to refresh the file list would make debugging easier.
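
As a rough illustration of the two suggestions above (re-evaluate the file list every interval, and read each file from the beginning), here is a small Python sketch. The function name `read_matching_files` is hypothetical, and the lines are returned unparsed; a real plugin would hand them to an influx line protocol parser instead.

```python
import glob


def read_matching_files(pattern):
    """Expand the glob now and read every matched file from the beginning."""
    contents = {}
    # The glob is expanded on every call, so files created since the
    # previous call are picked up without any file watching.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            contents[path] = handle.read().splitlines()
    return contents
```

Because nothing is remembered between calls, a file that is deleted and recreated is simply read again on the next interval.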

@danielnelson
Contributor

@dirkdevriendt Keep an eye on #3479, once we do this the next step will be to merge logparser and tail.

@mjf
Author

mjf commented Apr 30, 2018

The [[inputs.reader]] should support at least these options:

  • parser type (i.e. "none", "grok", "influx", etc.);
  • change watch method:
    • "inotify", "mtime" or "poll" for files;
    • "list", "mtime", "dnotify" for directories;
  • action on event:
    • "parse" to run the parser;
    • "exec" (for "preprocessing" etc.);
  • continuous reading (a.k.a. "tail");
  • reuse last known position in a file;
  • handle list of objects changes (i.e. updates on reload, every n-th interval etc.)

and maybe a few other options...

I would also suggest letting users form sets (groups) of path resources (directories, or files from glob patterns) that share a similar combination of these options. It should also support Golang glob patterns.

I would also suggest supporting separate configurations for the parsers (i.e. letting users maintain different parser rulesets for different groups (sets) of paths).

I would also suggest supporting not only regular files but also other types of objects (i.e. unix sockets).


Maybe in this or a similar way, the current [[inputs.tail]] and [[inputs.logparser]] would become obsolete and replaceable with a single, more general input plugin.
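
For readers unfamiliar with the "mtime" change watch method listed above, the following Python sketch shows one way it could work: remember each file's modification time and report a path only when that time differs from the previous check. The class name `MtimeWatcher` and its interface are assumptions for illustration, not part of any existing plugin.

```python
import os


class MtimeWatcher:
    """Report which paths changed since the previous check, based on mtime."""

    def __init__(self):
        self._seen = {}  # path -> modification time (ns) seen at the last check

    def changed(self, paths):
        result = []
        for path in paths:
            mtime = os.stat(path).st_mtime_ns
            # A path is reported the first time it is seen and whenever
            # its modification time differs from the remembered one.
            if self._seen.get(path) != mtime:
                result.append(path)
            self._seen[path] = mtime
        return result
```

A caveat of this approach is that two writes within the filesystem's timestamp granularity can look like a single change, which is one reason event-based methods such as inotify are also listed.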

@danielnelson
Contributor

That's a lot of ideas, but I'd like to keep this plugin targeted at a common use case instead of making it into a jack of all trades plugin.

I don't want to support directories directly, but the glob pattern can be used to match files instead. We can evaluate the glob pattern each interval or on a custom interval. All files matched should be sampled together each interval, I don't think we should try to introduce inotify.

Continuous reading and last known position belong in tail. I'm also going to say no to running exec to preprocess, if you need this then use the exec plugin.

Other changes to the configuration capabilities to reduce config verbosity and normalize the config need to be introduced as separate issues.

You can use socket_listener for unix sockets.

@mjf
Author

mjf commented May 2, 2018

@danielnelson

> That's a lot of ideas, but I'd like to keep this plugin targeted at a common use case instead of making it into a jack of all trades plugin.

I have no real problem with it. For my use case, some method to re-read and parse (using Grok) a file every interval is crucial atm (see below).

OT: If you like to keep the plugins and have, say, read, tail, exec, whatever other plugin for reading filesystem objects separated, then all of them SHOULD support parsers! Or, some mechanism to chain plugins into pipelines should exist and parser instance should be standalone part of such a pipeline. That would be the nicest way, I think.

> I don't want to support directories directly, but the glob pattern can be used to match files instead.

I think that may be good enough. The idea of supporting directories was to let the user configure a directory to be watched itself and perform an action in case it changes (i.e. re-read the list of files, update something, etc.). But I can easily live without it personally. I just tried to stress most of the options that somebody would expect to have (IMHO) for such an input plugin. I do not need all of them personally atm.

> We can evaluate the glob pattern each interval or on a custom interval.

That would be good-enough. (I just completely forgot about the custom intervals, my fault.)

> All files matched should be sampled together each interval, I don't think we should try to introduce inotify.

The inotify is quite a good mechanism and can be configured nicely too.

💡 I would like to suggest that Telegraf have some sort of generic support for filesystem object change detection (for both files and directories, with various methods like inotify, dnotify, mtime, hash etc.) that could be used in plugins that need it (i.e. logparser, tail). But that's just my opinion...

In case you would not decide to go the way of merging logparser and tail plugins into single and more general reader (or whatever name you liked), which I thought would be ideal and seems to me as a quite logical step, please add the support for the Grok parser to the exec plugin so that people could at least "somehow" get their statistics from /proc and perhaps other resources in a sane way (where the "sane way" means "not utilizing some ugly and resources consuming shell scripts").

> Other changes to the configuration capabilities to reduce config verbosity and normalize the config need to be introduced as separate issues.

OK. But I am not sure whether I understand properly what this means.

> You can use socket_listener for unix sockets.

Well, then the socket_listener is yet another plugin that needs support for parsers, right?

@danielnelson
Contributor

> In case you would not decide to go the way of merging logparser and tail plugins into single and more general reader (or whatever name you liked)

My plan is to merge logparser into tail, we will just have the tail plugin and this separate plugin. It turns out following a file is tricky, and the library we have been using has some issues, so I want to keep that type of logic out of this plugin.

> please add the support for the Grok parser to the exec plugin so that people could at least "somehow" get their statistics from /proc

This is #3479, you will be able to use Grok in any input that has a data_format option.

> > Other changes to the configuration capabilities to reduce config verbosity and normalize the config need to be introduced as separate issues.
>
> OK. But I am not sure whether I understand properly what this means.

It sounded like you have some ideas around changes to the general config file syntax, we can open new issues to discuss these but I don't want to tie the idea to this plugin, because it would delay the plugin unnecessarily.

> Well, then the socket_listener is yet another plugin that needs support for parsers, right?

It's in there now

@mjf
Author

mjf commented May 3, 2018

@danielnelson Thank you very much for clarifying things further.

> This is #3479, you will be able to use Grok in any input that has a data_format option.

That would be splendid! I am looking forward to it...

5 participants