Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(processors.parser): Add option to parse tags #11228

Merged

Conversation

Hipska
Copy link
Contributor

@Hipska Hipska commented May 31, 2022

Required for all PRs:

Add an option to also parse tags (as the example config description was already saying that).

I don't have added tests yet. If someone could tell me the test(name)s I should add, I'm happy to implement them..

This is a revive of #11003 since that one was broken..

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label May 31, 2022
@Hipska Hipska requested review from reimda, sspaink and srebhan June 1, 2022 15:28
@srebhan
Copy link
Member

srebhan commented Jun 10, 2022

@reimda any comment here? Code looks good so far.

@reimda
Copy link
Contributor

reimda commented Jun 10, 2022

Hi Thomas, could you tell me more about your use case? It still seems to me like the converter plugin already provides this functionality in a simple way. I think it's only a line or two more configuration than what you propose in this PR, and it requires no additional code.

Sorry I haven't answered your questions from the previous PR (#11003 (comment)) until now:

but the sample config already mentioned tags

Handling tags actually was part of the original PR for this plugin (#4551). It was removed intentionally because the converter plugin made it redundant. Here are the commits where it was removed:

  • remove tags tests - 38733c1
  • remove tags functionality - c736252

The sample config mentions "field/tag(s)" only because it wasn't removed along with related stuff in those two commits. Ever since the plugin was merged, the mention of tags in the docs has been a typo. It's not a good reason to add it back.

I think it was right to remove it and encourage people to use converter. Having processors act on either tags or fields but not both reduces duplicated code and tests. If every plugin has to handle both tags and fields, very similar code and tests are duplicated in every plugin. Everywhere you act on one you have to act on the other, doubling that kind of code in every plugin. More code means more potential for bugs and slower unit tests.

If we have plugins handle only tags or only fields, and then we rely on the converter to switch from one to the other in unusual use cases, we don't have that doubling of tag/field handling code in the plugins.

Other processors also work on both tags and fields

Some do, but most don't. Most of them act on either tag or field depending on the main use case. If someone is using a plugin in an unintended way (not the main use case) they are expected to use converter.

Bold ones in the table below act on either field or tag but not both, italic ones act on both.

name input source output destination
aws tag tag
clone (none) (none)
converter tag, field tag, field, metric name
date (none) tag, field
dedup (none) (none)
defaults field field
enum tag or field same as input
execd any any
filepath tag, field same as input
ifname tag tag
noise field field
override (none) tag
parser field field
pivot tag field
port_name tag or field same as input
printer (none) (none)
regex tag or field same as input
rename tag, field, metric name same as input
reverse_dns tag, field same as input
s2geo fields tag
starlark any any
strings tag, field, metric name same as input
tag_limit tag tag (removes)
template tag, field, metric name tag
topk series (including values) add tag
unpivot field tag

I'm not completely opposed to having a processor act on both fields and tags if there's a good reason for it. Most of the italic ones in the table make sense to me. For example, if you want to rename a tag, it would be crazy to have to convert it to a field, rename the field, then convert it back to a tag. It makes sense for the rename processor to act on both tags and fields.

Parsing a tag value doesn't make the same sense to me as renaming a tag. Usually data that's complex enough to be parsed is held in fields, so that's the main use case. In your case it must not be this way. If you could describe your use case and why you have data in a tag that needs to be parsed, maybe it would clear up for me why it's not acceptable to use the converter processor.

@Hipska
Copy link
Contributor Author

Hipska commented Jun 13, 2022

So this is my use case:

[[processors.strings]]
  order = 1
  namepass = ["SGSN-MME_*"]

  [[processors.strings.replace]]
    tag = "MeasObjLdn"
    old = ","
    new = " "


[[processors.converter]]
  order = 2
  namepass = ["SGSN-MME_*"]

  ## Tags to convert
  [processors.converter.tags]
    string = ["MeasObjLdn"]


# Parse a value in a specified field/tag(s) and add the result in a new metric
[[processors.parser]]
  order = 3
  namepass = ["SGSN-MME_*"]

  ## The name of the fields whose value will be parsed.
  parse_fields = ["MeasObjLdn"]

  ## If true, incoming metrics are not emitted.
  # drop_original = false

  ## If set to override, emitted metrics will be merged by overriding the
  ## original metric using the newly parsed metrics.
  merge = "override"

  ## The dataformat to be read from files
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "logfmt"

  ## Array of key names which should be collected as tags. Globs accepted.
  logfmt_tag_keys = ["*"]


# Apply metric modifications using override semantics.
[[processors.override]]
  order = 4
  namepass = ["SGSN-MME_*"]

  ## Fields to drop
  fielddrop = ["MeasObjLdn"]

The MeasObjLdn contains values like:

  • ManagedElement=<hostname>,SgsnMme=1,RouterInstance=<foobar>
  • ManagedElement=<hostname>,SgsnMme=1,IpInterface=ETH_1_25_1_1920

So they are all identifying and thus a tag but not a field. Wit this PR, I could eliminate processor 2 and 4 (as my output does not support string fields).

@Hipska
Copy link
Contributor Author

Hipska commented Sep 7, 2022

@reimda WDYT?

@reimda
Copy link
Contributor

reimda commented Sep 8, 2022

Hi Thomas. Where does the MeasObjLdn tag come from? The use case you shared only includes the processors, not inputs. When I googled the term I got hits in a digital cellular XML schema (ETSI TS 132 435 V11) which sounds related to your work.

If it's coming from parsed xml, could you get rid of parser "order = 2" by having the xml parser save it as a field instead of a tag? Maybe that isn't an option due to other constraints?

I know this specific case is useful to you so I'm not going to refuse to merge the PR. The code and docs changes look good to me. It still does seem to me like merging this PR is a small step towards making all processors work on both tags and fields which is not the most maintainable and usable.

@Hipska
Copy link
Contributor Author

Hipska commented Sep 9, 2022

You are right, it indeed comes from parsing specific xml files. I could indeed set it as a field, but not knowing on beforehand the name of the other fields, there could be a duplicate naming where one or the other gets overwritten. (I know, bad practice, but Murphy's law is not helping)

So we're fine with this after adding tests?

@reimda
Copy link
Contributor

reimda commented Sep 9, 2022

Yes, with tests added I'll merge it.

@Hipska Hipska changed the title feat(processors/parser): Add option to parse tags feat(processors.parser): Add option to parse tags Sep 12, 2022
@Hipska
Copy link
Contributor Author

Hipska commented Sep 12, 2022

Done, hope I made it for the release :)

@reimda reimda added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Sep 12, 2022
@reimda
Copy link
Contributor

reimda commented Sep 12, 2022

1.24.0 was tagged about a half hour ago so this didn't make it. Sorry I didn't see it was ready.

@reimda reimda merged commit d976158 into influxdata:master Sep 12, 2022
@Hipska Hipska deleted the feature/processors/parser/tags branch September 13, 2022 07:55
@Hipska
Copy link
Contributor Author

Hipska commented Sep 13, 2022

Ah bummer, will then go into 1.25 or 1.24.1?

@reimda
Copy link
Contributor

reimda commented Sep 14, 2022

Ah bummer, will then go into 1.25 or 1.24.1?

Yes. 1.24.1 is scheduled for October 3

reimda pushed a commit that referenced this pull request Sep 19, 2022
@Hipska
Copy link
Contributor Author

Hipska commented Sep 20, 2022

Well it's there sooner than expected 😉

dba-leshop pushed a commit to dba-leshop/telegraf that referenced this pull request Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/processor ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants