fix: Add reset-mode flag for CSV parser #11288

Merged (5 commits) on Jun 30, 2022

srebhan (Member) commented Jun 13, 2022

resolves #11257, resolves #10678

This PR adds a flag that lets the CSV parser be reset on every call to Parse. This is required because, depending on how the input is processed, the CSV parser must either keep its state between calls (e.g. when parsing linewise) or drop it (e.g. when reading whole files). In the first case, set csv_reset_mode = "none" (the default); in the second case, set csv_reset_mode = "always".

Please note: Reset-mode "always" is ignored for ParseLine (which is not used in our codebase), because ParseLine implicitly requires keeping track of the state when used with skipped rows, headers and/or metadata.
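The difference between the two modes can be illustrated with a minimal Go sketch. This is not the Telegraf implementation: `toyParser`, its `resetMode` field, and the map-based return type are hypothetical names chosen for illustration, and the sketch ignores skipped rows, metadata, and ParseLine.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// toyParser is a hypothetical, simplified stand-in for a stateful CSV parser.
// It is NOT Telegraf's parser; names and behavior are assumptions.
type toyParser struct {
	resetMode string   // "none" keeps state between calls, "always" drops it
	header    []string // column names remembered from the header row
}

// Parse reads all records in data and returns one map per data row.
func (p *toyParser) Parse(data string) ([]map[string]string, error) {
	if p.resetMode == "always" {
		p.header = nil // forget the header learned in earlier calls
	}
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // allow rows with varying column counts
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	var rows []map[string]string
	for _, rec := range records {
		if p.header == nil {
			p.header = rec // first record seen becomes the header
			continue
		}
		row := make(map[string]string, len(rec))
		for i, v := range rec {
			if i < len(p.header) {
				row[p.header[i]] = v
			}
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	// Linewise input: the header arrives in one call, the data in the next.
	linewise := &toyParser{resetMode: "none"}
	linewise.Parse("time,value\n")
	rows, _ := linewise.Parse("1,42\n")
	fmt.Println(rows[0]["value"]) // prints 42

	// Whole-file input: every call carries its own header.
	files := &toyParser{resetMode: "always"}
	files.Parse("time,a\n1,x\n")
	rows, _ = files.Parse("time,b\n2,y\n")
	fmt.Println(rows[0]["b"]) // prints y
}
```

In this sketch, "none" would mistake the second file's header for a data row, while "always" would lose the header between linewise calls, which is why neither mode fits both input styles.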

telegraf-tiger (bot) added the "fix" label (pr to fix corresponding bug) Jun 13, 2022
JackSharebourg (Contributor) commented

@srebhan thanks for looking into this.
With csv_reset_mode = "always", the header of a new CSV no longer throws an error, but it is also not read correctly. All CSVs are parsed with the header information of the first CSV read. See the example below: cpu.csv is sent first, then weather.csv as the second file. As the output shows, weather.csv gets the field keys given by cpu.csv.

test.conf

[[inputs.mqtt_consumer]]
  servers = ["tcp://127.0.0.1:1883"]

  topics = [
    "foo/#",
  ]
  topic_tag = ""

  data_format = "csv"
  csv_header_row_count = 1
  csv_timestamp_column = "time"
  csv_timestamp_format = "unix_ms"
  csv_skip_values = [""]
  csv_reset_mode = "always"

[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"

[agent]
  omit_hostname = true

cpu.csv

time,measurement,cpu,time_user,time_system,time_idle
1645884455653,cpu,cpu0,42,42,42

weather.csv

time,outlook,temperature,humidity,windy,play
1645884455657,overcast,hot,high,FALSE,yes

output

mqtt_consumer cpu="cpu0",time_user=42i,time_system=42i,time_idle=42i,measurement="cpu" 1645884455653000000
mqtt_consumer measurement="overcast",cpu="hot",time_user="high",time_system=false,time_idle="yes" 1645884455657000000

srebhan (Member, Author) commented Jun 14, 2022

Can you please try again?

JackSharebourg (Contributor) commented

Works for me

sspaink added the "ready for final review" label (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.) Jun 29, 2022
powersj merged commit 7d83b07 into influxdata:master on Jun 30, 2022
Linked issues that merging this pull request may close: "csv parsing issue with mqtt_consumer" and "CSV parser does not handle csv_header_row_count correctly".