Regression: CSV parser does not handle csv_skip_rows correctly #10347

DennisBayer · 2021-12-24T18:09:17Z

Relevent telegraf.conf

# (equal for 1.20.4 and 1.21.1)

[[inputs.exec]]
  name_override = "mtr"
  interval = "60s"
  commands = [
    "mtr -i3 -C exampleA.com",
    "mtr -i3 -C 8.8.8.8",
    "mtr -i3 -C exampleB.com"
  ]
  timeout = "55s"
  data_format = "csv"
  csv_skip_rows = 1
  csv_column_names = [ "", "", "status", "dest", "hop", "ip", "loss", "snt", "", "", "avg", "best", "worst", "stdev"]
  csv_tag_columns = ["dest", "hop", "ip"]

Logs from Telegraf

Note: I added new lines in the test run between each mtr output for easier reading.

1.20.4

2021-12-24T16:55:20Z I! Starting Telegraf 1.20.4
2021-12-24T16:55:20Z I! Using config file: /etc/telegraf/telegraf.conf
2021-12-24T16:55:20Z I! Loaded inputs: cpu disk diskio docker exec kernel mem net ping processes sensors swap system
2021-12-24T16:55:20Z I! Loaded aggregators: 
2021-12-24T16:55:20Z I! Loaded processors: regex
2021-12-24T16:55:20Z I! Loaded outputs: influxdb_v2
2021-12-24T16:55:20Z I! Tags enabled: host=mimas
2021-12-24T16:55:20Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"mimas", Flush Interval:10s
> no further logs

telegraf@mimas:/$ telegraf --config /etc/telegraf/telegraf.conf --input-filter exec --test --debug
2021-12-24T16:55:52Z I! Starting Telegraf 1.20.4
2021-12-24T16:55:52Z D! [agent] Initializing plugins
2021-12-24T16:55:52Z D! [agent] Starting service inputs
> mtr,dest=10.30.2.2,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.28,best=0.23,loss=60,snt=10i,status="OK",stdev=0.06,worst=0.36 1640364989000000000
[snip]
> mtr,dest=10.30.2.2,hop=13,hoppad=13,host=mimas,ip=10.30.2.2 avg=10.86,best=9.58,loss=0,snt=10i,status="OK",stdev=0.8,worst=11.94 1640364989000000000

> mtr,dest=8.8.8.8,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.23,best=0.18,loss=0,snt=10i,status="OK",stdev=0.05,worst=0.35 1640364990000000000
[snip]
> mtr,dest=8.8.8.8,hop=16,hoppad=16,host=mimas,ip=dns.google avg=20.88,best=14.3,loss=0,snt=10i,status="OK",stdev=4.45,worst=28.09 1640364990000000000
2021-12-24T16:56:29Z D! [agent] Stopping service inputs
2021-12-24T16:56:29Z D! [agent] Input channel closed
2021-12-24T16:56:29Z D! [agent] Processor channel closed
2021-12-24T16:56:29Z D! [agent] Stopped Successfully

> mtr,dest=example.com,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.22,best=0.18,loss=60,snt=10i,status="OK",stdev=0.04,worst=0.26 1640364990000000000
[snip]
> mtr,dest=example.com,hop=17,hoppad=17,host=mimas,ip=com.example avg=23.26,best=19.27,loss=0,snt=10i,status="OK",stdev=3.42,worst=29.38 1640364990000000000
telegraf@mimas:/$

telegraf@mimas:/$ mtr -C 8.8.8.8
Mtr_Version,Start_Time,Status,Host,Hop,Ip,Loss%,Snt, ,Last,Avg,Best,Wrst,StDev,
MTR.0.92,1640365204,OK,8.8.8.8,1,192.168.178.4,0.00,10,0,0.26,0.23,0.20,0.33,0.04
[snip]
MTR.0.92,1640365204,OK,8.8.8.8,16,dns.google,0.00,10,0,24.36,44.54,18.01,199.74,54.84

1.21.1

2021-12-24T17:12:33Z I! Starting Telegraf 1.21.1
2021-12-24T17:12:33Z I! Using config file: /etc/telegraf/telegraf.conf
2021-12-24T17:12:33Z I! Loaded inputs: cpu disk diskio docker exec kernel mem net ping processes sensors swap system
2021-12-24T17:12:33Z I! Loaded aggregators: 
2021-12-24T17:12:33Z I! Loaded processors: regex
2021-12-24T17:12:33Z I! Loaded outputs: influxdb_v2
2021-12-24T17:12:33Z I! Tags enabled: host=mimas
2021-12-24T17:12:33Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"mimas", Flush Interval:10s
2021-12-24T17:13:45Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "loss" on measurement "mtr" is type string, already exists as type float dropped=2
2021-12-24T17:14:43Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "best" on measurement "mtr" is type string, already exists as type float dropped=3
2021-12-24T17:15:44Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "loss" on measurement "mtr" is type string, already exists as type float dropped=3

2021-12-24T16:44:17Z I! Starting Telegraf 1.21.1
2021-12-24T16:44:17Z I! Loaded inputs: exec
2021-12-24T16:44:17Z I! Loaded aggregators: 
2021-12-24T16:44:17Z I! Loaded processors: regex
2021-12-24T16:44:17Z I! W! Outputs are not used in testing mode!
2021-12-24T16:44:17Z I! Tags enabled: host=mimas
2021-12-24T16:44:17Z D! [agent] Initializing plugins
2021-12-24T16:44:17Z D! [agent] Starting service inputs

telegraf@mimas:/$ telegraf --config /etc/telegraf/telegraf.conf --input-filter exec --test --debug
> mtr,dest=10.30.2.2,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.23,best=0.2,loss=0,snt=10i,status="OK",stdev=0.04,worst=0.3 1640364294000000000
[snip]
> mtr,dest=10.30.2.2,hop=13,hoppad=13,host=mimas,ip=10.30.2.2 avg=11.01,best=9.28,loss=0,snt=10i,status="OK",stdev=0.9,worst=12.04 1640364294000000000

> mtr,dest=Host,hop=Hop,host=mimas,ip=Ip avg="Avg",best="Best",loss="Loss%",snt="Snt",status="Status",stdev="StDev",worst="Wrst" 1640364294000000000
> mtr,dest=8.8.8.8,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.21,best=0.17,loss=10,snt=10i,status="OK",stdev=0.02,worst=0.24 1640364294000000000
[snip]
> mtr,dest=8.8.8.8,hop=16,hoppad=16,host=mimas,ip=dns.google avg=20.8,best=15.37,loss=0,snt=10i,status="OK",stdev=4.08,worst=29.39 1640364294000000000
2021-12-24T16:44:54Z D! [agent] Stopping service inputs
2021-12-24T16:44:54Z D! [agent] Input channel closed

> mtr,dest=Host,hop=Hop,host=mimas,ip=Ip avg="Avg",best="Best",loss="Loss%",snt="Snt",status="Status",stdev="StDev",worst="Wrst" 1640364294000000000
2021-12-24T16:44:54Z D! [agent] Processor channel closed
> mtr,dest=example.com,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.22,best=0.16,loss=20,snt=10i,status="OK",stdev=0.03,worst=0.25 1640364294000000000
2021-12-24T16:44:54Z D! [agent] Stopped Successfully
[snip]
> mtr,dest=example.com,hop=17,hoppad=17,host=mimas,ip=com.examplee avg=28.07,best=21.43,loss=0,snt=10i,status="OK",stdev=7.77,worst=48.27 1640364294000000000
telegraf@mimas:/$

telegraf@mimas:/$ mtr -C 8.8.8.8
Mtr_Version,Start_Time,Status,Host,Hop,Ip,Loss%,Snt, ,Last,Avg,Best,Wrst,StDev,
MTR.0.92,1640361652,OK,8.8.8.8,1,192.168.178.4,0.00,10,0,0.29,0.25,0.20,0.38,0.05
[snip]
MTR.0.92,1640361652,OK,8.8.8.8,16,dns.google,0.00,10,0,19.24,26.25,17.39,36.88,7.37

System info

Telegraf 1.21.1; Docker Compose version 2.2.2; Docker version 20.10.11, build dea9396e18; Manjaro

Docker

./telegraf/Dockerfile-mtr

ARG IMAGE_NAME
ARG IMAGE_VERSION
 
FROM ${IMAGE_NAME:-telegraf}:${IMAGE_VERSION:-1.21.1}
 
RUN apt-get update && apt-get install -y --no-install-recommends mtr-tiny && \
  rm -rf /var/lib/apt/lists/*

./docker-compose.yml

telegraf:
  image: telegraf:1.21.1.mtr
  build:
    context: ./telegraf
    dockerfile: Dockerfile-mtr
    args:
      - IMAGE_NAME=telegraf
      - IMAGE_VERSION=1.21.1
  user: telegraf:963
  # Take care of MTR zombie processes
  init: true
  restart: unless-stopped
  network_mode: "host"
  volumes:
    # Local telegraf config
    - ./telegraf/config/telegraf.conf:/etc/telegraf/telegraf.conf:ro
    # Monitor docker engine host
    - /:/hostfs:ro
    # Monitor docker containers
    - /var/run/docker.sock:/var/run/docker.sock
  environment:
    # Monitor docker engine host
    - HOST_MOUNT_PREFIX=/hostfs
    - HOST_ETC=/hostfs/etc
    - HOST_PROC=/hostfs/proc
    - HOST_SYS=/hostfs/sys
    - HOST_VAR=/hostfs/var
    - HOST_RUN=/hostfs/run

Steps to reproduce

Configuring mtr to trace three targets and emit each result as csv.
Start telegraf container using version 1.21.1 using the same environment as before (except user-value in docker-compose)
Log contains error entries

Expected behavior

Each CSV output is parsed properly
Header row of each CSV result of mtr is ignored and the custom CSV header mapping is used as in 1.20.4

Actual behavior

CSV is not parsed correctly, because the header row is only skipped for the first output of mtr. The header of the 2nd and 3rd output is processed.
Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "best" on measurement "mtr" is type string, already exists as type float dropped=3 is logged
The test run between 1.20.4 and 1.21.1 differs in containing the lines mtr,dest=Host,hop=Hop,host=mimas,ip=Ip avg="Avg",best="Best",loss="Loss%",snt="Snt",status="Status",stdev="StDev",worst="Wrst" 1640364294000000000 for the latter one.

Additional info

Since I did not find any release notes about a changed behavior, I assume that this is a regression.

Reading the release notes pointed me to a fix #9865 which is related to csv_skip_rows and csv_header_row_count.
@etycomputer: Might this behavior be related to your fix in any way or am I on the wrong track?

Any combination of csv_skip_rows = 0|1 and csv_header_row_count = 0|1 did not result in any success.

The text was updated successfully, but these errors were encountered:

jjh74 · 2021-12-26T15:09:06Z

I think problem is #9865: commit (db86904#diff-2c668efeba00e299cb699055ee85b8a114f60aae8d7684c4f4f70ac2bc1b70f4) changes skiprow loop from:

for i := 0; i < p.SkipRows; i++

to

for p.SkipRows > 0 {
...
p.SkipRows--

So on second input p.SkipRows is already zero. (Probaly similar problem with p.HeaderRowCount).

As a workaround you could try splitting [[inputs.exec]] into multiple inputs each with only one commands:

[[inputs.exec]]
  name_override = "mtr"
  interval = "60s"
  commands = [ "mtr -i3 -C exampleA.com" ]
  timeout = "55s"
  data_format = "csv"
  csv_skip_rows = 1
  csv_column_names = [ "", "", "status", "dest", "hop", "ip", "loss", "snt", "", "", "avg", "best", "worst", "stdev"]
  csv_tag_columns = ["dest", "hop", "ip"]

[[inputs.exec]]
  name_override = "mtr"
  interval = "60s"
  commands = [ "mtr -i3 -C 8.8.8.8" ]
  timeout = "55s"
  data_format = "csv"
  csv_skip_rows = 1
  csv_column_names = [ "", "", "status", "dest", "hop", "ip", "loss", "snt", "", "", "avg", "best", "worst", "stdev"]
  csv_tag_columns = ["dest", "hop", "ip"]
...

DennisBayer · 2021-12-27T10:44:31Z

Thanks for reaching out and digging in the changeset.

Yeah, seems like there is some hassle with the counter, which should not be decreased - at least not the original config value. Something like?!:

var loopSkipRowCounter = p.SkipRows
for loopSkipRowCounter > 0 {
...
loopSkipRowCounter--

Unfortunately, I'm not proficient enough in the go language nor in the project architecture to provide a patch.

Related to your workaround:

The debug run looks good, ...

telegraf@mimas:/$ telegraf --config /etc/telegraf/telegraf.conf --input-filter exec --test --debug
2021-12-27T09:52:08Z I! Starting Telegraf 1.21.1
2021-12-27T09:52:08Z I! Loaded inputs: exec (3x)
2021-12-27T09:52:08Z I! Loaded aggregators: 
2021-12-27T09:52:08Z I! Loaded processors: regex
2021-12-27T09:52:08Z I! W! Outputs are not used in testing mode!
2021-12-27T09:52:08Z I! Tags enabled: host=mimas
2021-12-27T09:52:08Z D! [agent] Initializing plugins
2021-12-27T09:52:08Z D! [agent] Starting service inputs

> mtr,dest=10.30.2.2,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.28,best=0.24,loss=0,snt=10i,status="OK",stdev=0.04,worst=0.39 1640598765000000000
[snip]
> mtr,dest=10.30.2.2,hop=13,hoppad=13,host=mimas,ip=10.30.2.2 avg=9.65,best=8.4,loss=0,snt=10i,status="OK",stdev=1.18,worst=12.46 1640598765000000000

> mtr,dest=8.8.8.8,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.73,best=0.24,loss=0,snt=10i,status="OK",stdev=1.46,worst=4.9 1640598766000000000
[snip]
> mtr,dest=8.8.8.8,hop=16,hoppad=16,host=mimas,ip=dns.google avg=24.04,best=16.52,loss=0,snt=10i,status="OK",stdev=7.71,worst=38.86 1640598766000000000

2021-12-27T09:52:46Z D! [agent] Stopping service inputs
2021-12-27T09:52:46Z D! [agent] Input channel closed
2021-12-27T09:52:46Z D! [agent] Processor channel closed
2021-12-27T09:52:46Z D! [agent] Stopped Successfully

> mtr,dest=exampleA.com,hop=1,hoppad=01,host=mimas,ip=192.168.178.4 avg=0.28,best=0.21,loss=60,snt=10i,status="OK",stdev=0.06,worst=0.33 1640598766000000000
[snip]
> mtr,dest=exampleA.com,hop=17,hoppad=17,host=mimas,ip=com.exampleA avg=26.96,best=21.72,loss=0,snt=10i,status="OK",stdev=4.14,worst=34.13 1640598766000000000
telegraf@mimas:/$

...but the log still complains about invalid values:

2021-12-27T09:54:39Z I! Starting Telegraf 1.21.1
2021-12-27T09:54:39Z I! Using config file: /etc/telegraf/telegraf.conf
2021-12-27T09:54:39Z I! Loaded inputs: cpu disk diskio docker exec (3x) kernel mem net ping processes sensors swap system
2021-12-27T09:54:39Z I! Loaded aggregators: 
2021-12-27T09:54:39Z I! Loaded processors: regex
2021-12-27T09:54:39Z I! Loaded outputs: influxdb_v2
2021-12-27T09:54:39Z I! Tags enabled: host=mimas
2021-12-27T09:54:39Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"mimas", Flush Interval:10s
2021-12-27T09:56:40Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "loss" on measurement "mtr" is type string, already exists as type float dropped=3
2021-12-27T09:57:40Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "loss" on measurement "mtr" is type string, already exists as type float dropped=3
2021-12-27T09:58:44Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "snt" on measurement "mtr" is type string, already exists as type integer dropped=3
2021-12-27T09:59:37Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "worst" on measurement "mtr" is type string, already exists as type float dropped=1
2021-12-27T09:59:51Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "worst" on measurement "mtr" is type string, already exists as type float dropped=2
2021-12-27T10:00:43Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "loss" on measurement "mtr" is type string, already exists as type float dropped=3
2021-12-27T10:01:47Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "best" on measurement "mtr" is type string, already exists as type float dropped=3

Are the outputs collected in some sort of cache and batch processed during the flush?

byte-opps · 2021-12-31T13:40:56Z

I have a simular issue with skip-headers in [[inputs.exec]] running a powershell script that spits out a two column csv with headers. The first loop is fine and writes to DB. In the next telegraf agent loop/execution the csv header is not skipped, and so the write to the database fails the data type check and no further data is recored.
Also this is when running as a windows service, the same result when running from powershell. --test and -- debug work as expected for the 1st agent loop but the --test only runs once, would be nice if the test went more times, as i think this would show the issues.
This is not a permissions issue as it is running once correctly and is running agian to get the CSV again which is then not pasrsed correctly, 2nd 3rd....agent loops.

I have tried all combinations of csv_header_row_count and csv_skip_rows and it doesn't seam to make any changes.

I have encluded the below that should be all you need to recreate the issue?

I have version Telegraf 1.21.1 (git: HEAD 7c9a9c1)

telegraf.conf

[[inputs.exec]]
  commands = ["powershell.exe c:/Program` files/influxdata/telegraf/get-memory-details-csv-bytes.ps1"]
  timeout = "2s"
  name_override = "proc_mem"
  data_format = "csv"
  csv_header_row_count = 1
  csv_column_names = ["proc_name", "value"]
  csv_column_types = ["string", "int"]
  csv_skip_rows = 0
  csv_tag_columns = ["proc_name"]

get-memory-details-csv-bytes.ps1

 get-process | Group-Object -Property ProcessName | Select-Object -Property Name,@{Name='WorkingSet'; Expression={(($_.Group|Measure-Object WorkingSet64 -Sum).Sum)}} | ConvertTo-Csv -NoTypeInformation

ERROR msg:
2021-12-31T13:34:50Z E! [inputs.exec] Error in plugin: column type: parse int error strconv.ParseInt: parsing "WorkingSet": invalid syntax

where 'WorkingSet' is the CSV column header from the powershell CSV

CSV example:

"Name","WorkingSet"
"ALsvc","5169152"
"ApplicationFrameHost","26857472"
"conhost","193732608"
...

jjh74 · 2022-01-10T11:04:28Z

Are the outputs collected in some sort of cache and batch processed during the flush?

I don't know, but I'm guessing that the parser is initialized once on startup and after first use the counters become 0 and after that cvs header is not skipped.

If I read the comments on #9865 correctly skipping csv header only once is ok/desired for example with inputs.tail but for others like inputs.exec header should be skipped for every interval/input.
(@powersj / @srebhan: Can you take a look. I think #9865 (see comment #10347 (comment)) broke csv parser for inputs.exec)

(I think as a workaround you can use starlark to drop the incorrect csv header metric:

[[processors.starlark]]
  namepass = [ "mtr" ]
  source = '''
def apply(metric):
   if metric.fields["best"] != "Best":
       return metric
   else:
       return None
'''

)

etycomputer · 2022-01-10T17:49:50Z

Hi @jjh74, I think the issue is that the inputs.exec is not creating a new parser object for each execution.
Currently, there is no functionality to reset the parser to its original state.
I am happy to work on this issue but I need feedback on how to best tackle this issue.

sjwang90 · 2022-01-11T16:31:52Z

Hi @DennisBayer @byte-opps @jjh74 @etycomputer - Would you be willing to test the artifacts on this parser restructuring PR here: #8791 (comment). This PR should address csv_skip_rows issues. Any feedback please leave on PR #8791. Thank you!

Update: artificats rebuilt as of Jan-12-2022

sjwang90 · 2022-01-13T20:31:53Z

Closed by #8791

DennisBayer added the bug unexpected problem or unintended behavior label Dec 24, 2021

telegraf-tiger bot added the area/exec label Dec 24, 2021

etycomputer mentioned this issue Jan 12, 2022

feat: Parser plugin restructuring #8791

Merged

2 tasks

sjwang90 closed this as completed Jan 13, 2022

NielsH mentioned this issue Feb 18, 2022

CSV parser does not handle csv_header_row_count correctly #10678

Closed

kurtbeil01 mentioned this issue Jul 20, 2022

parse float error strconv in inputs.exec #11531

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression: CSV parser does not handle csv_skip_rows correctly #10347

Regression: CSV parser does not handle csv_skip_rows correctly #10347

DennisBayer commented Dec 24, 2021

jjh74 commented Dec 26, 2021

DennisBayer commented Dec 27, 2021 •

edited

Loading

byte-opps commented Dec 31, 2021 •

edited

Loading

jjh74 commented Jan 10, 2022

etycomputer commented Jan 10, 2022

sjwang90 commented Jan 11, 2022 •

edited

Loading

sjwang90 commented Jan 13, 2022

Regression: CSV parser does not handle csv_skip_rows correctly #10347

Regression: CSV parser does not handle csv_skip_rows correctly #10347

Comments

DennisBayer commented Dec 24, 2021

Relevent telegraf.conf

Logs from Telegraf

System info

Docker

Steps to reproduce

Expected behavior

Actual behavior

Additional info

jjh74 commented Dec 26, 2021

DennisBayer commented Dec 27, 2021 • edited Loading

byte-opps commented Dec 31, 2021 • edited Loading

jjh74 commented Jan 10, 2022

etycomputer commented Jan 10, 2022

sjwang90 commented Jan 11, 2022 • edited Loading

sjwang90 commented Jan 13, 2022

DennisBayer commented Dec 27, 2021 •

edited

Loading

byte-opps commented Dec 31, 2021 •

edited

Loading

sjwang90 commented Jan 11, 2022 •

edited

Loading