Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graphite output mangling '%' character #1377

Closed
G-regL opened this issue Jun 16, 2016 · 0 comments · Fixed by #1392
Closed

Graphite output mangling '%' character #1377

G-regL opened this issue Jun 16, 2016 · 0 comments · Fixed by #1392
Labels
bug unexpected problem or unintended behavior

Comments

@G-regL
Copy link
Contributor

G-regL commented Jun 16, 2016

Bug report

As a follow-up to #1118, it appears that the %' symbol is getting mangled as %!_(MISSING) when sent to Graphite.
Outputting the same metrics to a file, and pushing that file to Graphite works properly.

Relevant telegraf.conf:

[global_tags]

[agent]
  interval = "30s"
  round_interval = true
  metric_buffer_limit = 1000
  flush_buffer_when_full = true
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = false
  hostname = "test"


###############################################################################
#                                  OUTPUTS                                    #
###############################################################################
[[outputs.graphite]]
  servers = ["graphite.example.com:2003"]
  prefix = ""
  template = "host.measurement.tags.field"
  timeout = 2

[[outputs.file]]
  files = [ "stdout", "C:/telegraf-1.0.0-beta1_windows_amd64/telegraf/stats.txt" ]
  data_format = "graphite"
  template = "host.measurement.tags.field"

###############################################################################
#                                  INPUTS                                     #
###############################################################################
[[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    ObjectName = "Processor"
    Instances = ["_Total"]
    Counters = ["% Idle Time", "% Interrupt Time", "% Privileged Time", "% User Time", "% Processor Time"]
    Measurement = "win_cpu"

  [[inputs.win_perf_counters.object]]
    ObjectName = "LogicalDisk"
    Instances = ["*"]
    Counters = ["% Idle Time", "% Disk Time","% Disk Read Time", "% Disk Write Time", "% User Time", "Current Disk Queue Length", "Disk Bytes/Sec", "Disk Read Bytes/sec", "Disk Reads/Sec", "Disk Transfers/Sec", "Disk Write Bytes/sec", "Disk Writes/sec" ]
    Measurement = "win_disk"
    IncludeTotal=true  #Set to true to include _Total instance when querying for all (*).

  [[inputs.win_perf_counters.object]]
    ObjectName = "System"
    Counters = ["Context Switches/sec", "System Calls/sec", "Threads", "System Up Time", "Processes", "Processor Queue Length"]
    Instances = ["------"]
    Measurement = "win_system"

  [[inputs.win_perf_counters.object]]
    ObjectName = "Memory"
    Counters = ["Available Bytes","Committed Bytes", "Cache Faults/sec","Demand Zero Faults/sec","Page Faults/sec","Pages/sec","Transition Faults/sec","Pool Nonpaged Bytes","Pool Paged Bytes"]
    Instances = ["------"] # Use 6 x - to remove the Instance bit from the query.
    Measurement = "win_mem"

System info:

This affects Windows Telegraf clients, versions 0.12.1, 0.13.0 and 1.0.0-beta1, sending to Graphite.

Steps to reproduce:

  1. Run config above on a Windows system and send output to Graphite
  2. View improperly named Whisper files in logs and on disk

Expected behavior:

On the Telegraf end, this is what's in the output file:

test.win_mem.Memory.Committed_Bytes 9.826623e+09 1466090070
test.win_mem.Memory.Pool_Paged_Bytes 7.4837606e+08 1466090070
test.win_cpu._Total.Processor.%_Interrupt_Time 0 1466090070
test.win_cpu._Total.Processor.%_Privileged_Time 0.16453928 1466090070
test.win_cpu._Total.Processor.%_Processor_Time 0.49710476 1466090070
test.win_disk.HarddiskVolume1.LogicalDisk.%_Disk_Write_Time 0 1466090070
test.win_disk.C:.LogicalDisk.%_Disk_Write_Time 0.5937533 1466090070
test.win_disk.J:.LogicalDisk.%_Disk_Write_Time 0.034992356 1466090070
test.win_disk.E:.LogicalDisk.%_Disk_Write_Time 0 1466090070
test.win_disk._Total.LogicalDisk.%_Disk_Write_Time 0.15718642 1466090070
test.win_system.System.Context_Switches_persec 8021.2686 1466090070

And this gets sent over the wire to Graphite (viewed with tcpdump on the Graphite server)

test.win_mem.Memory.Committed_Bytes 9.826623e+09 1466090070
test.win_mem.Memory.Pool_Paged_Bytes 7.4837606e+08 1466090070
test.win_cpu._Total.Processor.%_Interrupt_Time 0 1466090070
test.win_cpu._Total.Processor.%_Privileged_Time 0.16453928 1466090070
test.win_cpu._Total.Processor.%_Processor_Time 0.49710476 1466090070
test.win_disk.HarddiskVolume1.LogicalDisk.%_Disk_Write_Time 0 1466090070
test.win_disk.C:.LogicalDisk.%_Disk_Write_Time 0.5937533 1466090070
test.win_disk.J:.LogicalDisk.%_Disk_Write_Time 0.034992356 1466090070
test.win_disk.E:.LogicalDisk.%_Disk_Write_Time 0 1466090070
test.win_disk._Total.LogicalDisk.%_Disk_Write_Time 0.15718642 1466090070
test.win_system.System.Context_Switches_persec 8021.2686 1466090070

Actual behavior:

On the Telegraf end, this is what's in the output file:

test.win_mem.Memory.Committed_Bytes 9.826623e+09 1466090070
test.win_mem.Memory.Pool_Paged_Bytes 7.4837606e+08 1466090070
test.win_cpu._Total.Processor.%_Interrupt_Time 0 1466090070
test.win_cpu._Total.Processor.%_Privileged_Time 0.16453928 1466090070
test.win_cpu._Total.Processor.%_Processor_Time 0.49710476 1466090070
test.win_disk.HarddiskVolume1.LogicalDisk.%_Disk_Write_Time 0 1466090070
test.win_disk.C:.LogicalDisk.%_Disk_Write_Time 0.5937533 1466090070
test.win_disk.J:.LogicalDisk.%_Disk_Write_Time 0.034992356 1466090070
test.win_disk.E:.LogicalDisk.%_Disk_Write_Time 0 1466090070
test.win_disk._Total.LogicalDisk.%_Disk_Write_Time 0.15718642 1466090070
test.win_system.System.Context_Switches_persec 8021.2686 1466090070

But this ends up getting sent over the wire to Graphite (viewed with tcpdump on the Graphite server).

test.win_mem.Memory.Committed_Bytes 9.826623e+09 1466090070
test.win_mem.Memory.Pool_Paged_Bytes 7.4837606e+08 1466090070
test.win_cpu._Total.Processor.%!_(MISSING)Interrupt_Time 0 1466090070
test.win_cpu._Total.Processor.%!_(MISSING)Privileged_Time 0.16453928 1466090070
test.win_cpu._Total.Processor.%!_(MISSING)Processor_Time 0.49710476 1466090070
test.win_disk.HarddiskVolume1.LogicalDisk.%!_(MISSING)Disk_Write_Time 0 1466090070
test.win_disk.C:.LogicalDisk.%!_(MISSING)Disk_Write_Time 0.5937533 1466090070
test.win_disk.J:.LogicalDisk.%!_(MISSING)Disk_Write_Time 0.034992356 1466090070
test.win_disk.E:.LogicalDisk.%!_(MISSING)Disk_Write_Time 0 1466090070
test.win_disk._Total.LogicalDisk.%!_(MISSING)Disk_Write_Time 0.15718642 1466090070
test.win_system.System.Context_Switches_persec 8021.2686 1466090070

Here's the directory listing of some relevant Whisper files, showing that Graphite is writing them with the mangled names.

$ ls -1 /opt/graphite/storage/whisper/test/win_cpu/_Total/Processor/
%!_(MISSING)Idle_Time.wsp
%!_(MISSING)Interrupt_Time.wsp
%!_(MISSING)Privileged_Time.wsp
%!_(MISSING)Processor_Time.wsp
%!_(MISSING)User_Time.wsp

Additional Information

If you take the file output from the client end, and nc it into Graphite, everything is peachy, so clearly Graphite does support the %, it's just that it's getting mangled somewhere in Telegraf.

As mentioned at the top of the Issue, this is related to #1118, and it should be possible to simply replace % with percent or some other value, but that's just a band-aid. The problem actually likely lies in the Graphite Output, and it's possible that this bug would bite any input that could generate field names with a % in it.

@sparrc sparrc added the bug unexpected problem or unintended behavior label Jun 16, 2016
sparrc added a commit that referenced this issue Jun 21, 2016
chebrolus pushed a commit to chebrolus/telegraf that referenced this issue Jun 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants