Description
Using client_golang v1.17.0.
This is probably one of the strangest things I've ever encountered, and I struggled quite a bit to reproduce it. Under some workloads, we were seeing the flush in a summary metric consume >60% of total program CPU time, which is obviously insane. With slightly different workloads, though, we didn't see merely 40% or 50%; we saw so little CPU time that we had to hunt through the profile just to find the summary at all.
I've put up a small test program on the go playground: https://go.dev/play/p/VUpA2pLNQtS
To summarize:
- If I write a few million arbitrary values to a summary per second, everything is fine.
- If I write a few million zeros to a summary per second, everything is fine.
- If I write a few million values NEARLY all of which are zeros to a summary, performance goes completely pathological.
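The playground link has the full program with its flag handling; the hot path boils down to something like the sketch below (a rough approximation, not the exact code, with the flags hard-coded as constants and `-first` taken to mean "one observation per round is a 1 instead of the configured value", which is what produces the nearly-all-zeros stream):

```go
package main

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Rough stand-ins for the test program's flags:
	// -procs 8 -rate 2080000 -secs 10 -value 0 -first
	const (
		procs        = 8
		perProcPerMS = 260 // ~2.08M samples/sec split across 8 goroutines in 1ms rounds
		rounds       = 10 * 1000
		value        = 0.0
		first        = true // one observation per round is a 1 instead of `value`
	)

	sum := prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "repro_summary",
		Help:       "Summary used to reproduce the slow flush.",
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})
	prometheus.MustRegister(sum)

	var wg sync.WaitGroup
	for p := 0; p < procs; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tick := time.NewTicker(time.Millisecond)
			defer tick.Stop()
			for r := 0; r < rounds; r++ {
				<-tick.C
				for i := 0; i < perProcPerMS; i++ {
					v := value
					if first && i == 0 {
						v = 1 // the lone non-zero in an otherwise all-zero stream
					}
					sum.Observe(v)
				}
			}
		}()
	}
	wg.Wait()
}
```

Flipping `value` between 0 and 2 with `first` left on is enough to toggle between the healthy and pathological runs shown below.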
So, for instance, with the provided test program, on my laptop:
```
$ ./summary -procs 8 -first -value 2 -rate 2080000 -secs 10
8 procs, 2080000 samples per second for 10 secs, 2080/ms, 260/proc/ms
1000 rounds, 999.968101ms elapsed
2000 rounds, 1.000117226s elapsed
3000 rounds, 999.884346ms elapsed
4000 rounds, 1.000144866s elapsed
5000 rounds, 999.953162ms elapsed
6000 rounds, 999.951624ms elapsed
7000 rounds, 999.975637ms elapsed
8000 rounds, 1.000083589s elapsed
9000 rounds, 999.938046ms elapsed
9.999977264s elapsed

$ ./summary -procs 8 -first -value 0 -rate 2080000 -secs 10
8 procs, 2080000 samples per second for 10 secs, 2080/ms, 260/proc/ms
1000 rounds, 1.000091221s elapsed
2000 rounds, 1.028908993s elapsed
3000 rounds, 1.373470493s elapsed
4000 rounds, 1.997987326s elapsed
5000 rounds, 3.407858654s elapsed
6000 rounds, 4.003588481s elapsed
7000 rounds, 4.748205114s elapsed
8000 rounds, 5.465365788s elapsed
9000 rounds, 6.129958022s elapsed
35.764543742s elapsed

$ ./summary -procs 8 -value 0 -rate 2080000 -secs 10
8 procs, 2080000 samples per second for 10 secs, 2080/ms, 260/proc/ms
1000 rounds, 1.000075348s elapsed
2000 rounds, 999.975518ms elapsed
3000 rounds, 1.00003006s elapsed
4000 rounds, 999.994831ms elapsed
5000 rounds, 1.000009216s elapsed
6000 rounds, 999.987056ms elapsed
7000 rounds, 999.998651ms elapsed
8000 rounds, 1.0000061s elapsed
9000 rounds, 999.992415ms elapsed
10.000071504s elapsed
```
It's particularly notable to me that this is specific to the value 0. If we write occasional 1s in a stream that's otherwise all 2s, that's fine, but if the stream is otherwise all 0s, we get this pathological behavior.
I doubt you actually need 2M values a second to trigger this at all; it's just much easier to see at that rate.
We're not in any urgent need of a fix: in the actual use case in question, the events generating the flood of 0 observations were amenable to "well, don't do that then" as a solution. But a stream of data that is mostly, though not quite entirely, 0s seems like a case that must come up occasionally. (This might also be a partial explanation for the other issue suggesting a linked list instead of a slice, since that looks like roughly the spot in the code where a profile showed this acting up for me, too.)
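For anyone who wants to see where the time goes, nothing fancy is needed: wrapping the observation loop in a CPU profile via the standard library and looking at the top of the output is enough. A minimal sketch (not part of the playground program; the `observe` placeholder stands in for the loop from the earlier sketch):

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("summary.cpuprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// The summary-hammering loop goes here; once it finishes, the profile
	// is written to summary.cpuprof and can be read with
	// `go tool pprof summary.cpuprof`.
	observe()
}

// observe is a placeholder for the observation loop from the earlier sketch.
func observe() {}
```

In the pathological nearly-all-zeros case, `top` in `go tool pprof` puts the summary's flush right at the top for me; in the healthy cases you have to go looking for it.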