
Add workers, isBlocking and queue size to listen config #208

Merged
merged 3 commits into netsampler:main on Sep 1, 2023

Conversation

mieczkowski
Contributor

I was testing flow collection with pmacct and goflow2 and realized that goflow2 is losing packets (goflow2 should have MORE data, because pmacct aggregates in memory and flushes the aggregates):

SELECT *
FROM collector_compare

Query id: 4d8b440f-06ff-4ff8-b732-8292b5b16b44

┌─type───────┬─pmacct_count────┬─goflow_count────┬─diff───────────┬─percent─┐
│ IPFIX      │ 199.16 thousand │ 190.66 thousand │ -8.50 thousand │ -4.3%   │
│ SFLOW_5    │ 33.23 million   │ 24.47 million   │ -8.75 million  │ -26.3%  │
│ NETFLOW_V5 │ 202.00          │ 203.00          │ 1.00           │ 0.5%    │
└────────────┴─────────────────┴─────────────────┴────────────────┴─────────┘

After debugging I realized that with 1 socket = 1 worker and a high volume of flows per second, reading packets is faster than processing them (so the dispatch channel keeps growing), so I propose:

  1. Add config options to the listen URL, with default handling (if blocking=0 then queue=1000000, else queue=0; and numWorkers = sockets*2); see the sketch after this list
  2. In udp.go, increase the default worker count to cfg.Sockets * 2
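
A minimal sketch of how those defaults could be derived from the listen URL, assuming query parameters named count, workers, blocking and queue_size (names are illustrative here, not necessarily the final API):

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// listenConfig mirrors the proposed per-listener options.
type listenConfig struct {
	Sockets   int
	Workers   int
	Blocking  bool
	QueueSize int
}

// parseListenURL applies the defaults described above:
// blocking=0 -> queue=1000000, blocking=1 -> queue=0, workers = sockets*2.
func parseListenURL(raw string) (listenConfig, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return listenConfig{}, err
	}
	q := u.Query()

	cfg := listenConfig{Sockets: 1}
	if v := q.Get("count"); v != "" {
		cfg.Sockets, _ = strconv.Atoi(v)
	}
	if v := q.Get("blocking"); v != "" {
		cfg.Blocking, _ = strconv.ParseBool(v)
	}

	if cfg.Blocking {
		cfg.QueueSize = 0
	} else {
		cfg.QueueSize = 1000000
	}
	cfg.Workers = cfg.Sockets * 2

	// Explicit values override the defaults.
	if v := q.Get("workers"); v != "" {
		cfg.Workers, _ = strconv.Atoi(v)
	}
	if v := q.Get("queue_size"); v != "" {
		cfg.QueueSize, _ = strconv.Atoi(v)
	}
	return cfg, nil
}

func main() {
	cfg, _ := parseListenURL("sflow://:6343?count=4&blocking=0")
	fmt.Printf("%+v\n", cfg) // {Sockets:4 Workers:8 Blocking:false QueueSize:1000000}
}
```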

After changes:

┌─type───────┬─pmacct_count───┬─goflow_count───┬─diff───────────┬─percent─┐
│ IPFIX      │ 10.16 million  │ 10.16 million  │ 274.00         │ 0%      │
│ SFLOW_5    │ 345.81 million │ 484.31 million │ 138.50 million │ 28.6%   │
│ NETFLOW_V5 │ 9.21 thousand  │ 9.20 thousand  │ -7.00          │ -0.1%   │
└────────────┴────────────────┴────────────────┴────────────────┴─────────┘

@lspgn
Member

lspgn commented Aug 27, 2023

Hi @mieczkowski ,
Thank you for the performance comparison.
Could you also provide information about the setup (CPU/cores/arch, live traffic/simulator, pps)?
After the changes, is there an explanation for having SFLOW_5 be +28%?

@lspgn
Member

lspgn commented Aug 27, 2023

I may refactor this PR a bit. I think adding configuration options like workers is good, but I don't want the program to make too many assumptions.

@mieczkowski
Contributor Author

Could you also provide information about the setup (CPU/cores/arch, live traffic/simulator, pps)?

Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz, Debian Linux, 8 cores. Real flows from some production environments, 20-25k flows per second.

┌────────────datetime─┬─count()─┐
│ 2023-08-26 17:00:09 │   21774 │
│ 2023-08-26 17:00:08 │   24071 │
│ 2023-08-26 17:00:10 │   20860 │
│ 2023-08-26 17:00:04 │   21283 │
│ 2023-08-26 17:00:03 │   19179 │
│ 2023-08-26 17:00:02 │   20249 │
│ 2023-08-26 17:00:05 │   20164 │
│ 2023-08-26 17:00:01 │   20832 │
│ 2023-08-26 17:00:06 │   20018 │
│ 2023-08-26 17:00:07 │   19880 │
│ 2023-08-26 17:00:00 │   22058 │
└─────────────────────┴─────────┘

After the changes, is there an explanation for having SFLOW_5 be +28%?

pmacct aggregates flows: for example, if there are 5 flows with identical parameters in one second, pmacct groups them into a single record, puts 5 in the flows key, and sums bytes+packets. goflow2 sends raw flows, so it has more records in the DB.

I may refactor this PR a bit

I have nothing against that :) Maybe a good idea would be some "dynamic workers" mechanism: if the dispatch channel is growing, run more workers, and clean them up when traffic falls.
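
A rough sketch of that idea (purely illustrative, all names hypothetical, not part of this PR): a supervisor watches the dispatch channel backlog and spawns temporary workers that exit again once they sit idle.

```go
package main

import (
	"sync/atomic"
	"time"
)

// superviseWorkers keeps minWorkers goroutines draining the dispatch channel and
// adds temporary workers (up to maxWorkers) while the backlog stays high.
func superviseWorkers(dispatch chan []byte, minWorkers, maxWorkers int, process func([]byte)) {
	var running int64

	startWorker := func(temporary bool) {
		atomic.AddInt64(&running, 1)
		go func() {
			defer atomic.AddInt64(&running, -1)
			for {
				select {
				case pkt := <-dispatch:
					process(pkt)
				case <-time.After(5 * time.Second):
					if temporary {
						return // idle temporary workers clean themselves up
					}
				}
			}
		}()
	}

	for i := 0; i < minWorkers; i++ {
		startWorker(false)
	}

	// Supervisor: when the channel is more than half full, spawn another worker.
	go func() {
		for range time.Tick(time.Second) {
			if len(dispatch) > cap(dispatch)/2 && atomic.LoadInt64(&running) < int64(maxWorkers) {
				startWorker(true)
			}
		}
	}()
}

func main() {
	dispatch := make(chan []byte, 1000000)
	superviseWorkers(dispatch, 2, 16, func(pkt []byte) { _ = pkt })
	// ... the UDP receive loop would feed dispatch here ...
	time.Sleep(time.Second)
}
```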

@lspgn
Member

lspgn commented Aug 27, 2023

Maybe a good idea would be some "dynamic workers" mechanism: if the dispatch channel is growing, run more workers, and clean them up when traffic falls.

It's a good idea, but I'm not sure the added complexity is worth it: flows will spike periodically (unless there's a DDoS) and I don't think idle workers consume that much, since this is only a single "task" (decoding flows) and it would have to compete against other applications without being able to kick them out. I would rather suggest fixed capacity allocation. Overall I think it's more for platforms like Kubernetes to guarantee a minimum of resources, autoscale, or reschedule elsewhere.

@mieczkowski
Contributor Author

I would also suggest moving the initialization of metrics to a separate package. When I wanted to add a metric for dropped packets (in udp.go), there was an import cycle (metrics imports utils).

@lspgn
Member

lspgn commented Aug 28, 2023

moving the initialization of metrics to a separate package

there is https://github.com/netsampler/goflow2/blob/main/metrics/metrics.go
but I designed it in such a way that it should be callbacks/wrappers called from the main app

@mieczkowski
Contributor Author

But there is an import of the utils package: https://github.com/netsampler/goflow2/blob/main/metrics/decoder.go#L9
So you can't use metrics in the utils pkg (which is where the counter for dropped packets should live)

@lspgn
Member

lspgn commented Aug 28, 2023

So you can't use metrics in the utils pkg (which is where the counter for dropped packets should live)

Correct, I wanted to separate metrics from processing/decoding: for instance, if someone imports the package, it won't impose Prometheus metrics.
The GoFlow2 app has a decodeFunc callback which is wrapped by metrics.
But I agree there is a need to count the number of dropped packets.
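
For illustration, a minimal sketch of that callback/wrapper pattern (the DecodeFunc type, wrapDecoder and the counter are made-up names, not the actual goflow2 API): the receiving side only depends on a plain function type, and the main app wraps it with metrics, so a utils-style package never has to import Prometheus.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// DecodeFunc is the only thing the receiving/utils side needs to know about.
type DecodeFunc func(msg []byte) error

// A hypothetical counter registered by the main application, not by utils.
var decodedPackets = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "example_decoded_packets_total",
	Help: "Packets handed to the decoder.",
})

// wrapDecoder adds metrics around any DecodeFunc; the wrapped function is what
// gets passed down to the receiver, so utils never imports the metrics package.
func wrapDecoder(inner DecodeFunc) DecodeFunc {
	return func(msg []byte) error {
		decodedPackets.Inc()
		return inner(msg)
	}
}

func main() {
	prometheus.MustRegister(decodedPackets)

	decode := func(msg []byte) error { // stand-in for the real decoder
		fmt.Println("decoding", len(msg), "bytes")
		return nil
	}

	receive := wrapDecoder(decode)
	_ = receive([]byte{0x01, 0x02, 0x03})
}
```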

@lspgn lspgn added the enhancement New feature or request label Sep 1, 2023
@lspgn lspgn merged commit edc306c into netsampler:main Sep 1, 2023
@mieczkowski mieczkowski deleted the additional_config branch March 14, 2024 11:20