
Add workers, isBlocking and queue size to listen config #208

Merged
merged 3 commits into netsampler:main on Sep 1, 2023

Conversation

mieczkowski
Contributor

I was testing flow collection with pmacct and goflow2 and realized that goflow2 is losing packets (goflow2 should have MORE data, because pmacct aggregates in memory and flushes the aggregates):

SELECT *
FROM collector_compare

Query id: 4d8b440f-06ff-4ff8-b732-8292b5b16b44

┌─type───────┬─pmacct_count────┬─goflow_count────┬─diff───────────┬─percent─┐
│ IPFIX      │ 199.16 thousand │ 190.66 thousand │ -8.50 thousand │ -4.3%   │
│ SFLOW_5    │ 33.23 million   │ 24.47 million   │ -8.75 million  │ -26.3%  │
│ NETFLOW_V5 │ 202.00          │ 203.00          │ 1.00           │ 0.5%    │
└────────────┴─────────────────┴─────────────────┴────────────────┴─────────┘

After debugging I realized that with 1 socket = 1 worker and a high volume of flows per second, reading packets is faster than processing them (so the dispatch channel keeps growing), so I propose:

  1. Add config options to the listen URL, with default handling (if blocking=0 then queue=1000000, else queue=0; and numWorkers = sockets*2); see the sketch after this list
  2. In udp.go, increase the default worker count to cfg.Sockets * 2
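
A minimal sketch of how those defaults could be derived from the listen URL, assuming query parameters named count, workers, blocking and queue_size (names are illustrative here, not necessarily the final API):

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// listenConfig mirrors the proposed per-listener options.
type listenConfig struct {
	Sockets   int
	Workers   int
	Blocking  bool
	QueueSize int
}

// parseListenURL applies the defaults described above:
// blocking=0 -> queue=1000000, blocking=1 -> queue=0, workers = sockets*2.
func parseListenURL(raw string) (listenConfig, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return listenConfig{}, err
	}
	q := u.Query()

	cfg := listenConfig{Sockets: 1}
	if v := q.Get("count"); v != "" {
		cfg.Sockets, _ = strconv.Atoi(v)
	}
	if v := q.Get("blocking"); v != "" {
		cfg.Blocking, _ = strconv.ParseBool(v)
	}

	if cfg.Blocking {
		cfg.QueueSize = 0
	} else {
		cfg.QueueSize = 1000000
	}
	cfg.Workers = cfg.Sockets * 2

	// Explicit values override the defaults.
	if v := q.Get("workers"); v != "" {
		cfg.Workers, _ = strconv.Atoi(v)
	}
	if v := q.Get("queue_size"); v != "" {
		cfg.QueueSize, _ = strconv.Atoi(v)
	}
	return cfg, nil
}

func main() {
	cfg, _ := parseListenURL("sflow://:6343?count=4&blocking=0")
	fmt.Printf("%+v\n", cfg) // {Sockets:4 Workers:8 Blocking:false QueueSize:1000000}
}
```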

After changes:

┌─type───────┬─pmacct_count───┬─goflow_count───┬─diff───────────┬─percent─┐
│ IPFIX      │ 10.16 million  │ 10.16 million  │ 274.00         │ 0%      │
│ SFLOW_5    │ 345.81 million │ 484.31 million │ 138.50 million │ 28.6%   │
│ NETFLOW_V5 │ 9.21 thousand  │ 9.20 thousand  │ -7.00          │ -0.1%   │
└────────────┴────────────────┴────────────────┴────────────────┴─────────┘

@lspgn
Member

lspgn commented Aug 27, 2023

Hi @mieczkowski ,
Thank you for the performance comparison.
Could you also provide information about the setup (CPU/cores/arch, live traffic/simulator, pps)?
After the changes, is there an explanation for having SFLOW_5 be +28%?

@lspgn
Member

lspgn commented Aug 27, 2023

I may refactor this PR a bit. I think adding configuration options like workers is good, but I don't want the program to make too many assumptions.

@mieczkowski
Contributor Author

Could you also provide information about the setup (CPU/cores/arch, live traffic/simulator, pps)?

Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz, Debian Linux, 8 cores. Real flows from some production environments, 20-25k flows per second.

┌────────────datetime─┬─count()─┐
│ 2023-08-26 17:00:09 │   21774 │
│ 2023-08-26 17:00:08 │   24071 │
│ 2023-08-26 17:00:10 │   20860 │
│ 2023-08-26 17:00:04 │   21283 │
│ 2023-08-26 17:00:03 │   19179 │
│ 2023-08-26 17:00:02 │   20249 │
│ 2023-08-26 17:00:05 │   20164 │
│ 2023-08-26 17:00:01 │   20832 │
│ 2023-08-26 17:00:06 │   20018 │
│ 2023-08-26 17:00:07 │   19880 │
│ 2023-08-26 17:00:00 │   22058 │
└─────────────────────┴─────────┘

After the changes, is there an explanation for having SFLOW_5 be +28%?

pmacct aggregates flows: for example, if there are 5 flows with identical parameters in one second, pmacct groups them into a single record, puts 5 in the flows key, and sums bytes+packets. goflow2 sends raw flows, so it has more records in the DB.

I may refactor this PR a bit

I have nothing against that :) Maybe a good idea would be some "dynamic workers" mechanism: if the dispatch channel is growing, run more workers, and clean them up when traffic falls.
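
A rough sketch of that idea (purely illustrative, all names hypothetical, not part of this PR): a supervisor watches the dispatch channel backlog and spawns temporary workers that exit again once they sit idle.

```go
package main

import (
	"sync/atomic"
	"time"
)

// superviseWorkers keeps minWorkers goroutines draining the dispatch channel and
// adds temporary workers (up to maxWorkers) while the backlog stays high.
func superviseWorkers(dispatch chan []byte, minWorkers, maxWorkers int, process func([]byte)) {
	var running int64

	startWorker := func(temporary bool) {
		atomic.AddInt64(&running, 1)
		go func() {
			defer atomic.AddInt64(&running, -1)
			for {
				select {
				case pkt := <-dispatch:
					process(pkt)
				case <-time.After(5 * time.Second):
					if temporary {
						return // idle temporary workers clean themselves up
					}
				}
			}
		}()
	}

	for i := 0; i < minWorkers; i++ {
		startWorker(false)
	}

	// Supervisor: when the channel is more than half full, spawn another worker.
	go func() {
		for range time.Tick(time.Second) {
			if len(dispatch) > cap(dispatch)/2 && atomic.LoadInt64(&running) < int64(maxWorkers) {
				startWorker(true)
			}
		}
	}()
}

func main() {
	dispatch := make(chan []byte, 1000000)
	superviseWorkers(dispatch, 2, 16, func(pkt []byte) { _ = pkt })
	// ... the UDP receive loop would feed dispatch here ...
	time.Sleep(time.Second)
}
```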

@lspgn
Member

lspgn commented Aug 27, 2023

Maybe a good idea would be some "dynamic workers" mechanism: if the dispatch channel is growing, run more workers, and clean them up when traffic falls.

It's a good idea, but I'm not sure the added complexity is worth it: flows will spike periodically (unless there's a DDoS) and I don't think idle workers consume that much, since this is only a single "task" (decoding flows) and it would have to compete against other applications without being able to kick them out. I would rather suggest fixed capacity allocation. Overall I think it's more for platforms like Kubernetes to guarantee a minimum of resources, autoscale, or reschedule elsewhere.

@mieczkowski
Contributor Author

I would also suggest moving the initialization of metrics to a separate package. When I wanted to add a metric for dropped packets (in udp.go), there was an import cycle (metrics imports utils).

@lspgn
Member

lspgn commented Aug 28, 2023

moving the initialization of metrics to a separate package

there is https://github.com/netsampler/goflow2/blob/main/metrics/metrics.go
but I designed it in such a way that it should be callbacks/wrappers called from the main app

@mieczkowski
Contributor Author

But there is an import of the utils package: https://github.com/netsampler/goflow2/blob/main/metrics/decoder.go#L9
So you can't use metrics in the utils pkg (which is where the counter for dropped packets should live)

@lspgn
Member

lspgn commented Aug 28, 2023

So you can't use metrics in the utils pkg (which is where the counter for dropped packets should live)

Correct, I wanted to separate metrics from processing/decoding: for instance, if someone imports the package, it won't impose Prometheus metrics.
The GoFlow2 app has a decodeFunc callback which is wrapped by metrics.
But I agree there is a need to count the number of dropped packets.
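
For illustration, a minimal sketch of that callback/wrapper pattern (the DecodeFunc type, wrapDecoder and the counter are made-up names, not the actual goflow2 API): the receiving side only depends on a plain function type, and the main app wraps it with metrics, so a utils-style package never has to import Prometheus.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// DecodeFunc is the only thing the receiving/utils side needs to know about.
type DecodeFunc func(msg []byte) error

// A hypothetical counter registered by the main application, not by utils.
var decodedPackets = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "example_decoded_packets_total",
	Help: "Packets handed to the decoder.",
})

// wrapDecoder adds metrics around any DecodeFunc; the wrapped function is what
// gets passed down to the receiver, so utils never imports the metrics package.
func wrapDecoder(inner DecodeFunc) DecodeFunc {
	return func(msg []byte) error {
		decodedPackets.Inc()
		return inner(msg)
	}
}

func main() {
	prometheus.MustRegister(decodedPackets)

	decode := func(msg []byte) error { // stand-in for the real decoder
		fmt.Println("decoding", len(msg), "bytes")
		return nil
	}

	receive := wrapDecoder(decode)
	_ = receive([]byte{0x01, 0x02, 0x03})
}
```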

@lspgn lspgn added the enhancement New feature or request label Sep 1, 2023
@lspgn lspgn merged commit edc306c into netsampler:main Sep 1, 2023
@mieczkowski mieczkowski deleted the additional_config branch March 14, 2024 11:20