
JoukoVirtanen (Contributor) commented Nov 2, 2025

Description

The TestRepeatedNetworkFlowWithZeroAfterglowPeriod test fails frequently with errors such as:

=== RUN   TestRepeatedNetworkFlowWithZeroAfterglowPeriod/TestRepeatedNetworkFlow
    expect_conn.go:72: 
        	Error Trace:	/home/runner/work/collector/collector/integration-tests/pkg/mock_sensor/expect_conn.go:72
        	            				/home/runner/work/collector/collector/integration-tests/suites/repeated_network_flow.go:114
        	Error:      	timed out
        	Test:       	TestRepeatedNetworkFlowWithZeroAfterglowPeriod/TestRepeatedNetworkFlow
        	Messages:   	found 4 connections (expected 3)

This test is currently too strict, because the number of active connections it observes cannot be guaranteed. Although the connection is short-lived, its lifetime is finite rather than instantaneous, so it may still be open when a scrape happens and be reported as active. Currently the test asserts a specific number of observed active connections; this PR instead checks that the number of observed active connections falls within an expected range.
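A minimal Go sketch of the range check described above. The struct and function names are hypothetical; only the field names ExpectedMinActive, ExpectedMaxActive and ExpectedInactive mirror the reviewed diff, and the real mock_sensor helpers in expect_conn.go may be structured differently. Active connections are accepted anywhere in the configured range, while closed (inactive) connections must still match exactly.

```go
package main

import "fmt"

// connectionExpectations is a hypothetical struct; the field names mirror
// those in the reviewed diff, but the real test's types may differ.
type connectionExpectations struct {
	ExpectedMinActive int
	ExpectedMaxActive int
	ExpectedInactive  int
}

// checkConnectionCounts returns an error if the observed counts fall outside
// the expectations: active connections may be anywhere in
// [ExpectedMinActive, ExpectedMaxActive], while inactive (closed)
// connections must match ExpectedInactive exactly.
func checkConnectionCounts(exp connectionExpectations, active, inactive int) error {
	if active < exp.ExpectedMinActive || active > exp.ExpectedMaxActive {
		return fmt.Errorf("found %d active connections (expected %d to %d)",
			active, exp.ExpectedMinActive, exp.ExpectedMaxActive)
	}
	if inactive != exp.ExpectedInactive {
		return fmt.Errorf("found %d inactive connections (expected %d)",
			inactive, exp.ExpectedInactive)
	}
	return nil
}

func main() {
	exp := connectionExpectations{ExpectedMinActive: 0, ExpectedMaxActive: 2, ExpectedInactive: 3}
	fmt.Println(checkConnectionCounts(exp, 1, 3)) // <nil>
	fmt.Println(checkConnectionCounts(exp, 0, 4)) // found 4 inactive connections (expected 3)
}
```

Keeping the inactive count exact means a duplicated close report is still caught, while the timing-dependent active count no longer causes spurious failures.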

If the failure is instead that four close events are reported for the connection, this test will still fail. There may be a race condition between obtaining the connection from procfs and from syscalls: in one scrape interval the connection might be reported as closed because it was obtained from a syscall, and in the next scrape interval it might be reported as closed again because it was obtained from procfs. The changes here don't fix that issue.
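To illustrate why an exact inactive count still fails under this race, here is a simplified, hypothetical model in Go: each close report records the connection, the source that produced it, and the scrape interval it arrived in. The closeEvent type and duplicateCloses helper are illustrative assumptions, not collector or mock_sensor APIs. Three connections with one double-reported close yield four close events.

```go
package main

import "fmt"

// closeEvent is a hypothetical, simplified record of a connection being
// reported closed; the real collector/sensor messages look different.
type closeEvent struct {
	conn     string // identifies the connection (e.g. a 5-tuple)
	source   string // "syscall" or "procfs"
	interval int    // scrape interval in which the close was reported
}

// duplicateCloses returns, for each connection reported closed more than
// once, the number of close reports it received.
func duplicateCloses(events []closeEvent) map[string]int {
	counts := make(map[string]int)
	for _, e := range events {
		counts[e.conn]++
	}
	dups := make(map[string]int)
	for conn, n := range counts {
		if n > 1 {
			dups[conn] = n
		}
	}
	return dups
}

func main() {
	// Three connections, but conn-2 is reported closed twice: once from a
	// syscall in interval 1 and again from procfs in interval 2.
	events := []closeEvent{
		{"conn-1", "syscall", 0},
		{"conn-2", "syscall", 1},
		{"conn-2", "procfs", 2},
		{"conn-3", "syscall", 3},
	}
	fmt.Println(len(events))             // 4
	fmt.Println(duplicateCloses(events)) // map[conn-2:2]
}
```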

Future work includes tests that obtain networking events only from the procfs scrape and only from syscalls, so that we know each source works independently: https://issues.redhat.com/browse/ROX-31753

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added unit tests
  • Added integration tests
  • Added regression tests

If any of these don't apply, please comment below.

Testing Performed

Ran the test locally.

JoukoVirtanen requested a review from a team as a code owner on November 2, 2025 02:02
// connections will be seen by the test.
ExpectedMinActive: 0,
ExpectedMaxActive: 2,
ExpectedInactive: 3,
JoukoVirtanen (Contributor Author) commented Nov 2, 2025

I believe that only cases with 4 connections have been observed, but 5 connections are possible. While I believe the changes in #2641 are an improvement, they increase the chance of seeing 5 connections. Those changes space the connections exactly 2 seconds apart rather than a bit more than 2 seconds apart, which makes it more likely that if the connection is active at a scrape at t=0, it will also be active at t=6.
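The timing argument can be checked with a small Go simulation. All numbers here are hypothetical (three connections, a 100 ms lifetime, a 4 s scrape interval starting at t=0), not the test's actual configuration, and activeObservations and evenlySpaced are illustrative helpers. With exact 2 s spacing, a scrape that catches one connection stays in phase with later connections; with slightly longer spacing the phase drifts and later scrapes miss.

```go
package main

import "fmt"

// activeObservations counts how many periodic scrapes see a connection that
// is still open. All times are in milliseconds. A connection starting at
// time s is considered open during [s, s+lifetime). Scrapes happen at
// 0, scrapeInterval, 2*scrapeInterval, ... up to and including horizon.
func activeObservations(starts []int, lifetime, scrapeInterval, horizon int) int {
	observed := 0
	for t := 0; t <= horizon; t += scrapeInterval {
		for _, s := range starts {
			if s <= t && t < s+lifetime {
				observed++
			}
		}
	}
	return observed
}

// evenlySpaced returns n connection start times, period ms apart, from t=0.
func evenlySpaced(n, period int) []int {
	starts := make([]int, n)
	for i := range starts {
		starts[i] = i * period
	}
	return starts
}

func main() {
	const (
		lifetime       = 100  // hypothetical connection lifetime (ms)
		scrapeInterval = 4000 // hypothetical scrape interval (ms)
		horizon        = 8000 // last scrape time considered (ms)
	)
	exact := evenlySpaced(3, 2000)   // exactly 2 s apart
	drifted := evenlySpaced(3, 2100) // "a bit more than 2 s" apart
	fmt.Println(activeObservations(exact, lifetime, scrapeInterval, horizon))   // 2
	fmt.Println(activeObservations(drifted, lifetime, scrapeInterval, horizon)) // 1
}
```

With these assumed values, exact spacing yields two active observations and 2.1 s spacing yields one, which is why the test can only bound the active count rather than fix it.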

codecov-commenter commented Nov 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 27.60%. Comparing base (55be868) to head (2b28cd8).
⚠️ Report is 10 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2642   +/-   ##
=======================================
  Coverage   27.60%   27.60%           
=======================================
  Files          95       95           
  Lines        5422     5422           
  Branches     2523     2523           
=======================================
  Hits         1497     1497           
  Misses       3213     3213           
  Partials      712      712           
Flag                  Coverage Δ
collector-unit-tests  27.60% <ø> (ø)

Flags with carried forward coverage won't be shown.


JoukoVirtanen changed the title from "Fix TestRepeatedNetworkFlowWithZeroAfterglowPeriod" to "TestRepeatedNetworkFlow checks that number of active connections is in expected range" on Nov 4, 2025
erthalion (Contributor) left a comment

I think it's worth adding to the comments that the reason for this uncertainty is that we have information from both the scraper and signals at the same time, and leaving a TODO to add more tests for scraper-only and signals-only collection without such uncertainty.

JoukoVirtanen (Contributor Author) commented

I have created a ticket for integration tests that use procfs only or syscalls only as the source of networking events. https://issues.redhat.com/browse/ROX-31753

The ticket explains why this is needed. A TODO with the ticket and this information has been added to integration_test.go. This information has also been added to the PR description.

JoukoVirtanen merged commit abacfb7 into master on Nov 13, 2025
70 checks passed
JoukoVirtanen deleted the jv-fix-TestRepeatedNetworkFlowWithZeroAfterglowPeriod branch on November 13, 2025 19:57