Force rclrs_tests
to use the UDPv4 transport instead of the default shared memory.
#364
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A few weeks ago, @luca-della-vedova pointed out that we had some occasional segfaults when running our unit tests in CI. I looked and sure enough we've pretty much always had some flaky tests for as long as this repo existed. Although the logs for these old runs aren't kept by Github, we can reasonably infer that they've been present since the beginning or near there.
A good place to start looking when you've got flakey tests, is to see if you're handling memory properly. I compiled
rclrs_tests
and ran them with valgrind.Run Command
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose --log-file=valgrind-out.txt ./target/debug/deps/rclrs_tests-<hash>
The main takeaway here is that
graph_tests.rs
is using DDS (via rmw) in the test. This can be problematic (especially on CI) because by default, most DDS implementations default to shared memory, and well, shared memory segments aren't exactly process isolated. If another DDS process is running on the machine you run the risk of shared memory segments not being cleaned up properly which can (and in this case did) interfere with unit testing [1*].I have attached the output of 3 separate valgrind logs, and pasted their summary below each one
main
As you can see, we are 100% leaking some memory. If we look through the logs for the entries...
Its pretty clear it comes from FastDDS itself (Note the
librmw_fastrtps_cpp.so
references).main
withgraph_tests.rs
commented outThese definitive and indirect leaks are not present in the valgrind log where we force the transport layer to be UDPv4 or comment it out completely.
So, in conclusion we can at least "fix" an obvious memory leak in rclrs_tests causing occasional build failures by forcing UDPv4 instead of shmem. This "fix" is of course contingent on ROS 2 defaulting to FastDDS (which it is thankfully for Humble, Iron, and for now, Rolling). I really wish there was a way to do this via the rmw API, but I am not aware of a way to do that.
Another option we have is to simply remove this test. I have seen relying on DDS in tests to be problematic and I would be happy to move away from integration tests masquerading as unit tests.
Note 1* Is from RTI (not FastDDS), but I believe this is a fundamental property of multiple processes using shared memory