Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a crash in test_message_cache #635

Merged
merged 1 commit into from
Feb 11, 2021

Conversation

clalancette
Copy link
Contributor

@clalancette clalancette commented Feb 4, 2021

Fix a crash in test_message_cache.

The cache consumer requires that you call close()
before calling change_consumer_callback(). The test
forgot the close() call, so was running into a race
where changing out the callback was not atomic.
Remember to call close() in the test, which should fix
some occasional failures in CI.

Signed-off-by: Chris Lalancette clalancette@openrobotics.org

This should fix the occasional crashes we are seeing, like https://ci.ros2.org/view/nightly/job/nightly_osx_repeated/2240/testReport/junit/(root)/projectroot/test_message_cache/ .

While this change makes sense to me, and passes the unit tests, I am hesitant just because the previous code was so weird. I'm not sure if I'm missing something. @adamdbrw , @Karsten1987 , @mjeronimo , @emersonknapp review here is welcome.

@clalancette clalancette requested a review from a team as a code owner February 4, 2021 15:52
@clalancette clalancette requested review from mjeronimo and jaisontj and removed request for a team February 4, 2021 15:52
@adamdbrw
Copy link
Collaborator

adamdbrw commented Feb 4, 2021

Analyzing the code I can say that this only occurs in the test itself due to the fact that the close() is not called as it should before the change of callback.

I am not sure about the change - I will take some time to analyze it. If it works properly, I am for it since it takes away the requirement of close().

If it affects performance considerably, I would opt to change the test instead and think about how we can guarantee that the close was called.

Note that the actual code of caching in rosbag2 was tested to the extreme in our benchmarks (including plenty of splitting and huge volumes of data and/or topics) and we had no crashes. Which doesn't necessarily make them impossible, but certainly much less likely.

@adamdbrw adamdbrw changed the title Fix a race and a leak in the cache consumer. Fix a race and a leak in the cache consumer test. Feb 4, 2021
@adamdbrw adamdbrw changed the title Fix a race and a leak in the cache consumer test. Fix a race and a leak in the cache consumer. Feb 4, 2021
The cache consumer requires that you call close()
before calling change_consumer_callback().  The test
forgot the close() call, so was running into a race
where changing out the callback was not atomic.
Remember to call close() in the test, which should fix
some occasional failures in CI.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@clalancette clalancette force-pushed the clalancette/fix-cache-consumer-race branch from 6e137ce to c77539f Compare February 9, 2021 01:46
@clalancette clalancette changed the title Fix a race and a leak in the cache consumer. Fix a crash in test_message_cache Feb 9, 2021
@adamdbrw
Copy link
Collaborator

This looks good to me, thank you Chris! @mjeronimo could you take a look and merge this? It fixes a non-deterministic test crash.

@mjeronimo
Copy link
Contributor

@ros-pull-request-builder retest this please

@clalancette
Copy link
Contributor Author

The Rpr job is failing, but that doesn't seem unique to this PR; other PRs seem to have the same problem. Probably something to be looked into, but I don't think it should hold this one up. I'm going to run CI on this just to ensure the other platforms are happy as well:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@mjeronimo mjeronimo merged commit 5d9afe6 into master Feb 11, 2021
@delete-merged-branch delete-merged-branch bot deleted the clalancette/fix-cache-consumer-race branch February 11, 2021 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants