Unexpected new stack trace in rclpy #322

Closed · pbaughman opened this issue Apr 26, 2019 · 2 comments · Fixed by #323

@pbaughman

Bug report

Required Info:

  • Operating System:
    • Ubuntu 16.04 - running nightly docker image
  • Installation type:
    • binaries
  • Version or commit hash:
    • Using docker image sha256:14cb0769c64b280b5c609a9bef7f52393d4f306197bb48201399865486f4c6f1 for osrf/ros2:nightly
  • DDS implementation:
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

One of my nightly CI jobs for apex_launchtest (soon to be launch_testing) started failing with an rclpy stack trace. See issue here: ApexAI/apex_rostest#42.

I'm attempting to put together a simpler repro that doesn't use launch_testing.
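
For illustration only, here is an untested, hypothetical standalone sketch of the sort of sequence the analysis below points at (create a subscription, destroy it, create another, then spin). The node name, topic, message type, and exact create_subscription arguments are placeholders, not the actual failing test:

# Untested sketch, not the real failing test: destroy a subscription, create a
# new one, then spin once. Exact arguments depend on the rclpy version.
import rclpy
from std_msgs.msg import String

rclpy.init()
node = rclpy.create_node('repro')

sub = node.create_subscription(String, 'chatter', lambda msg: None)
node.destroy_subscription(sub)  # frees the underlying rcl_subscription_t

# A new subscription may be allocated where the old one just was
node.create_subscription(String, 'chatter', lambda msg: None)

rclpy.spin_once(node, timeout_sec=1.0)  # roughly where the traceback below points

node.destroy_node()
rclpy.shutdown()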

Expected behavior

Actual behavior

This stack trace appears when calling rclpy.spin:

======================================================================
FAIL: test_talker_transmits (talker_listener.test.py.TestTalkerListenerLink)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/apex_rostest/apex_launchtest_ros/examples/talker_listener.test.py", line 126, in test_talker_transmits
    self.spin_rclpy(1.0)
  File "/apex_rostest/apex_launchtest_ros/examples/talker_listener.test.py", line 109, in spin_rclpy
    executor.spin_once(timeout_sec=timeout_sec)
  File "/opt/ros/crystal/lib/python3.6/site-packages/rclpy/executors.py", line 631, in spin_once
    handler, entity, node = self.wait_for_ready_callbacks(timeout_sec=timeout_sec)
  File "/opt/ros/crystal/lib/python3.6/site-packages/rclpy/executors.py", line 617, in wait_for_ready_callbacks
    return next(self._cb_iter)
  File "/opt/ros/crystal/lib/python3.6/site-packages/rclpy/executors.py", line 558, in _wait_for_ready_callbacks
    if sub.callback_group.can_execute(sub):
  File "/opt/ros/crystal/lib/python3.6/site-packages/rclpy/callback_groups.py", line 103, in can_execute
    assert weakref.ref(entity) in self.entities
AssertionError

----------------------------------------------------------------------

Additional information

This started failing in a nightly CI job, so I think it's been triggered by a recent change. I've never done a deep dive into rclpy callback_groups though, so I'm not sure what's going on in there quite yet. It's possible I'm doing something wrong.

@sloretz (Contributor) commented Apr 26, 2019

I think I just ran into this as well; it looks like it was caused by #318. I think the problem is these two lines:

def __eq__(self, other):
    return self.handle == other.handle

def __hash__(self):
    return self.handle.pointer

The callback group keeps a set of weak references to waitable entities. That change made the hash of a subscription the address of its rcl_subscription_t struct. When the subscription is destroyed, that memory is freed, but the weak reference held by the callback group has not been cleaned up yet. If a new subscription is then created, it can be allocated at the same address as the destroyed one, so it gets the same hash as the destroyed subscription, and the assertion fails.
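
One way that can play out, sketched outside rclpy with a made-up FakeEntity class standing in for Subscription (the exact sequence inside the executor may differ):

import weakref

class FakeEntity:
    """Stand-in for a Subscription whose hash is the raw pointer of its C struct."""

    def __init__(self, pointer):
        self.pointer = pointer  # pretend this is the rcl_subscription_t address

    def __eq__(self, other):
        return self.pointer == other.pointer

    def __hash__(self):
        return self.pointer

entities = set()  # like CallbackGroup.entities: a set of weakref.ref objects

old = FakeEntity(0x7F00D)
entities.add(weakref.ref(old))

# The old struct is freed and the new one lands at the same address, so the new
# Python object hashes and compares equal to the old one.
new = FakeEntity(0x7F00D)
entities.add(weakref.ref(new))  # silently deduplicated against old's weakref

del old  # the old Python object is collected; its weakref in the set goes dead

# The membership test behind `assert weakref.ref(entity) in self.entities`:
print(weakref.ref(new) in entities)  # False

The second add is dropped because the two weak references hash and compare equal while both objects are alive; once the old object is gone, its dead weakref no longer compares equal to a live one, so the lookup for the new entity comes up empty and the assertion fires.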

I don't know why I added __eq__ and __hash__. I think the bug would be fixed by deleting those methods. The default implementation uses the address of the PyObject instead.
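
For comparison, a sketch of what deleting those methods buys, using the same toy stand-in but relying on object's default identity-based __eq__ and __hash__:

import weakref

class FakeEntity:
    """Same stand-in, now using object's default identity-based __eq__/__hash__."""

    def __init__(self, pointer):
        self.pointer = pointer

entities = set()

old = FakeEntity(0x7F00D)
entities.add(weakref.ref(old))

new = FakeEntity(0x7F00D)  # same pretend address, but a distinct Python object
entities.add(weakref.ref(new))  # kept: its identity-based hash differs from old's

del old  # old's weakref goes dead, but new's entry is untouched
print(weakref.ref(new) in entities)  # True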

sloretz self-assigned this Apr 26, 2019
@pbaughman (Author)

@sloretz I can confirm that deleting those functions from rclpy in the nightly docker image fixes my issue, and the rclpy tests still pass.

Hooray for nightly CI jobs!
