[Bug] zenoh router takes 100% CPU after closing transient local subscriber #209

Open
JMvanBruggen opened this issue Aug 6, 2024 · 3 comments
Labels
bug Something isn't working

Comments


JMvanBruggen commented Aug 6, 2024

Describe the bug

The Zenoh router's CPU usage goes to 100% after a transient local publisher/subscriber is closed.

To reproduce

  1. Start a router: zenoh-bridge-ros2dds -m router
  2. Start a transient local publisher or subscriber. I recreated the rclcpp_minimal_subscriber example and rclcpp_minimal_publisher example, only changing the QoS to rclcpp::QoS(1).transient_local()
  3. Stop the publisher/subscriber

The router keeps functioning, but CPU usage spikes to 100% and stays that way. There are no warning or error messages. I have to restart the router to fix it, but this happens every time a node with a transient local publisher/subscriber fails or respawns. I first noticed this on the client side, by the way, but that has an extra reproduction step; I am not sure if it is caused by the same issue or needs a new item.

  1. Start a router: zenoh-bridge-ros2dds -m router
  2. Start a client: zenoh-bridge-ros2dds -e tcp/ROBOT_IP:7447 -m client
  3. Start a transient local subscriber or publisher on the client side.
  4. Stop the publisher/subscriber

In this case there are warnings on the router side:

2024-08-06T10:29:17.076170Z  INFO async-std/runtime ThreadId(41) zenoh_plugin_ros2dds: Remote bridge 24c64d9fc79effddd3916e2ebad14184 retires Subscriber parameter_events
2024-08-06T10:29:17.076190Z  INFO async-std/runtime ThreadId(41) zenoh_plugin_ros2dds: Remote bridge 24c64d9fc79effddd3916e2ebad14184 retires Service Server minimal_subscriber/list_parameters
2024-08-06T10:29:17.076362Z  WARN async-std/runtime ThreadId(41) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/minimal_subscriber/list_parameters <-> Zenoh:minimal_subscriber/list_parameters): Error getting GUID of DDS entity - retcode=-3
2024-08-06T10:29:17.076506Z  INFO async-std/runtime ThreadId(41) zenoh_plugin_ros2dds::routes_mgr: Route Service Client (ROS:/minimal_subscriber/list_parameters <-> Zenoh:minimal_subscriber/list_parameters) removed
2024-08-06T10:29:17.078333Z  INFO async-std/runtime ThreadId(41) zenoh_plugin_ros2dds: Remote bridge 24c64d9fc79effddd3916e2ebad14184 retires Service Server minimal_subscriber/describe_parameters
2024-08-06T10:29:17.078537Z  WARN async-std/runtime ThreadId(41) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/minimal_subscriber/describe_parameters <-> Zenoh:minimal_subscriber/describe_parameters): Error getting GUID of DDS entity - retcode=-3
2024-08-06T10:29:17.078673Z  INFO async-std/runtime ThreadId(41) zenoh_plugin_ros2dds::routes_mgr: Route Service Client (ROS:/minimal_subscriber/describe_parameters <-> Zenoh:minimal_subscriber/describe_parameters) removed
2024-08-06T10:29:17.081728Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds: Remote bridge 24c64d9fc79effddd3916e2ebad14184 retires Service Server minimal_subscriber/set_parameters_atomically
2024-08-06T10:29:17.081874Z  WARN async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/minimal_subscriber/set_parameters_atomically <-> Zenoh:minimal_subscriber/set_parameters_atomically): Error getting GUID of DDS entity - retcode=-3
2024-08-06T10:29:17.082007Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::routes_mgr: Route Service Client (ROS:/minimal_subscriber/set_parameters_atomically <-> Zenoh:minimal_subscriber/set_parameters_atomically) removed
2024-08-06T10:29:17.083851Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds: Remote bridge 24c64d9fc79effddd3916e2ebad14184 retires Service Server minimal_subscriber/set_parameters
2024-08-06T10:29:17.084034Z  WARN async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/minimal_subscriber/set_parameters <-> Zenoh:minimal_subscriber/set_parameters): Error getting GUID of DDS entity - retcode=-3
2024-08-06T10:29:17.084169Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::routes_mgr: Route Service Client (ROS:/minimal_subscriber/set_parameters <-> Zenoh:minimal_subscriber/set_parameters) removed
2024-08-06T10:29:17.085890Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds: Remote bridge 24c64d9fc79effddd3916e2ebad14184 retires Service Server minimal_subscriber/get_parameter_types
2024-08-06T10:29:17.086024Z  WARN async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/minimal_subscriber/get_parameter_types <-> Zenoh:minimal_subscriber/get_parameter_types): Error getting GUID of DDS entity - retcode=-3
2024-08-06T10:29:17.086128Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::routes_mgr: Route Service Client (ROS:/minimal_subscriber/get_parameter_types <-> Zenoh:minimal_subscriber/get_parameter_types) removed
2024-08-06T10:29:17.088391Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds: Remote bridge 24c64d9fc79effddd3916e2ebad14184 retires Service Server minimal_subscriber/get_parameters
2024-08-06T10:29:17.088465Z  WARN async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/minimal_subscriber/get_parameters <-> Zenoh:minimal_subscriber/get_parameters): Error getting GUID of DDS entity - retcode=-3
2024-08-06T10:29:17.088484Z  INFO async-std/runtime ThreadId(40) zenoh_plugin_ros2dds::routes_mgr: Route Service Client (ROS:/minimal_subscriber/get_parameters <-> Zenoh:minimal_subscriber/get_parameters) removed
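
To compare runs, the warnings above can be condensed into one line per affected route. The following is a minimal sketch, assuming the log was saved to a file and follows the line layout shown above; the function name guid_errors is made up for illustration and is not part of zenoh-bridge-ros2dds.

```shell
# Sketch: read a saved bridge log on stdin and print "<ROS name> <retcode>"
# for each "Error getting GUID of DDS entity" warning.
# Assumption: lines look like the WARN lines quoted in this issue.
guid_errors() {
    grep 'Error getting GUID of DDS entity' \
      | sed -E 's/.*\(ROS:([^ ]+) <-> .*retcode=(-?[0-9]+).*/\1 \2/'
}

# Hypothetical usage (bridge.log is an assumed file name):
#   guid_errors < bridge.log | sort | uniq -c
```

Piping the result through sort and uniq -c shows whether the same routes fail on every restart of the node.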


System info

  • Ubuntu 22.04
  • ROS Humble
  • zenoh-bridge-ros2dds 1.0.0~alpha.5-1 amd64
JMvanBruggen added the bug (Something isn't working) label Aug 6, 2024

aosmw commented Aug 8, 2024

I see 100% cpu often but don't know what triggers it.

I flamegraphed it when I caught it. Maybe we can compare flamegraphs and give the maintainers a hint.

# Install perf
sudo apt install linux-tools-generic
cargo install flamegraph

# Read the help
flamegraph -h

# Run flamegraph with --root so that perf runs under sudo
flamegraph --root -p $(pgrep zenoh-bridge-ro)

# Press Ctrl+c after 20-30sec
# a flamegraph.svg file is created

# Open flamegraph.svg in a browser and attach it here.

[Image: flamegraph]

NOTE: It's an interactive SVG, but it appears that the GitHub viewer is doing something to prevent the interactivity.
NOTE2: Installed version, from dpkg-query --show zenoh-bridge-ros2dds:
zenoh-bridge-ros2dds 0.11.0-stable
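
Since it is not clear what triggers the spike, sampling the bridge's CPU usage at intervals may help narrow down when it starts and when to capture a flamegraph. The following is a minimal sketch, assuming Linux (it reads /proc) and a whole-number interval in seconds; the function names cpu_ticks and cpu_percent are made up for illustration.

```shell
# Sketch: CPU usage of one process over a short window, as a percentage of one core.
# Assumption: Linux, where /proc/<pid>/stat exposes utime and stime in clock ticks.

cpu_ticks() {
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Drop everything up to the last ") " so a command name containing spaces
    # cannot shift the fields; utime and stime are then fields 12 and 13.
    stat=${stat##*) }
    set -- $stat
    echo $(( ${12} + ${13} ))
}

cpu_percent() {
    pid=$1
    interval=${2:-5}                      # seconds, whole number
    hz=$(getconf CLK_TCK)                 # clock ticks per second
    before=$(cpu_ticks "$pid") || return 1
    sleep "$interval"
    after=$(cpu_ticks "$pid") || return 1
    echo $(( (after - before) * 100 / (hz * interval) ))
}

# Hypothetical usage, reusing the pgrep pattern from the commands above:
#   cpu_percent "$(pgrep zenoh-bridge-ro)" 5
```

A value that stays near 100 across several samples, with no publishers or subscribers running, would match the behaviour described in this issue. Values above 100 are possible when more than one thread is busy.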

Member

JEnoch commented Oct 18, 2024

Are you still experiencing this issue with version 1.0.0-rc.2?


aosmw commented Oct 21, 2024

I am going to try this out today. I have had our environments locked to "0.11.0-stable" for a while.

Side Note: My config was using a string for the "id:" field which is now rejected.

Plugin "ros2dds" failed to start: Plugin `ros2dds` configuration error: unknown field `id` at zenoh-plugin-ros2dds/src/lib.rs:167

I am now commenting out the id field (allowing zenoh to generate its own random one) and putting the string I previously used for id into nodename.

I am writing this on the off chance that this may have been a trigger.
