Skip to content

slow publishing and performance for custom messages with large arrays #346

@berndpfrommer

Description

@berndpfrommer

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • ROS2 galactic standard ubuntu package installed via apt
  • Version or commit hash:
    This is what the apt package info says: ros-galactic-cyclonedds/focal,now 0.8.0-5focal.20210608.002038 amd64 [installed,automatic]
  • DDS implementation:
    • cyclonedds
  • Client library (if applicable):
    • rclcpp
  • Hardware:
    AMD Ryzen 7 4800H with 64GB of memory

Steps to reproduce issue

I have made a very small repo with the below demo code and instructions how to run:
https://github.com/berndpfrommer/ros2_issues
Here is the source code for the publisher:

#include <unistd.h>

#include <rclcpp/rclcpp.hpp>
#include <ros2_issues/msg/test_array_complex.hpp>
#include <thread>

template <class MsgType>
struct TestPublisher : public rclcpp::Node
{
  explicit TestPublisher(const rclcpp::NodeOptions & options)
  : Node("test_publisher", options)
  {
    pub_ = create_publisher<MsgType>(
      "~/array", declare_parameter<int>("q_size", 1000));
    thread_ = std::thread([this]() {
      rclcpp::Rate rate(declare_parameter<int>("rate", 1000));
      const int numElements = declare_parameter<int>("num_elements", 100);
      rclcpp::Time t_start = now();
      size_t msg_cnt(0);
      const rclcpp::Duration logInterval = rclcpp::Duration::from_seconds(1.0);
      while (rclcpp::ok()) {
        MsgType msg;
        msg.header.stamp = now();
        msg.elements.resize(numElements);
        pub_->publish(msg);
        rate.sleep();
        msg_cnt++;
        rclcpp::Time t = now();
        const rclcpp::Duration dt = t - t_start;
        if (dt > logInterval) {
          RCLCPP_INFO(get_logger(), "pub rate: %8.2f", msg_cnt / dt.seconds());
          t_start = t;
          msg_cnt = 0;
        }
      }
    });
  }
  // -- variables                                                                                                                                                                                                  
  typename rclcpp::Publisher<MsgType>::SharedPtr pub_;
  std::thread thread_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node =
    std::make_shared<TestPublisher<ros2_issues::msg::TestArrayComplex>>(
      rclcpp::NodeOptions());
  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}

It uses the following custom message for TestArrayComplex:

std_msgs/Header header
# test array of elements
TestElement[] elements

and the TestElement of the array is defined as:

uint16 x
uint16 y
builtin_interfaces/Time ts
bool polarity

Expected behavior

Under ROS1 I can publish 1000 msgs/sec with 100,000 elements per message and receive at a rate of 1000Hz with rostopic hz

Actual behavior

Under ROS2 (galactic), already the publishing fails to keep up at a message size of 5,000 elements. Running the publisher with

RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 run ros2_issues publisher_node --ros-args -p num_elements:=5000 -p rate:=1000

produces this output:

634242922.738549 [0] publisher_: using network interface wlo1 (udp/192.168.1.234) selected arbitrarily from: wlo1, virbr0, virbr1, docker0
[INFO] [1634242923.747148904] [test_publisher]: pub rate:   369.34
[INFO] [1634242924.749220049] [test_publisher]: pub rate:   373.22
...

So not even the publishing is full speed, without any subscriber to the topic. I see the publisher running at 100 %CPU, so something is really heavy weight about publishing.
Worse, running rostopic hz shows a rate of about 30 msg/s.

RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 topic hz -w 100 /test_publisher/array
1634243031.116776 [0]       ros2: using network interface wlo1 (udp/192.168.1.234) selected arbitrarily from: wlo1, virbr0, virbr1, docker0
average rate: 31.275
        min: 0.030s max: 0.041s std dev: 0.00314s window: 33
average rate: 30.502
        min: 0.030s max: 0.044s std dev: 0.00351s window: 63

This is what I get from rostopic bw. The size of the message (about 80kb) agrees with what I computed by hand:

5.20 MB/s from 100 messages
        Message size mean: 0.08 MB min: 0.08 MB max: 0.08 MB

I tried sudo` sysctl -w net.core.rmem_max=8388608 net.core.rmem_default=8388608 and also was able to restrict the interface to loopback (lo) but no improvement.

FastRTPS is a bit better, at least here I can send messages with up to 50,000 elements before it falls off at 110,000 messages:

RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 run ros2_issues publisher_node_complex --ros-args -p num_elements:=110000 -p rate:=1000                                       
[INFO] [1634243372.579736663] [test_publisher]: pub rate:   404.43
[INFO] [1634243373.581322637] [test_publisher]: pub rate:   391.37
[INFO] [1634243374.583210605] [test_publisher]: pub rate:   414.22

But if I send messages of size 5,000, rostopic hz also shows about 30Hz, similar to rmw_cyclonedds_cpp.

Additional information

This is a show stopper for porting e.g. a driver for an event based camera from ROS1 to ROS2, see here: https://github.com/berndpfrommer/metavision_ros_driver.

The hardware is a 8-core AMD Ryzen laptop, less than 1 year old, so definitely not a slow machine, and this is all running on-machine, no network traffic.

To run the above code it is fastest to clone the very small repo linked above.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions