Discovery: too slow and high network usage #281
Comments
Hi @alsora
How have you checked that the nodes have discovered each other? If they haven't, discovery may depend on the participant's announcement period.
Would you be so kind as to send Wireshark captures of the working and non-working cases, so that we can better understand where the problem may be?
I don't think so, since that change restored the timings to the same values as in v1.7.2. The constructor of Duration_t changed to receive nanoseconds instead of a fraction, so that commit made the timings the correct number of nanoseconds. We will reproduce the issue with the simple example shown in #249 and will look for the commit that caused the regression in discovery.
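To illustrate why the unit matters: in the RTPS wire format the fractional part of a timestamp is expressed in units of 1/2^32 of a second, so a value meant as a fraction and a value meant as nanoseconds describe very different durations. The following is a minimal Python sketch of the conversion. The function names are hypothetical and are not part of the Fast-RTPS API.

```python
FRACTION_PER_SECOND = 2 ** 32   # RTPS fraction unit: 1/2^32 of a second
NANOS_PER_SECOND = 10 ** 9


def fraction_to_nanoseconds(fraction: int) -> int:
    """Convert an RTPS-style fraction (hypothetical helper) to nanoseconds."""
    if not 0 <= fraction < FRACTION_PER_SECOND:
        raise ValueError("fraction must fit in an unsigned 32-bit integer")
    return fraction * NANOS_PER_SECOND // FRACTION_PER_SECOND


def nanoseconds_to_fraction(nanoseconds: int) -> int:
    """Convert a sub-second nanosecond count (hypothetical helper) to a fraction."""
    if not 0 <= nanoseconds < NANOS_PER_SECOND:
        raise ValueError("nanoseconds must be in [0, 10**9)")
    return nanoseconds * FRACTION_PER_SECOND // NANOS_PER_SECOND
```

For example, half a second is `2**31` in fraction units but `500_000_000` in nanoseconds, so passing one where the other is expected changes the configured period by roughly a factor of four.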
@MiguelCompany Thank you for replying.
Here you can see the functions that I'm using for checking PDP and EDP.
I use the following APIs:
As for reproducing the issue, note that I am testing 2 applications:
In the first one I don't see any issues. At the moment, the workaround I'm using to run the second one is to wait 1 second between the creation of each node. I will get you some data from Wireshark as soon as possible.
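The workaround described above (pausing between node creations) could be sketched as follows. This is only an illustration in Python: `create_staggered` and the factory callables are hypothetical stand-ins for whatever actually creates a node in the application, not ROS2 or Fast-RTPS calls.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def create_staggered(
    factories: Iterable[Callable[[], T]],
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call each factory in order, sleeping delay_s between consecutive calls."""
    created: List[T] = []
    for index, factory in enumerate(factories):
        if index > 0:
            # Give already-created nodes time to finish discovery
            # before the next one starts announcing itself.
            sleep(delay_s)
        created.append(factory())
    return created
```

The `sleep` parameter is injectable only so the helper can be exercised without actually waiting. Note that this hides the symptom rather than fixing the discovery problem itself.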
The nightly sanitizer jobs have found a data race on custom_participant_info.hpp (
So, let's summarize ...
Is this what happens? If so, how much time does it take for
Thank you for helping us understand the issue.
By the way, we are trying to reproduce the problem with the example you provided in #249. We added the example to the ros2 demos repo here and haven't been able to reproduce the problem. We are also adding a blackbox test for a similar situation here (still WIP): 30 participants, each creating one publisher and one subscriber on the same topic.
Yes, that's exactly what happens.
After adding some logs here and there, I always see a small number of subscriptions not matched (fewer than 3). In any case, I think adding tests like this can be really useful for the future too! However, keep in mind that each ROS2 node also creates a Parameter Server, i.e. 6 RTPSReaders and 6 RTPSWriters.
I tried the old "stress test" again: I start seeing problems when I have 1 publisher and 50 subscribers.
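To give a rough sense of scale, the endpoint counts can be worked out with simple arithmetic. The Python sketch below uses the figures mentioned in this issue (20 nodes, 23 publishers, 35 subscriptions, plus 6 readers and 6 writers per node for the Parameter Server). It assumes one participant per node and ignores the built-in discovery endpoints and any other entities a node may create, so the real numbers are likely higher. The function names are hypothetical.

```python
def endpoint_counts(nodes, publishers, subscriptions,
                    param_writers_per_node=6, param_readers_per_node=6):
    """Return (writers, readers), counting user topics plus Parameter Server endpoints."""
    writers = publishers + nodes * param_writers_per_node
    readers = subscriptions + nodes * param_readers_per_node
    return writers, readers


def participant_pairs(nodes):
    """Number of distinct participant pairs, assuming one participant per node."""
    return nodes * (nodes - 1) // 2


writers, readers = endpoint_counts(nodes=20, publishers=23, subscriptions=35)
print(writers, readers, participant_pairs(20))  # prints: 143 155 190
```

Under these assumptions, a test with one publisher and one subscriber per participant exercises far fewer endpoints per participant than a ROS2 application of the same size.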
@MiguelCompany Here is the Wireshark data.
TEST 1: no wait between node creations
PDP time: 50ms
TEST 2: 1 sec wait between node creations
PDP time: 0
In the second test, node creation takes 20 seconds (1 sec per node). During this time Wireshark shows a network usage of 10Kb per second.
We found the issue. It was related to a change necessary for the implementation of the lifespan QoS. A fix is on the way in eProsima/Fast-DDS#541, a new blackbox test is being added in eProsima/Fast-DDS#542, and a new unit test is under development.
@MiguelCompany great news! @alsora can you please retest with the latest code including the fix?
I tested again with the latest updates. The situation is definitely improved, but it's not fixed. Considering 10 runs:
This is different from what I saw last week, where PDP was working and EDP was not.
@MiguelCompany The problem persists even after fixing the data race.
@alsora I don't know if this issue is still relevant or not. Do you think it can be closed?
Yes, I think it can be closed.
Hi,
after updating to Fast-RTPS 1.8.0 I am again having issues during discovery when running applications with approximately 20 nodes.
The behavior is the same as the one I found in Fast-RTPS 1.7.0 (#249).
I can tell you that all the nodes discover each other; however, the Endpoint Discovery Phase hangs forever.
Moreover, during discovery, I can see a network usage of approximately 50Kb per second in upload.
The application I'm trying to run has 20 nodes, 23 publishers and 35 subscriptions.
https://github.com/irobot-ros/ros2-performance/tree/master/performances/benchmark
Could it be related to this change?
eProsima/Fast-DDS@af648ac