ros2 topic info -v prints incorrect QoS info for macOS + CycloneDDS #525
I have run this with the fat binary from https://ci.ros2.org/view/packaging/ and with a build from source, and in both cases I can see the right durability.
Can you double-check this?
@prajakta-gokhale It looks like this was reported in a Foxy prerelease. Do you still see the issue now that Foxy is released? It looks like @ahcorde doesn't see it.
Thanks for looking! I'll try again and provide an update.
I got the same issue on Ubuntu with a source build (ros2/ros2@0ee1ca2); only rmw_cyclonedds can reproduce this issue.
Closing!
I didn't see this comment, so this seems to still be a bug.
As far as I can see, this still happens, and only when we use CycloneDDS. @eboasson, friendly ping: could you share your thoughts on this?
@fujitatomoya that's rather interesting, which in this case means it doesn't ring any bell and I don't see this problem when I try it ("Debug" source build on macOS, mid-August). It could be some missing initialization somewhere that acts up in some build types and configurations and not in others ... In any case, I don't quite see how: https://github.com/ros2/rmw_cyclonedds/blob/a07553e620eb92321949d7951b08161a65559302/rmw_cyclonedds_cpp/src/rmw_node.cpp#L1631 could get the conversion wrong. The functions to create publishers and subscriptions, as well as those to retrieve the actual QoS settings, rely on them. Given the absence of an obvious culprit, to me it would make the most sense to look at the DDS level directly to learn a bit more about where the origin of the problem is. A Cyclone trace can be used for this (there are other ways, but I find the trace files often the easiest — that may be a deformation 🤷). Would you be able to give this a try?
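The trace-enabling snippet from this comment did not survive extraction. Below is a hedged sketch of how Cyclone DDS tracing is typically switched on via the `CYCLONEDDS_URI` environment variable; the exact `Tracing` element names and the `${CYCLONEDDS_PID}` expansion should be verified against the Cyclone DDS configuration reference for the version in use:

```shell
# Enable Cyclone DDS tracing for every process started from this shell
# (publisher, "ros2 topic info", and the ros2 daemon). One trace file
# per process, distinguished by PID.
export CYCLONEDDS_URI='<Tracing>
  <Verbosity>finest</Verbosity>
  <OutputFile>/tmp/cdds.${CYCLONEDDS_PID}.log</OutputFile>
</Tracing>'

# Restart the daemon so it picks up the environment, then reproduce:
ros2 daemon stop
ros2 run demo_nodes_cpp talker &
ros2 topic info -v /chatter

# Grep the traces for what durability was advertised over discovery:
grep -i durability /tmp/cdds.*.log
```

The key point is that the daemon must be restarted after exporting the variable, otherwise its trace is not captured.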
That should give you 3 lines of output, one for each process (the daemon is the 3rd). In my case, where everything works fine, they all show
I'm thinking this should at least tell us whether we need to be looking at the creation of the publisher or at the retrieval of the actual QoS.
Thanks for your help! Yes, it is weird that some people say it works okay and some don't; maybe I am missing something here... 😭 Anyway, I have a reproduction environment, so I will try to get the trace information using the procedure you provided.
Reproducible Procedure
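The original procedure block was lost in extraction. A hypothetical reconstruction based on the bug report at the bottom of this thread (a CycloneDDS publisher inspected via `ros2 topic info -v`, with the daemon possibly running a different RMW) might look like this; the demo package and topic name are assumptions:

```shell
# Start the daemon on Fast-RTPS, then publish and inspect via Cyclone DDS.
ros2 daemon stop
RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 daemon start
RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 run demo_nodes_cpp talker &
RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 topic info -v /chatter
# When the bug triggers, Durability is reported as
# RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL instead of
# RMW_QOS_POLICY_DURABILITY_VOLATILE.
```

Which RMW the daemon happens to be running turns out to matter, as discussed later in the thread.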
Could someone try to confirm that the above procedure reproduces the issue, just in case?
I cannot get the log file.
After restarting the container, I am no longer able to reproduce this issue with #525 (comment). I don't know what to say, but sorry 😢 I'll try to find another environment.
Foolish me ... trying to be helpful by giving copy-paste-ready lines and then forgetting the [...]
Hopefully you can find one without much effort 🤞
Yes, I can reproduce this issue by following your steps. (I agree we can use a different RMW to get information from the others.)
Thanks for checking!
@iuhilnehc-ynos this is getting weirder by the comment! So now Cyclone DDS shows the correct durability setting and Fast-RTPS shows the wrong one! One thing it proves: we really need to know what the publisher process advertises over discovery. I would appreciate it very much if you could run it with the [...]
@iuhilnehc-ynos perhaps you need not do what I asked just now: after I published the comment, I realised that I could try this particular experiment of mixing a Cyclone DDS-based publisher with a Fast-RTPS-based ros2 CLI. I get exactly the same result. That's good news. Cyclone only publishes QoS settings that are different from the default. In section 9.6.2.2.1 the DDSI-RTPS specification states that parameters set to their default value (e.g., the durability QoS, which here is volatile) may be omitted from the discovery data, and Fast-RTPS or its RMW layer is misinterpreting that discovery data. I suspect that when @fujitatomoya reproduced it, the ros2 daemon was running using Fast-RTPS. That one then stores the incorrect discovery data and regurgitates it to "ros2 topic info" even if that one is running Cyclone. Restarting the machine/container could easily have resulted in using Cyclone DDS for the daemon. And if instead you use Cyclone DDS for the daemon, even Fast-RTPS gets it right.
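The receiver-side behaviour described above can be sketched in a few lines. This is illustrative pseudocode, not actual Fast-DDS or Cyclone DDS source: a compliant discovery receiver starts from the spec-defined defaults and only overwrites QoS values that actually appear on the wire. The PID constants are real DDSI-RTPS parameter IDs; the dictionary-based representation is an assumption for clarity.

```python
# Real DDSI-RTPS parameter IDs for the two QoS policies discussed here.
PID_RELIABILITY = 0x001A
PID_DURABILITY = 0x001D

# Spec-defined defaults; a writer advertising these values may omit them
# from its discovery data entirely.
DDS_DEFAULT_QOS = {
    "durability": "VOLATILE",
    "reliability": "BEST_EFFORT",
}

def apply_discovery_parameters(parameter_list):
    """Return the effective QoS of a discovered endpoint.

    parameter_list maps the PIDs actually present on the wire to their
    values. Anything absent keeps its default. A receiver that instead
    defaults a missing durability to TRANSIENT_LOCAL (the bug discussed
    in this thread) reports the wrong QoS and can break matching.
    """
    qos = dict(DDS_DEFAULT_QOS)
    if PID_DURABILITY in parameter_list:
        qos["durability"] = parameter_list[PID_DURABILITY]
    if PID_RELIABILITY in parameter_list:
        qos["reliability"] = parameter_list[PID_RELIABILITY]
    return qos

# A writer advertising only a non-default reliability: durability must
# stay VOLATILE because it was omitted from the discovery message.
print(apply_discovery_parameters({PID_RELIABILITY: "RELIABLE"}))
```

The bug, in these terms, is equivalent to initializing `qos["durability"]` to `"TRANSIENT_LOCAL"` instead of the DDS default.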
I used another [...]
Thanks for checking; my employer requires me to avoid touching Connext because of some less-than-pleasant interactions in the past ... Presumably Connext then also includes the durability QoS when it is volatile (you can fairly easily check with Wireshark: record the startup, then look for DATA(w) submessages and look through them for the topic name in the expanded parameter list). By the way, you can make Cyclone publish all QoS settings all the time with an option:
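The configuration snippet from the original comment was lost in extraction. To the best of my knowledge the setting being referred to is the Compatibility option inherited from OpenSplice's DDSI service; treat the exact element path as an assumption and verify it against the Cyclone DDS configuration reference:

```shell
# Assumed Cyclone DDS option to always serialize every QoS setting into
# the discovery data, even when it equals the DDS default. This works
# around receivers that misinterpret omitted parameters, at the cost of
# larger discovery messages.
export CYCLONEDDS_URI='<Compatibility>
  <ExplicitlyPublishQosSetToDefault>true</ExplicitlyPublishQosSetToDefault>
</Compatibility>'
```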
and when you do that, Fast-RTPS suddenly gets it right. So there's a workaround, but it relies on wasting network bandwidth. I also know from completely unrelated situations that there are no problems interoperating with Connext if you leave out the durability QoS (or many others), from projects using both Connext and OpenSplice (which has the same behaviour with respect to publishing QoS settings as Cyclone). So it really is the Fast-RTPS side that is broken.
Thanks for all of the debugging on this issue; it is really appreciated. @iuhilnehc-ynos @fujitatomoya Since it seems like this is a bug in Fast-DDS, would one of you mind summarizing the problem and opening a new issue on https://github.com/eProsima/Fast-DDS/issues?
I don't think this bug is in Fast-DDS. IMO, it's because the ros2 daemon (rmw_fastrtps) did not receive QoS information from [...]. NOTE: I believe it belongs to [...]
OK, I misunderstood. If @eboasson agrees, then I think we'll want to open up an issue there. Thanks.
@clalancette and @iuhilnehc-ynos, it is a bug in Fast-DDS: Cyclone DDS relies on the receivers of the discovery data to implement required behaviour and fill in the correct values for QoS settings not explicitly included in the discovery message. The bug here is that Fast-DDS uses the wrong default value (at least for durability).
@clalancette @eboasson @iuhilnehc-ynos I am not sure where exactly this problem belongs yet 🤕; I just need more time to make it clear. I think what we see here is some disagreement about the discovery format between different DDS implementations, which breaks the discovery process and then leads to the wrong QoS. That is as far as I can tell. (I guess that we all agree on this.)

Problem
Reproducible Procedure
Symptom
After all, maybe it would be suitable to create or move this issue [...]; what would you think?
Sorry, I think it's my mistake. Fast-DDS does have a problem. If [...]
But I can't find the relevant information in the DDSI-RTPS specification about [...]. Let me put it another way: why [...]
Thanks for your summary.
I noticed this. I think it's also a problem. (About [...])
No worries, @iuhilnehc-ynos, there are just too many details to be on top of them all.
Yes, that totally explains it
It is this paragraph:
(Page 169 of DDSI-RTPS 2.3, emphasis mine) Table 9.11 then says "see DDS specification", which is the big table in the DDS 1.4 spec starting on page 2-104.
It is because it isn't actually set to the default value: the reliability QoS is a pair of kind (best-effort or reliable) and a "max blocking time" that says for how long the writer should block when resource limits prevent it from completing the write operation. The DDS default for "max blocking time" is 100ms, but the Cyclone RMW layer sets it to ∞ and that's the reason it is included. I now see that Wireshark doesn't print the "max blocking time" in the table. It doesn't affect reader/writer matching, so in that respect it is not very important, but it is a required part of the wire format so it is kinda odd that Wireshark omits it. (Required because the DDSI-RTPS spec says the type of the "reliability" parameter in the discovery is
It's the "ff ff ff 7f ff ff ff ff" bit in the highlighted bytes.
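The byte sequence quoted above can be checked in a few lines. This sketch assumes the DDSI `Duration_t` wire representation of "infinite" is `{seconds = 0x7fffffff, fraction = 0xffffffff}` serialized in little-endian CDR, which reproduces the highlighted bytes exactly:

```python
import struct

# DDSI Duration_t "infinite": int32 seconds, uint32 fraction.
DURATION_INFINITE_SEC = 0x7FFFFFFF
DURATION_INFINITE_FRAC = 0xFFFFFFFF

# Little-endian CDR encoding of the "max blocking time" half of the
# reliability QoS parameter.
wire = struct.pack("<iI", DURATION_INFINITE_SEC, DURATION_INFINITE_FRAC)
print(wire.hex(" "))  # -> ff ff ff 7f ff ff ff ff
```

The DDS default of 100ms would serialize differently, which is why Cyclone includes the reliability parameter in discovery even though the kind itself is at its default.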
When the [...], and the question is why? @MiguelCompany, could you kindly share your thoughts? This is really getting into DDS implementation details.
I appreciate your explanation.
This is interesting. Quoting from section 9.6.2.2 in the DDSI-RTPS spec:
The participant GUID is entirely redundant because it is always the same as the reader/writer GUID with the entity id component replaced by 0x1c1. So in my view the normative text says the participant GUID should be omitted, and the referenced table is therefore simply an incomplete list of "forbidden" items. I guess one could hold a different opinion and argue that the intent behind that paragraph is that only those listed in table 9.10 should be left out. But the fact of the matter is that neither OpenSplice nor Cyclone DDS has published it in the reader/writer information for a decade, without it ever causing any interoperability issues, and I have furthermore checked some packet captures related to investigating interoperability issues in that decade that show Connext and CoreDX also leave it out. (That also means I would expect the same all-zero GUID to show when using Connext instead of Cyclone in this experiment.) Finally, if eProsima is of the opinion that the participant GUID must be present in the reader/writer information, they should reject the discovery data as invalid — maybe I should try fuzzing it? — and not use a nonsensical default value instead.
What you mentioned in this thread makes perfect sense.
Regarding the missing durability info: this is a bug, which is being fixed by eProsima/Fast-DDS#1384. Regarding the participant GUID: Section 9.6.2.2 indicates that
It then says on Table 9.10 that [...]. Mapping of [...]. This means that field [...]. I have created PR eProsima/Fast-DDS#1382 to handle the case of a missing PID_PARTICIPANT_GUID, and in that case default to using the GUID prefix coming in PID_ENDPOINT_GUID and the default participant entity id.
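The fallback described in that PR can be sketched as follows. This is an illustrative reconstruction, not Fast-DDS source: an RTPS GUID is a 12-byte GUID prefix plus a 4-byte entity id, and the well-known participant entity id is `ENTITYID_PARTICIPANT = 0x000001c1`, so a missing participant GUID can be derived from any endpoint GUID sharing the same prefix:

```python
# Well-known RTPS entity id of the participant itself (big-endian bytes).
ENTITYID_PARTICIPANT = bytes.fromhex("000001c1")

def participant_guid_from_endpoint(endpoint_guid: bytes) -> bytes:
    """Derive a participant GUID from a reader/writer GUID.

    endpoint_guid is a 16-byte RTPS GUID: 12-byte prefix + 4-byte entity
    id. The participant GUID keeps the prefix and substitutes the
    participant entity id, as described in the comment above.
    """
    assert len(endpoint_guid) == 16
    return endpoint_guid[:12] + ENTITYID_PARTICIPANT

# Hypothetical writer GUID; the trailing 0x00000102 is a user-defined
# writer-with-key entity id.
writer_guid = bytes.fromhex("0110e2a8c3a4f0b80100000000000102")
print(participant_guid_from_endpoint(writer_guid).hex())
```

This also illustrates eboasson's redundancy argument: nothing in the participant GUID is unrecoverable once PID_ENDPOINT_GUID is present.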
Let me summarize the current status quickly.

Problems
Current Fix
I confirmed the fix, but it would be better if someone double-checks on macOS, since this issue was originally registered for macOS. Note: there is still discussion going on about the DDSI-RTPS specification.
I have confirmed that all these issues will be fixed on macOS after eProsima/Fast-DDS#1384 is merged.
@prajakta-gokhale this is taken care of by eProsima/Fast-DDS#1380; could you kindly confirm?
I just verified that on Galactic, this problem is solved. I'm going to go ahead and close this one out, thanks all for the debugging and fixes here. |
Bug report
Required Info:
Steps to reproduce issue
ros2 topic info -v
does not print correct QoS information for a topic.

Expected behavior
Durability value printed should be:
Durability: RMW_QOS_POLICY_DURABILITY_VOLATILE
Actual behavior
Durability value printed is:
Durability: RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL
Additional information
These commands appear to work as expected on Linux and Windows for CycloneDDS.