A problem about 2subs 1pub #119
Comments
If I run 3 sub apps and 1 pub app, the app that starts first seems unable to send SEDP info to the ones that start later. Could you give me some pointers?
Hi @xu3stones,

It vaguely reminds me of a long-since fixed problem caused by improper use of the

I'll first try to summarize how the discovery works (you probably know all or most of it already, judging by your question, but if not it could be useful in understanding this) and then give some suggestions for things to look at.

The way discovery works is that on startup, each process starts sending SPDP messages to all configured addresses (by default the 239.255.0.1 multicast address and any unicast addresses you have listed as peers; loopback on Linux doesn't advertise multicast capability, so it falls back to unicast with "localhost" as a peer). Any process receiving an SPDP message from a not-yet-known participant adds that participant and responds (via unicast). SPDP messages advertise addresses and indicate which discovery built-in readers/writers exist, so that both sides can match these built-in readers/writers immediately, without any further traffic, and use these to do the SEDP phase.

These readers and writers are transient-local ones, and the way that transient-local works is that the readers need to request the historical data (describing any existing endpoints) by requesting a retransmit in exactly the same manner as they do to recover from packet loss. To know what retransmits to request, a reader needs a Heartbeat message from a writer that tells the reader which sequence numbers exist. So, on matching a new reader/writer pair, in Cyclone the reader starts sending out "no-op" AckNack messages (ack 0 samples, request retransmit of 0 samples) that nevertheless request a response in the form of a Heartbeat, and the writer starts out sending Heartbeats periodically. Some flags will toggle based on what they receive, so it quiets down once all is well. The result is that the reader learns what data to request, regardless of how the connection came into existence, including cases where you had a temporary disconnection in one direction only.
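To make the handshake described above concrete, here is a toy model of it in Python. It is only a sketch: the `Writer`, `Reader`, and `match` names are made up for illustration and are not Cyclone's API, there is no network or timing involved, and a Heartbeat is reduced to a `(first, last)` pair of sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Writer:
    """Toy transient-local writer: keeps every sample available for retransmit."""
    history: dict = field(default_factory=dict)  # sequence number -> payload

    def write(self, payload):
        self.history[len(self.history) + 1] = payload

    def heartbeat(self):
        # A Heartbeat tells the reader which sequence numbers exist.
        if not self.history:
            return (1, 0)  # empty range: nothing to request
        return (min(self.history), max(self.history))

    def retransmit(self, requested):
        return {seq: self.history[seq] for seq in requested if seq in self.history}


@dataclass
class Reader:
    """Toy reader that recovers historical data via retransmit requests."""
    received: dict = field(default_factory=dict)

    def acknack(self, heartbeat=None):
        # Before any Heartbeat arrives the reader knows nothing, so it sends a
        # "no-op" AckNack: it requests no samples and only solicits a Heartbeat.
        if heartbeat is None:
            return []
        first, last = heartbeat
        return [seq for seq in range(first, last + 1) if seq not in self.received]

    def deliver(self, samples):
        self.received.update(samples)


def match(writer, reader):
    """Run the handshake until the reader has everything; return the rounds used."""
    reader.acknack()  # no-op AckNack, prompts the Heartbeat below
    missing = reader.acknack(writer.heartbeat())  # the "real" AckNack
    rounds = 0
    while missing:
        reader.deliver(writer.retransmit(missing))
        rounds += 1
        missing = reader.acknack(writer.heartbeat())
    return rounds
```

With a writer that already holds two endpoint descriptions, `match` finishes after a single retransmit round in this lossless model, and the reader ends up with the writer's full history.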
The Heartbeats then trigger the "real" AckNack messages requesting retransmits, those then trigger retransmits and further Heartbeats, and the cycle repeats until everything has been received. Here you are dealing with so little data that I would expect it to be handled in one round.

The addressing of retransmits in Cyclone is a complicated affair, because in the default behaviour it can be unicast or multicast, depending on the details. In the case of recovery of historical data by a new reader, the retransmits will typically be sent unicast. In all this, the WHC merely stores the data that is available for retransmit, and there is no reason to suspect that it would be involved.

So, with all that said, it seems to fit a pattern where multicast works fine, but unicast does not. Each process should have its own unique port numbers for unicasting, and if they don't (the

From your message I gather you turned on the tracing, which is exactly the right thing to do, but it is notoriously hard to read … Still, a few things are easily extracted:
Of course you can use Wireshark instead, but while Wireshark makes it really easy to decode packets, it doesn't make it at all easy to understand how the packets relate to each other, nor why some are sent. So I pretty much rely on the traces.

Perhaps this will help you a bit in understanding what is going on. If you would like me to, I'd be happy to have a look, too. This is the kind of problem that must be understood and dealt with quickly; I don't want people new to Cyclone to immediately be disappointed …
Hi @eboasson, thanks for your explanation. After enabling the trace log, I still cannot find the reason.
I have spotted a few things that I think have something to do with it — but there's a bit of guesswork involved: the first guess is that you're using some RTOS or so 👍🏻 and I think that may have something to do with it. As expected, each process creates a pair of sockets:
By the spec, the even one is for discovery traffic, and the odd one for data. Cyclone pretty much couldn't care less about that distinction as it really just looks at the contents, but it does try to play by the rules. So what you get is that the SPDP messages are sent to the even port numbers (there's

The curious thing here is that these SPDP messages should list the even port number for the discovery data and the odd one for the application data. However, when it traces the creation of the participant, the "meta" and "data" addresses are identical. I don't quite see how that can be the case if there are two sockets. It isn't that I'd expect it to break things (as mentioned, it doesn't care about port numbers as long as it receives the data), but it is odd, and I wonder whether that may somehow be related to the platform you're using.

The other thing is that in the default configuration, it creates multiple threads for accepting incoming data:

In your case, there is no multicast and

When you look at when which thread receives data, you'll see that at first it receives exclusively on the

So it looks a bit like the

In short … no smoking gun, but a few things worth looking into in more detail. Hope this helps you a bit in the investigation.
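For reference, the even/odd split and the "unique unicast ports per process" requirement can be sketched with the default port mapping from the DDSI-RTPS specification (PB = 7400, DG = 250, PG = 2, d0 = 0, d1 = 10, d2 = 1, d3 = 11). The function names below are made up for illustration, and the ports a given Cyclone process actually opens depend on its configuration and platform, so the trace remains the authority.

```python
# Default port-mapping parameters from the DDSI-RTPS specification.
PB, DG, PG = 7400, 250, 2
D0, D1, D2, D3 = 0, 10, 1, 11


def rtps_ports(domain_id, participant_index):
    """Return the spec-default ports for one participant (hypothetical helper)."""
    base = PB + DG * domain_id
    return {
        "discovery_multicast": base + D0,
        "data_multicast": base + D2,
        "discovery_unicast": base + D1 + PG * participant_index,
        "data_unicast": base + D3 + PG * participant_index,
    }


def unicast_ports_unique(domain_id, participant_indices):
    """Check that no two participants would share a unicast port."""
    seen = set()
    for index in participant_indices:
        ports = rtps_ports(domain_id, index)
        for port in (ports["discovery_unicast"], ports["data_unicast"]):
            if port in seen:
                return False
            seen.add(port)
    return True
```

For domain 0, three processes with participant indices 0, 1 and 2 would get the unicast pairs 7410/7411, 7412/7413 and 7414/7415 under this mapping. Two processes that end up with the same participant index would collide, which is the situation the uniqueness check flags.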
Hopefully the problem went away ...
Hi,
Recently I ran into a strange problem: when I run 2 sub Helloworld apps and 1 pub Helloworld app, they cannot communicate (in the same environment, 1 sub and 1 pub works fine).
After my analysis, the SPDP step seems OK: each of the 3 apps can find the other 2.
The problem appears in the SEDP step.
In my test, sub1 starts first, then sub2, and pub starts last.
From the log, we got the following clues:
pub ---> sub1 (pub sends SEDP info to sub1, and sub1 goes into handle_SEDP)
pub ---> sub2 (pub sends SEDP info to sub2, and sub2 goes into handle_SEDP)
sub2 ---> sub1 (sub2 sends SEDP info to sub1, and sub1 goes into handle_SEDP)
but all the others are lost.
So the app that starts first seems unable to send SEDP info to the ones that start later.
This confuses me; I'm not familiar with the WHC mechanism or the timeout mechanism.
Could you suggest some possible causes so that I can check what is causing this problem in my test environment?
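The pattern in the three observed flows can be checked mechanically. The sketch below assumes that every app is expected to send SEDP info to every other app, lists the directions that are missing, and tests whether all of them go from an earlier-started app to a later-started one. The helper names are made up for illustration.

```python
from itertools import permutations

# Apps in the order they were started in the test.
start_order = ["sub1", "sub2", "pub"]

# (sender, receiver) pairs for which the receiver reached handle_SEDP in the log.
observed = {("pub", "sub1"), ("pub", "sub2"), ("sub2", "sub1")}


def missing_flows(participants, observed_flows):
    """Directed SEDP flows that were expected but never seen."""
    return [pair for pair in permutations(participants, 2)
            if pair not in observed_flows]


def all_earlier_to_later(flows, order):
    """True if every flow goes from an earlier-started app to a later one."""
    rank = {name: position for position, name in enumerate(order)}
    return all(rank[sender] < rank[receiver] for sender, receiver in flows)


lost = missing_flows(start_order, observed)
```

Here `lost` contains sub1 to sub2, sub1 to pub, and sub2 to pub, and `all_earlier_to_later(lost, start_order)` holds, which matches the observation that only earlier-to-later SEDP traffic goes missing.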