Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicated messages in XMPP bridged IRC rooms. #251

Open
Rampoina opened this issue Jun 5, 2021 · 17 comments
Open

Duplicated messages in XMPP bridged IRC rooms. #251

Rampoina opened this issue Jun 5, 2021 · 17 comments
Labels
S-Minor Impairs non-critical functionality or suitable workarounds exist T-Defect Bugs, crashes, hangs, vulnerabilities, or other reported problems.

Comments

@Rampoina
Copy link

Rampoina commented Jun 5, 2021

Describe the bug
Duplicated messages are appearing

To Reproduce

  1. Join https://matrix.to/#/#xmpp_#general_irc.freegamedev.net:matrix.org
  2. Check the duplicated messages from 05-06-2021 from 12:51 UTC

Expected behavior
No duplicated messages

Screenshots
2021-06-05-151628_668x877_scrot

Desktop (please complete the following information):

  • OS: Arch linux
  • Browser: Electron (Element Desktop)
  • Version: 1.7.28

Additional context
At first the behaviour seemed to be related to replying, but then it happened spontaneously.

The room in question is an irc channel bridged to XMPP with Bibomi

This issue seems to be related to #210

@poVoq
Copy link

poVoq commented Jun 5, 2021

It should be noted that this is purely Matrix side. Neither IRC nor XMPP users see the duplicated messages.

Restarting the Biboumi gateway seems to temporarily fix the issue.

@Neustradamus
Copy link

@Half-Shot: Have you looked this bug?

@tadzik
Copy link
Contributor

tadzik commented Aug 3, 2021

Right, this is a bit of a can of worms, and boils down to the way biboumi send messages to MUCs.

Episode 1: the biboumi duplicates

First, as an example of what happens when bifrost talks to a regular MUC. We have an XMPP connection open for each matrix user, each of them receiving messages. In order to avoid duplicate messages there we utilize the fact that the same message is going to have the same ID (here) and drop it if it shows up.

For demonstration purposes, we have two matrix users in a matrix room bridged to a regular MUC with bifrost. When a message is sent to the MUC from XMPP (and with some extra logging added), this happens:

Aug-3 14:23:21.260 INFO XmppJsInstance Got text message "message from pidgin to conf.localhost". ID: purple29d69ec
Aug-3 14:23:21.262 INFO XmppJsInstance Duplicate stanza received (purple29d69ec), ignoring

We properly figure out that the same message arrived twice, and all is well.

When a matrix user sends a message to the room all is good also: we record the message that we've sent to XMPP the same way (note the Matrixy message ID) and ignore it when it comes back:

Aug-3 14:28:21.125 INFO XmppJsInstance Duplicate stanza received ($2JtTaL3WVKyOoC_KjA3Xqbd5dMKYcZngWZcZmKlSu0s), ignoring
Aug-3 14:28:21.125 INFO XmppJsInstance Duplicate stanza received ($2JtTaL3WVKyOoC_KjA3Xqbd5dMKYcZngWZcZmKlSu0s), ignoring

So far so good. Problems arise the moment biboumi comes into play. For a similar setup (two matrix users in a bridged room), when a message is sent from XMPP, biboumi logs the following:

[DEBUG]: src/xmpp/xmpp_component.cpp:162:       XMPP RECEIVING: <message from='buddy@localhost/sfqTpaGN' id='purple29d69f3' to='#duplifest%ngircd.local@biboumi.localhost' type='groupchat' xmlns='jabber:component:accept'><body xmlns='jabber:component:accept'>hello from pidgin!</body></message>
[DEBUG]: src/xmpp/xmpp_component.cpp:77:        XMPP SENDING: <message from='#duplifest%ngircd.local@biboumi.localhost/buddy' id='purple29d69f3' to='buddy@localhost/sfqTpaGN' type='groupchat'><body>hello from pidgin!</body><stanza-id by='#duplifest%ngircd.local@biboumi.localhost' id='c2e16a29-4f72-439e-9984-7d855d23acc9' xmlns='urn:xmpp:sid:0'/></message>
[DEBUG]: src/xmpp/xmpp_component.cpp:77:        XMPP SENDING: <message from='#duplifest%ngircd.local@biboumi.localhost/buddy' id='27fda391-26cd-4ab3-995c-328e26509709' to='test2_localhost@matrix.localhost/matrix-bridge' type='groupchat'><body>hello from pidgin!</body><stanza-id by='#duplifest%ngircd.local@biboumi.localhost' id='fd7f62f4-4567-4ce9-91fd-4bb8f408cdb2' xmlns='urn:xmpp:sid:0'/></message>
[DEBUG]: src/xmpp/xmpp_component.cpp:77:        XMPP SENDING: <message from='#duplifest%ngircd.local@biboumi.localhost/buddy' id='232a1fe4-abed-49da-8cb6-77da5d1886cd' to='test_localhost@matrix.localhost/matrix-bridge' type='groupchat'><body>hello from pidgin!</body><stanza-id by='#duplifest%ngircd.local@biboumi.localhost' id='0493d5d8-8956-4c22-92fa-46b455cc80db' xmlns='urn:xmpp:sid:0'/></message>

Biboumi sends the message back to the sender with the original ID (fair), but then also forwards them to the matrix users, with a unique ID each time. Bifrost gets confused here and duplicates the message since it has no way of telling that they're in fact the same:

Aug-3 14:30:07.823 INFO XmppJsInstance Got text message "hello from pidgin!". ID: 27fda391-26cd-4ab3-995c-328e26509709
Aug-3 14:30:07.825 INFO XmppJsInstance Got text message "hello from pidgin!". ID: 232a1fe4-abed-49da-8cb6-77da5d1886cd

This is why the duplicate message bug happens 1) only on the matrix side of things 2) only when bifrost interacts with biboumi handling the MUC.

Similar thing happens when a Matrix user speaks in the IRC room. Biboumi resends the same message with the same ID to the sender XMPP connection (which gets dropped as it should), but then it sends the same message to the other Matrix user's XMPP connection with a different ID, so that gets propagated to matrix unnecessarily.

Ideally biboumi would be more considerate here, but since we can't really expect that in the wild, a way out of this situation is for the xmpp.js backend to opt-in to the message deduplicator that purple uses (which ignores duplicate room-sender-messagebody entries) which we currently disable here.

Episode 2: bifrost's message echoing

The problem seemingly goes away after enabling the deduplication, at least until the bridge gets restarted. Once it does, deduplicator loses track of its usersInRoom counter (which is not persistent), and inserts undefined amount of hashes into its expectedMessages – causing it to echo back the messages that we get back from biboumi.

To be continued...

@selurvedu
Copy link

@tadzik very interesting, thanks for sharing your findings.

Ideally biboumi would be more considerate here

Is there anything particual you have on your mind?

@tadzik
Copy link
Contributor

tadzik commented Aug 3, 2021

@tadzik very interesting, thanks for sharing your findings.

Ideally biboumi would be more considerate here

Is there anything particual you have on your mind?

The easiest thing for us would be biboumi sending each identical message with the same ID – it currently seems to do that when resending the original message that came from XMPP (see the biboumi log above), but generates new IDs for subsequent copies (same log). Not sure if the standard cares either way, to be honest, and whether we should actually expect that to be true in bifrost or not, but it'd certainly make the problem go away :) MUCs in regular XMPP servers seem to behave that way, which is why only the biboumi interaction is proving problematic.

@poVoq
Copy link

poVoq commented Aug 3, 2021

I think this just exposes an edge-case as in XMPP the source stanza ids can not be expected to be the same all the time. But don't quote me on that :) Anyways, probably better to find a way to deal with that Bifrost side as you already mentioned.

@emorrp1
Copy link

emorrp1 commented Aug 3, 2021

@selurvedu @poVoq biboumi is ignoring a SHOULD as of 2018 here. It seems this was discussed shortly before the standard was updated.

the service MUST change the 'from' attribute to the sender's occupant JID and reflect the message out to the full JID of each occupant.
The service SHOULD reflect the message with the same 'id' that was generated by the client, to allow clients to track their outbound messages. If the client did not provide an 'id', the server MAY generate an 'id' and use it for all reflections of the same message
Note: the requirement to reflect the 'id' attribute was added in version 1.31 of this XEP.
Version 1.31 (2018-03-06)

Even if you interpret the first SHOULD sentence as only requiring the id to be reflected back to the original sender, the second clearly states the same ID should be reused for all reflections of the same message and the example given backs it up.

@selurvedu
Copy link

@emorrp1 yes, we discussed this with @tadzik today in Biboumi Discussion chat:

(13:44:14) selurvedu: tadzik, thanks for replying so fast here #251 (comment)

If I got what's hapening there right, the duplicate messages on the Matrix side appear only when there are two or more users that joined via Matrix<-bifröst->XMPP<-biboumi->IRC since the messages get reflected to each individual groupchat occupant, as the XEP says, and bifröst has no idea the messages are the same.

(13:45:49) tadzik: selurvedu: correct. However, like I mentioned, this only happens with Biboumi, not with regular pure XMPP MUCs. It seems like a regular MUC does reflect a message to each occupant, yes, but sends them with the same message ID – so bifrost sees the second message and goes "ah okay, we've had that before"

(13:46:16) tadzik: whereas the messages that biboumi emits all have distinct IDs, which makes bifrost treat them as distinct

(13:46:51) tadzik: I'm not sure if XEP has an opinion either way on this, but all XMPP servers we've encountered so far (except biboumi) seem to use the same ID for each reflected message

(13:47:02) tadzik: (and so does bifrost for when it does that)

(13:48:55) selurvedu: I was gonna ask if anyone here has an opinion on sending multiple message stanzas with same id to multiple recipients, but it appears that this example from the XEP suggests doing so: https://xmpp.org/extensions/xep-0045.html#message

Below the example there's a paragraph:

The service SHOULD reflect the message with the same 'id' that was generated by the client, to allow clients to track their outbound messages. If the client did not provide an 'id', the server MAY generate an 'id' and use it for all reflections of the same message (e.g. using a UUID as defined in RFC 4122 [18]).

This does not say the service should preserve the id for reflections sent to other occupants as well, but that example does so, so I guess it's not forbidden at least.

So I have a question to the devs instead: do different ids for reflected messages serve a purpose? If not, could those be carried over from the incoming message?

(13:49:24) selurvedu: > I'm not sure if XEP has an opinion either way on this, but all XMPP servers we've encountered so far (except biboumi) seem to use the same ID for each reflected message

Cool, thanks for confirming that.

(13:50:21) tadzik: ah, good find, thanks for that. So we SHOULD generally be able to rely on that, but it's not a MUST so we'd best be ready for some server implementations diverging on that

(13:53:27) selurvedu: tadzik, right. But still, if biboumi devs are not against using the same ids and if this will get changed, I don't think a fix on the bifrost side would be necessary. On the other hand, as you said, it may still be needed for other server implementations, or just for older biboumi installations.

(13:55:31) tadzik: right, bifrost should not need changes if biboumi changes its behaviour

(13:55:42) tadzik: but ideally it'd be more resilient regardless

Thanks for pointing at the issue in the biboumi issue tracker. I can see there that preserving the id for the first reflection was suggested by ge0rg.

Even if you interpret the first SHOULD sentence as only requiring the id to be reflected back to the original sender, the second clearly states the same ID should be reused for all reflections of the same message and the example given backs it up.

As I said in the chat, it would be happy if biboumi did this, however, I don't see where it is explicitly stated that all reflections should have the same id, not just the one that is being reflected back to the sender. Same id for all reflections is only shown in the example. Could you elaborate how you came to such conclusion?

@Neustradamus
Copy link

cc @louiz, @linkmauve, @mathieui.

@mathieui
Copy link

mathieui commented Aug 4, 2021

@Neustradamus don’t do that.

@tadzik I have not done any significant developement on biboumi; however the issue here is that each XMPP<->IRC connection is independent from the other (which helps for security reasons, e.g. a user banned from a chatroom can no longer receive the history of the room, etc), and changing this for message IDs might be quite a bit of work (biboumi was designed to be an individual bridge, which is why this has never been an issue). This might better be explored in a new biboumi issue though.

@emorrp1
Copy link

emorrp1 commented Aug 4, 2021

Even if you interpret the first SHOULD sentence as only requiring the id to be reflected back to the original sender, the second clearly states the same ID should be reused for all reflections of the same message and the example given backs it up.

As I said in the chat, it would be happy if biboumi did this, however, I don't see where it is explicitly stated that all reflections should have the same id, not just the one that is being reflected back to the sender. Same id for all reflections is only shown in the example. Could you elaborate how you came to such conclusion?

I'll trim a few words from the XEP to hopefully make it clearer. I don't believe I'm changing any meaning, just bringing phrases closer together to show how heavily its implied by the overall phrasing of the definition:

and "reflect the message" out to ... each occupant ... "reflect the message" with the same 'id' ... use it for all "reflections of the same message"

@selurvedu
Copy link

selurvedu commented Aug 4, 2021

don’t do that

+1

@mathieui

the issue here is that each XMPP<->IRC connection is independent from the other (which helps for security reasons, e.g. a user banned from a chatroom can no longer receive the history of the room, etc)

Interesting. I did not think of that.

@emorrp1 thanks for clarifying, I see it now.

@mathieui
Copy link

mathieui commented Aug 4, 2021

I'll trim a few words from the XEP to hopefully make it clearer. I don't believe I'm changing any meaning, just bringing phrases closer together to show how heavily its implied by the overall phrasing of the definition:

and "reflect the message" out to ... each occupant ... "reflect the message" with the same 'id' ... use it for all "reflections of the same message"

I will add that the only sane way is for MUCs to send the same IDs to everyone, because otherwise any feature such as corrections, retractions, references, etc, which rely on message IDs will be broken, unless the MUC intercepts and rewrites everything (which is insane and will not work when the conversation is encrypted).

Biboumi is not violating the spec as the messages you receive from other users are from IRC after being (possibly) sent from biboumi to IRC, they are not from biboumi itself.

@louiz
Copy link

louiz commented Aug 4, 2021

As mathieui said, a same channel (on the same IRC server) is a different room for each different biboumi user.
Biboumi doesn’t know that #c++%irc.freenode.net@biboumi.localhost for user louiz@localhost is the same channel as #c++%chat.freenode.net@biboumi.localhost for user mathieui@localhost. The two are not linked in any way.

The only way this could be done in biboumi would be to have deterministic IDs, like hashing the IRC nick + message + date + something (well, not the date, since the message may arrive at different times on each socket). But this probably violates many XEPs, and will certainly create some new edge-cases.

I don’t think that’s possible.

@selurvedu
Copy link

@louiz

a same channel (on the same IRC server) is a different room for each different biboumi user.

I see, this is what I feared the most. I was thinking it's like a regular MUC. Thank you for clarifying. It seems that bifrost will have to figure our its own deduplication mechanism indeed.

@poVoq
Copy link

poVoq commented Aug 4, 2021

Just for reference: https://git.singpolyma.net/cheogram-muc-bridge
Is having similar issues and seemingly does the de-duplication with the help of a bot user in the channel, but it is still work in progress and doesn't work perfectly either.

@Half-Shot Half-Shot added T-Defect Bugs, crashes, hangs, vulnerabilities, or other reported problems. S-Minor Impairs non-critical functionality or suitable workarounds exist labels May 18, 2022
@Half-Shot
Copy link
Collaborator

Confirming this is still an issue, we're considering it a S-Minor as it only impacts IRC bridged rooms for the moment. That said, it's still something we'd like to solve.

@Half-Shot Half-Shot changed the title Duplicated messages Duplicated messages in XMPP bridged IRC rooms. May 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-Minor Impairs non-critical functionality or suitable workarounds exist T-Defect Bugs, crashes, hangs, vulnerabilities, or other reported problems.
Projects
None yet
Development

No branches or pull requests

9 participants