-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/paho-mqtt: add support for gnrc #17004
base: master
Are you sure you want to change the base?
Conversation
Hi, first of all, sadly I don't have time for an in-depth review but maybe I can guide you in the right direction: Regarding your test setup: You mentioned that you are working with link-local addresses right? Did you add an interface identifier to your IP address? If not, this could be the reason why nothing is sent. For a reliably working test setup you can take a look under tests/gnrc_tcp. Could you give us more details on the blocking (-ETIMEDOUT) or non-blocking (-EAGAIN) read question? I don't really understand the issue you are trying to solve. Cheers Simon |
No, I didn't. I thought it should work even without specifying an interface when using The paho-mqtt interface currently does not support specifying an interface. Maybe I should just add this.
Sorry for the bad explanation... Basically, the Paho thread does TCP reads in a loop with sleep periods in between. Each loop iteration runs for a certain amount of time (10 ms by default). I used non-blocking reads to get rid of -ETIMEDOUT errors (and because lwip also uses non-blocking reads). But I'm not sure that is the best choice. My main concern is that with non-blocking reads, there might be multiple reads per iteration, which is unnecessary, so blocking reads would be better. Do you/Martine have an opinion either way? Both options work, so maybe it is a non-issue. Here is the relevant Paho code that does the looping: https://github.com/eclipse/paho.mqtt.embedded-c/blob/29ab2aa29c5e47794284376d7f8386cfd54c3eed/MQTTClient-C/src/MQTTClient.c#L341-L359 |
It does, when you compile the module |
Given that MQTT is designed to be asynchronous, non-blocking reads might be desirable. But I am not deep enough into the internals of paho, to make this a binding condition. I do remember, however, that something like asynchronous |
I would prefer non-blocking reads. If Paho tries to read data in 10ms intervals I don't see why read attempts should have an additional timeout. There is nothing to be gained here using a blocking read. |
OK, I will leave it as it is then. I've added code for specifying the interface for the broker address. The test now passes on native for |
Could you rebase the PR? Some merge conflicts have sneaked in in the meantime |
0236d6e
to
2dc9f10
Compare
OK, done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-wise this good to me. One remark inline
@miri64 scapy is not my strong suit. Maybe you could take a look at the testing script?
pkg/paho-mqtt/contrib/riot_iface.c
Outdated
_buf = buf; | ||
_len = len; | ||
_timeout = timeout_ms; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it compile if the conditionals are still left at C level?
Note that the IS_USED()
macro resolves to the numeric literal 0
or 1
at preprocessor time. Hence, it is trivial for the compiler to optimize out dead branches. Only with -O0
it would be actually be more efficient to use the preprocessor, in all other cases there is no cost. You can even call functions that are declared by not implemented (e.g. if the module tsrb
is not used with GNRC but only called in the IS_USED(MODULE_LWIP)
case, there will be no undefined symbol error at link time, since the function calls in the dead branches are optimized out).
The advantage of keeping it a C level is that the C compiler still performs syntax checks and diagnostics on all variants and issues are easier found. For that reason, we generally prefer C level conditionals over preprocessor conditionals.
Note that accessing e.g. struct
members that are only provided when a given module is used will not work without the preprocessors, as even in dead branches C code may not accesses non-existing struct
members. So there cases in which avoiding the preprocessor is not possible (or would require carrying unused struct
members into all configurations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to replace the #if defined(...)
with if (IS_USED(...))
below in pkg/paho-mqtt/contrib/riot_iface.c#L141, so I have done that.
In all the other cases, replacing it causes compiler errors since the code segments use the variables buffer
, _temp_buf
and/or tsrb_lwip_tcp
which are not declared if LWIP isn't used (pkg/paho-mqtt/contrib/riot_iface.c#L42-L46).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to just declare the variables in all cases. Both GCC and clang will just garbage collect unused variable, if they are used only in known-to-be dead branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that also works :) I have made the changes.
pkg/paho-mqtt/contrib/riot_iface.c
Outdated
} | ||
|
||
rc = tsrb_get(&tsrb_lwip_tcp, buf, len); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I believe this could be left at C level without downsides. (More instances below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments inline. Please squash at will.
pkg/paho-mqtt/contrib/riot_iface.c
Outdated
@@ -129,13 +132,24 @@ int NetworkConnect(Network *n, char *addr_ip, int port) | |||
ipv4_addr_from_str((ipv4_addr_t *)&remote.addr, _local_ip)) { | |||
remote.port = port; | |||
} | |||
else if (IS_USED(MODULE_IPV6_ADDR)) { | |||
/* ipvN_addr_from_str modifies input buffer */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about that? The char pointer is declared as const
and briefly skimming the code doesn't reveal any bug that results in disregarding the const
ness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you are right and I also wondered about this but I kept the comment unchanged since I thought it is not directly relevant for my changes and I didn't want to overcomplicate this PR. I will take a look at this again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see a reason not to use ipvN_addr_from_str()
directly. (And I have also found another place where this is already done: tests/lwip/ip.c:116 and tests/lwip/ip.c:130)
I have removed the comment and the strncpy
.
pkg/paho-mqtt/contrib/riot_iface.c
Outdated
@@ -129,13 +132,24 @@ int NetworkConnect(Network *n, char *addr_ip, int port) | |||
ipv4_addr_from_str((ipv4_addr_t *)&remote.addr, _local_ip)) { | |||
remote.port = port; | |||
} | |||
else if (IS_USED(MODULE_IPV6_ADDR)) { | |||
/* ipvN_addr_from_str modifies input buffer */ | |||
strncpy(_local_ip, addr_ip, sizeof(_local_ip)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm personally not a fan of strncpy
, as it does not guarantee to zero-terminate strings. It should not be needed if ipvN_addr_from_str()
is used directly (which I believe it can), but if I'm wrong about that: How about something like this
size_t addr_ip_len = strlen(addr_ip);
if (addr_ip_len + 1 > sizeof(_local_ip)) {
return -EINVAL;
}
memcpy(_local_ip, addr_ip, addr_ip_len);
_local_ip[addr_ip_len] = [0];
tests/pkg_paho-mqtt/main.c
Outdated
/* wait for test script to listen on broker port */ | ||
ztimer_sleep(ZTIMER_MSEC, 2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't the waiting be done on the test script side (that is, wait before sending the shell command from python to the device)?
I'm not sure if scapy tests are ran automatically (due to the root
access required), but even if not that might change in the future. Maybe the timeout can be reduced, as waits when running all tests do add up. We do so in the nightlies or on PRs with CI: run tests
, and the latter blocks the CI (and all PRs) until the tests have finished. But if 2 seconds is reasonable, it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have found a better solution that doesn't require the sleep at all by using select
in the test script.
tests/pkg_paho-mqtt/main.c
Outdated
} | ||
else { | ||
printf("success: published message '%s' to topic '%s' with QoS %d\n", | ||
(char *)message.payload, argv[1], (int)message.qos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(char *)message.payload, argv[1], (int)message.qos); | |
argv[2], argv[1], (int)qos); |
This is a bit shorter and should print the same values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! This code line originates from the examples/paho-mqtt/main.c
which I based the test's main.c
on. Should I also make this change there or should that be a second (very small) PR?
tests/pkg_paho-mqtt/main.c
Outdated
printf("error: not subscribing, topic too long %s\n", argv[1]); | ||
return -1; | ||
} | ||
strncpy(_topic_to_subscribe[topic_cnt], argv[1], strlen(argv[1])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes little sense. It should be at least
strncpy(_topic_to_subscribe[topic_cnt], argv[1], strlen(argv[1])); | |
strncpy(_topic_to_subscribe[topic_cnt], argv[1], MAX_LEN_TOPIC); |
But then again, if strlen()
is called above and the length is checked anyway, one could use the more efficient memcpy()
instead.
It also looks to me like the topics should be zero-terminated, so it should be strlen(argv[1)) + 1 > MAX_LEN_TOPIC
, not strlen(argv[1)) > MAX_LEN_TOPIC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I made both changes (but kept the strncpy
).
Same question as above: This code line originates from the examples/paho-mqtt/main.c
which I based the test's main.c
on. Should I also make this change there or should that be a second (very small) PR?
This sadly doesn't build locally on my machine with a more recent version of GCC:
The issue is that You can fix this e.g. with the following patch: From 8da5839a5fb3af938cd457d4527e5a862d35d414 Mon Sep 17 00:00:00 2001
From: Marian Buschsieweke <marian.buschsieweke@ovgu.de>
Date: Tue, 11 Jan 2022 11:25:52 +0100
Subject: [PATCH] pkg/paho-mqtt: fix memory corruption
---
...001-MQTTClient-C-Fix-memory-corruptions.patch | Bin 0 -> 1984 bytes
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 pkg/paho-mqtt/patches/0001-MQTTClient-C-Fix-memory-corruptions.patch
diff --git a/pkg/paho-mqtt/patches/0001-MQTTClient-C-Fix-memory-corruptions.patch b/pkg/paho-mqtt/patches/0001-MQTTClient-C-Fix-memory-corruptions.patch
new file mode 100644
index 0000000000000000000000000000000000000000..a9eb03d552cc826ae9e97e9969dac7caff911ec0
GIT binary patch
literal 1984
zcmcIkQE%He5PsLMxV;2&EGd#=H;z{a>D*>*fo#p}p%23lP}IrhDv2&h#m>4PzdPEV
zoo;BchYE&8k-Yo9`|cxejBX&!2BTq~%4gYSTBVoO`9P$@YA`DD;RrJvU0z^5z)HZa
zZefNi$SxpF$NmR-nr6Wbr5VGmFiN!WLucjEDqQ0Sgy)T)Op2q)YyGh3k`k|im%`x~
z<{d_mW$>1Q^EA(CJkGQ6c?SJ7OVeQ16@Q>~WBBdei~02R?{IrJpHFLrt&68)xKSHu
zu+iobq&B8oIi*`01oNe`P^k@CP_1>Ml{8u}(I5y|Y1KM3;4}~_wp|0^N?WUn`Vp-9
z6HCx72F0Vp9#9@BtfM*JejH0lAY3n%T=FL2phnsVZp~>+!i7d#7f>`|SGqI68JUm=
z=_T;Of<fEWE&|=TSXXhOySB9Zl7uAM6defS?U$IuQmj_kCc7fn!a|9TG$M+{Bm6j>
zl7>m1q_9?Q3GsXDjA|Ep73&8yRjt=a5XW)wM9DL2<k>;mM8aP%g1&!vZ$@I)2y!Xf
z1(p#!V_DI7wZncdf}BXyxc!cMK~Soyf;e6%2jVDkJpDIP@#XbEwIyz#DDy#CRY@|u
z7>P?Mf!uL19C9gNLkOr0_!`RfHN?a7Ji35BzeYe4++h}lOlYN11->UmfAJl&;X$oa
zkP+FBe8vKqQ)<#m$s1LZ$)c^l?(_^k>Hzqes|L-RR=M!R@v!&((fpJQ;7Yx2Z)dw@
zN*!NG-85vXdk!G0+&7k^f|&Ew*urGX6X_7vWZ-+dXB<*+(N#VPz3<uH5u_0;(EWrP
z_fE(U=-!l}q$Xw&oOtdLgjD-!@8qLqpgzN}829u|SoFV}{3h5Z46|27RS+`D^9iKA
ztx(Knq3I;xM!B7Mye%}6$XRU5ZB-$)g+jR~uX7K+o51YX>GaiX)`PP%zZKjP;UBqR
z<@E@>8wPBH{%~*}(FOI*?C|MO6iY1G08~c0D~yZnTB{Z<oo~C;#zFJAWvR+t7kCfH
zE{vdytz@%J;O^%c<?yZBuN!m^qNX+W?n|Tte6S1E9iALAZ_VJzkbZ~6-be;Txi{5r
z?2Waz3geZ+iTLUeNT`R8quTys-<G-ewr5VQ{hx$)?)^OXm&wCq0$J}!F~Z#|=2w`E
ze~wzl-|W55q1bBaW^`=XZ7hIyslQgx-O~Lo7WkUZc8!J&`{eFuo-C8RblC-Yaz041
GAov@sB4N`2
literal 0
HcmV?d00001
--
2.34.1
The binary thingy is a patch itself (that is actually readable in an editor), but patches in RIOT are configured to be treated as binary. Here is the non-binary version of patch the above patch adds:
|
The CI also found some issues |
The fix to get paho compiling on modern GCC has been merged, please rebase. |
ping @vera |
Actually this looks pretty much alright - but CI appears to have some Kconfig issue? |
char *iface = ipv6_addr_split_iface(addr_ip); | ||
if ((!iface) && (gnrc_netif_numof() == 1)) { | ||
remote.netif = gnrc_netif_iter(NULL)->pid; | ||
} | ||
else if (iface) { | ||
remote.netif = atoi(iface); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also use netutils_get_ipv6()
here - or just use sock_tcp_str2ep()
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions. |
Contribution description
This contains:
changes to
riot_iface.c
of pkg/paho-mqtt to allow using with gnrcThe main change is replacing
if (IS_USED(MODULE_LWIP)) {
by#if defined(MODULE_LWIP)
in a few places to allow compiling without "undeclared" errors when not using lwip. This already made it usable.But the MQTT connection would always be lost very quickly because the Paho code closes the MQTT session when
readPacket
returns an error code. In this case,readPacket
returned -ETIMEDOUT because gnrc'ssock_tcp_read
timed out. (The paho_mqtt_riot thread callsMQTTYield
every 30 ms which then continually checks for new packets for 10 ms.readPacket
calls the read function with a timeout equal to the amount of time left of the 10 ms so it doesn't block longer than that.)To fix it, I set the timeout permanently to 0 in
riot_iface.c
sosock_tcp_read
returns immediately with -EAGAIN when there is no data. The EAGAIN error was already handled as an exception (return 0 anyways).Alternatively, you could leave the timeout as-is but also allow the ETIMEDOUT return code – this might be better for performance (?) since
sock_tcp_read
might be called less often during the 10 ms cycle?a new variable
NETWORK_STACK
inexamples/paho-mqtt
to allow switching between lwip and gnrctests for the package
I am submitting this as draft due to some issues I have with getting the tests to pass when using gnrc. I believe the problem is not with my changes directly, but I have not been successful debugging the issue (see below).
Testing procedure
I have tested these changes manually on a nrf52840dk board with my own application code. I had the board running for >12 hours publishing data every 2 minutes with no breaks in the MQTT connection so it seems to be stable.
I have attempted to add automatic tests for the package. I intended the tests to cover only the basic functionality (con, pub, sub, unsub, discon) and no corner or error cases since the Paho code itself already has tests. I think it should be sufficient to test that the interfacing between the Paho code and RIOT functions correctly. (Error cases may be added to check that error codes are passed correctly to RIOT, but I didn't do this here.)
I based the tests on
tests/emcute
and also used a scapy automaton to implement a "fake" MQTT broker for the client to communicate with. The automaton runs through the states CONNECTED -> CLIENT_SUBSCRIBED -> CLIENT_PUBLISHED -> CLIENT_UNSUBSCRIBED -> DISCONNECTED.How to run the tests: see README
Should pass with "SUCCESS".
My issues with the tests (run on native):
NETWORK_STACK=lwip
: worksNETWORK_STACK=gnrc
: does not work. The test socket times out since it doesn't receive a SYN (monitoring the tap0 interface, I also don't see a SYN). I think this issue is related to the fact that the test uses a link-local address for the broker. In my manual tests, I used a global address which worked fine. It seems that with the LL address, it can't find the right interface to use to send out the SYN? There is no interface given, but I thought when includinggnrc_netif_single
it might simply use the only available interface? The weird thing is that there is no error code returned anywhere and it seems like gnrc should have successfully sent the SYN, but I don't see it.With the following dirty "fix" in
riot_iface.c:NetworkConnect
I could get the test to succeed:strncpy(_local_ip, addr_ip, sizeof(_local_ip)); if (IS_USED(MODULE_IPV6_ADDR) && (remote.port == 0) && ipv6_addr_from_str((ipv6_addr_t *)&remote.addr, _local_ip)) { remote.port = port; remote.family = AF_INET6; + remote.netif = 5; }
To summarize, my open issues with this PR are:
Maybe someone could take a look at this who knows the gnrc (TCP) code? @brummer-simon @miri64
Thanks :-)
Issues/PRs references
/