Continuing #1118 here because I'm not sure if trying to detect switches between routers is the right way to go forward.
Let's discuss the issue described by @ilario in this mail:
https://lists.autistici.org/message/20240714.140352.58fe57b2.en.html
In the default configuration, ethernet interfaces are added to the br-lan bridge while also being configured as batadv hard interfaces. In some setups, this leads to error messages appearing in the kernel log at a high rate and to network instability.
It was suspected that the error appears iff there is a switch between the two routers. I tried to reproduce the issue with a dumb switch, but without success (everything worked fine, no errors in the kernel log), as described in this mail:
https://lists.autistici.org/message/20240726.150840.dcc0e028.en.html
I then tried to reproduce it by replacing the switch with an OpenWrt router (without DSA), basically acting as a manageable switch, with no success either.
Then, when I connected the two LibreMesh routers directly, I surprisingly could observe the issue. The error messages appeared in the kernel logs and batadv didn't mesh over ethernet. On mr70x-v1, batctl n did not list the fritz4040 as a neighbour on the lan interfaces, and batctl bbt showed no routers in the backbone table on either router. batctl tcpdump lan1_29 did show batman OGMs arriving on lan1_29, so I'm not sure why the interface was not showing up in the neighbour table.
After a while, the wifi connection between my laptop and the routers became quite unusable. When I ran tcpdump on the mesh interfaces, I found that there was a lot of broadcast traffic and that some frames were duplicated many times (I saw ICMPv6 messages with the same id and seq-no many times over long time periods). In addition, on my laptop I saw the same echo request being received over and over at a high rate. So there was a loop, and it clogged the wifi interface.
It is not the kind of loop I described in #1032.
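For anyone who wants to check a capture for the same symptom, here is a rough Python sketch that counts how often each ICMPv6 echo request (same source, destination, id and seq) shows up in tcpdump's text output. It is only an illustration, not something I ran on the routers: the line format in the regular expression is an assumption about how tcpdump prints echo requests (it differs between tcpdump versions, and older ones do not print the id), and the function names are made up for this example.

```python
import re
import sys
from collections import Counter

# Assumed tcpdump line shape (depends on the tcpdump version):
#   12:00:01.000000 IP6 fe80::1 > ff02::1: ICMP6, echo request, id 7, seq 3, length 64
ECHO_RE = re.compile(
    r"IP6 (?P<src>\S+) > (?P<dst>\S+): ICMP6, echo request, "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def count_echo_requests(lines):
    """Count occurrences of each (src, dst, id, seq) echo request."""
    counts = Counter()
    for line in lines:
        m = ECHO_RE.search(line)
        if m:
            key = (m["src"], m["dst"], int(m["id"]), int(m["seq"]))
            counts[key] += 1
    return counts


def duplicated(counts, threshold=2):
    """Return only the requests seen at least `threshold` times."""
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Reads tcpdump text output from stdin and prints the most repeated requests first.
    dupes = duplicated(count_echo_requests(sys.stdin))
    for key, n in sorted(dupes.items(), key=lambda kv: -kv[1]):
        print(n, *key)
```

It could be fed with a bounded capture, for example the output of tcpdump -n -c 1000 -i <interface> icmp6 saved to a file or piped in. A request that appears a handful of times is not necessarily a loop; the same request repeating at a high rate over a long period is the pattern described above.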
Later I booted the routers again to investigate the issue further. Annoyingly, everything is working fine now: no errors in the kernel log, meshing over ethernet works, and no frames are looping around. I'm not able to replicate the issue again. I also tried resetting the configuration to the firstboot state, but to no avail. So, unfortunately, it is currently not possible for me to find out when exactly this happens and why.
I find it strange that batman is also configured on eth0 on DSA-enabled devices. I don't think we are supposed to use that interface directly. Next time someone observes this issue, maybe they could add
config net
	option linux_name 'eth0'
	list protocols 'manual'
to the lime-node file and see if it helps.