@a-gave
Contributor

@a-gave a-gave commented Jul 6, 2025

WIP based on the solution proposed by @pony1k in #1121 (comment)

Related to #1192 #1121

@ilario
Member

ilario commented Jul 17, 2025

For quick testing, I used the same setup described in #1197, based on OpenWrt 23.05, and patched the files from this pull request directly on the already flashed routers. Maybe I should repeat the tests after flashing them with the code already patched?

Extract from lime-config output on YouHua WR1200JS:

network.scandevices found device wan in board.json
network.scandevices found DSA-port lan1 in board.json
network.scandevices found DSA-port lan2 in board.json
network.scandevices found DSA-port lan3 in board.json
network.scandevices found DSA-port lan4 in board.json
[...]
network.scandevices.dev_parser ignored DSA conduit device eth0
Create non existing bridge: br0
Evaluating portlan1
Bridge lan section: cfg030f15
Bridge lan port: bat0
Bridge lan port: lan1
Bridge lan port: lan4
Bridge lan port: lan3
Bridge lan port: lan2
Evaluating portlan2
Bridge lan section: cfg030f15
Bridge lan port: bat0
Bridge lan port: lan1
Bridge lan port: lan4
Bridge lan port: lan3
Bridge lan port: lan2
Evaluating portlan3
Bridge lan section: cfg030f15
Bridge lan port: bat0
Bridge lan port: lan1
Bridge lan port: lan4
Bridge lan port: lan3
Bridge lan port: lan2
Evaluating portlan4
Bridge lan section: cfg030f15
Bridge lan port: bat0
Bridge lan port: lan1
Bridge lan port: lan4
Bridge lan port: lan3
Bridge lan port: lan2
[...]

Extract of lime-config output on PlasmaCloud PA1200:

network.scandevices found DSA-port ethernet2 in board.json
network.scandevices found DSA-port ethernet1 in board.json
[...]
network.scandevices.dev_parser ignored DSA conduit device eth0
Create non existing bridge: br0
Evaluating portethernet1
Bridge lan section: cfg030f15
Bridge lan port: bat0
Bridge lan port: ethernet1
[...]

I got these ping results via cable, to be compared with #1197 (comment):

--- 10.13.0.1 ping statistics ---
5760 packets transmitted, 5752 received, 0.138889% packet loss, time 5900412ms
rtt min/avg/max/mdev = 0.327/0.622/6.524/0.121 ms
--- 4.2.2.2 ping statistics ---
5863 packets transmitted, 5853 received, 0.170561% packet loss, time 5873686ms
rtt min/avg/max/mdev = 28.282/29.697/45.741/1.320 ms

Which are amazing, much better than the ones from #1192 and #1197.


When connecting via cable, I cannot connect via ssh to the YouHua WR1200JS router I am directly connected to...

Here are the lime-reports of the two DSA routers (the one for the YouHua I got connecting to SSH from the PA1200 router):

PlasmaCloud_PA1200-LibreMesh_master-PR1203-OpenWrt_23.05-report.txt

At this boot, I could not connect via ssh to the YouHua WR1200JS router I am directly connected to... So I had to connect from the PA1200 router:

YouHua_WR1200JS-LibreMesh_master-PR1203-OpenWrt_23.05-report.txt

After a reboot I could connect normally:

YouHua_WR1200JS-LibreMesh_master-PR1203-OpenWrt_23.05-report-bis.txt


What is not working for me is DHCP on the DSA routers: when I connect to the named AP of the YouHua WR1200JS or of the PlasmaCloud PA1200, I do not receive an IPv4 address.


Connecting to the named AP of TP-Link WDR3600 I got these ping results:

--- 10.13.0.1 ping statistics ---
2138 packets transmitted, 2138 received, 0% packet loss, time 2140056ms
rtt min/avg/max/mdev = 1.055/2.877/104.738/5.827 ms
--- 4.2.2.2 ping statistics ---
2131 packets transmitted, 2131 received, 0% packet loss, time 2133082ms
rtt min/avg/max/mdev = 29.619/31.884/192.685/6.716 ms

Connecting to the named AP of TP-Link WR841N v13 I got these ping results:

--- 10.13.0.1 ping statistics ---
414 packets transmitted, 397 received, 4.10628% packet loss, time 414094ms
rtt min/avg/max/mdev = 1.312/14.499/385.822/31.384 ms
--- 4.2.2.2 ping statistics ---
420 packets transmitted, 401 received, 4.52381% packet loss, time 420103ms
rtt min/avg/max/mdev = 29.569/44.006/498.742/38.791 ms

which are much better than the ones reported here #1197 (comment)

@pony1k
Contributor

pony1k commented Jul 17, 2025

/etc/config/network looks good at first sight, but something must have gone wrong with the bridge configuration. br0 is missing in br-lan on both devices:

PlasmaCloud_PA1200-LibreMesh_master-PR1203-OpenWrt_23.05-report.txt:

### CMD brctl show

bridge name	bridge id		STP enabled	interfaces
br0		7fff.4c1365000f80	no		ethernet1
br-lan		7fff.4c1365000f82	no		bat0
							wlan0-ap
							wlan1-apname
							wlan0-apname
							wlan1-ap

YouHua_WR1200JS-LibreMesh_master-PR1203-OpenWrt_23.05-report-bis.txt:

### CMD brctl show

bridge name	bridge id		STP enabled	interfaces
br-lan		7fff.1e84e5ab5c79	no		bat0
							wlan0-ap
							wlan1-apname
							wlan0-apname
							wlan1-ap
br0		7fff.d45f25eb7eac	no		lan4
							lan2
							lan3
							lan1

This would explain why DHCP does not work on a cabled connection, because dnsmasq is configured to only answer requests from anygw (via br-lan). I don't understand why it doesn't work when connected via wifi.
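
To check this kind of membership without reading the table by eye, here is a minimal sketch that parses `brctl show` output. The function name bridge_has_port is hypothetical (not part of LibreMesh or bridge-utils), and it assumes the usual column layout shown above.

```shell
#!/bin/sh
# bridge_has_port BRIDGE PORT
# Reads `brctl show` output on stdin; returns 0 if PORT is listed as a
# member of BRIDGE, 1 otherwise. (Hypothetical helper, for illustration.)
bridge_has_port() {
    awk -v br="$1" -v port="$2" '
        /^bridge name/ { next }          # skip the header line
        # A non-indented line starts a new bridge; its 4th column, if
        # present, is the first member interface.
        /^[^ \t]/ { cur = $1; member = (NF >= 4) ? $4 : "" }
        # An indented line carries one more member of the current bridge.
        /^[ \t]/  { member = $1 }
        cur == br && member == port { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `brctl show | bridge_has_port br-lan br0 || echo "br0 is not in br-lan"` would flag the situation seen in both reports.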

A bit surprising that you were able to ping into the internet from a cabled connection and receive a reply. I guess this is because of the swconfig devices in the network that were still bridging their ethernet ports to bat0.

While experimenting, I found out that for some reason it doesn't seem to be possible to add a bridge as a direct member of another bridge:

root@tardis:~# ip l set dev bridge0 master br-ap
Error: Can not enslave a bridge to a bridge.

This explains why br0 is not a member of br-lan.

What does work, however, is adding a bridge vlan subinterface of a bridge to another bridge. This is what I'm doing in my home setup to work around the fact that one cannot run batadv on top of a bridge that contains a batadv mesh interface (usually called bat0). Here is what that looks like:

root@firewall:~# brctl show
bridge name     bridge id               STP enabled     interfaces
br-ap           7fff.2c3afd204850       no              bat0
                                                        br-dsa.3
                                                        phy1-ap
                                                        phy0-ap
br-dsa          7fff.2c3afd204850       no              lan4
                                                        lan2
                                                        lan3
                                                        lan1
br-iot          7fff.66f0ca76eecf       no              bat0.14
                                                        phy0-iot
root@firewall:~# batctl if
br-dsa.4: active
phy0-mesh: active
phy1-mesh: active

Here, bat0 cannot be a member of br-dsa, because then I couldn't run batadv on top of br-dsa.4.

So we could fix this by adding a bridge vlan to br0 with some arbitrary vlan number x and setting all ports to untagged, then using br0.x everywhere instead of br0. But this only works if the resulting vlan-aware bridge ignores the 802.1ad tags and forwards them as-is.

edit:
I conducted another experiment and it seems that the above fix could work. The relevant configuration on the Fritz!Box 4020 looks like this:

config device
        option type 'bridge'
        option name 'br-dsa'
        list ports 'lan1'
        list ports 'lan2'
        list ports 'lan3'
        list ports 'lan4'
        
config interface
        option device 'br-dsa'
        option proto 'none'

config bridge-vlan
        option device 'br-dsa'
        option vlan '3'
        list ports 'lan1:u*'
        list ports 'lan3:u*'
        
config device
        option name 'br-ap'
        option type 'bridge'
        list ports 'br-dsa.3'
        list ports 'bat0'
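
For devices with other port names, the bridge-vlan section above could be generated instead of written by hand. A rough sketch, assuming every listed port should be untagged and use the VLAN as its PVID (the `u*` suffix, as in the config above). emit_bridge_vlan is a hypothetical helper, not an existing LibreMesh function.

```shell
#!/bin/sh
# emit_bridge_vlan BRIDGE VID PORT...
# Prints a UCI bridge-vlan section in which every PORT is an untagged
# member of VLAN VID and uses it as PVID. (Hypothetical helper.)
emit_bridge_vlan() {
    bridge=$1
    vid=$2
    shift 2
    printf "config bridge-vlan\n"
    printf "\toption device '%s'\n" "$bridge"
    printf "\toption vlan '%s'\n" "$vid"
    for port in "$@"; do
        # 'u' = untagged on egress, '*' = PVID for untagged ingress
        printf "\tlist ports '%s:u*'\n" "$port"
    done
}
```

Calling `emit_bridge_vlan br-dsa 3 lan1 lan3` prints a section equivalent to the one in the config above.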

On top of br-dsa.3 I added an 802.1ad vlan subinterface and an IP address:

root@firewall:~# ip l add link br-dsa.3 name br-dsa.3_29 type vlan id 29 protocol 802.1ad
root@firewall:~# ip a add dev br-dsa.3_29 192.168.1.1/24
root@firewall:~# ip l set br-dsa.3_29 up

Then on another swconfig device (TL-WDR3600), I added a switch vlan in /etc/config/network:

config switch_vlan
        option device 'switch0'
        option vlan '3'
        option ports '5 0t'

The two devices are connected via a dumb switch and vlan 3 is untagged. Then I added a normal vlan subinterface and an 802.1ad vlan subinterface on top of that and set another IP address:

root@tl-wdr3600-v1:~# ip l add link eth0 name eth0.3 type vlan id 3
root@tl-wdr3600-v1:~# ip l add link eth0.3 name eth0.3_29 type vlan id 29 protocol 802.1ad
root@tl-wdr3600-v1:~# ip a add dev eth0.3_29 192.168.1.2/24
root@tl-wdr3600-v1:~# ip l set dev eth0.3 up
root@tl-wdr3600-v1:~# ip l set dev eth0.3_29 up

I was then able to ping 192.168.1.1 from the wdr3600. Pinging 192.168.1.2 from the fb4020 worked only sometimes. It always worked while pinging it from the wdr3600. So I changed the mac addresses on both devices:

root@firewall:~# ip l set dev br-dsa.3_29 address 02:B2:A0:CE:EE:AB
root@tl-wdr3600-v1:~# ip l set dev eth0.3_29 address 02:BF:CB:E9:4C:5B

Then it worked flawlessly in both directions. So the DSA switch ignored the 802.1ad tags, which is good.
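
Both replacement addresses start with 02, so they have the locally administered bit set and the multicast bit clear. How they were picked is not stated here. One way to generate such an address is sketched below, assuming `od` and /dev/urandom are available (on a minimal busybox build `od` may be missing); random_laa_mac is a hypothetical name.

```shell
#!/bin/sh
# random_laa_mac
# Prints a random unicast, locally administered MAC address: the first
# octet is fixed to 02, the other five octets are random.
random_laa_mac() {
    # 5 random bytes as hex, e.g. "3fa2079ce1", then ":3f:a2:07:9c:e1"
    rest=$(od -An -N5 -tx1 /dev/urandom | tr -d ' \n' | sed 's/../:&/g')
    printf '02%s\n' "$rest"
}
```

It could then be used as `ip l set dev br-dsa.3_29 address "$(random_laa_mac)"`.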

@a-gave
Contributor Author

a-gave commented Jul 30, 2025

Hi,

While I really appreciated the response from @pony1k, I tried to make it work, without success.
The fact that a bridge cannot be enslaved to another bridge was not clear to me: I thought br0 was actually a member of br-lan, although not shown by brctl show. Thanks!
The idea of then creating an untagged 802.1Q vlan to segment this bridge and finally adding it to br-lan seems interesting.

However, when I tested it, the swconfig device saw the dsa one via batctl n, but not the other way around.

While investigating other solutions I found this workaround:

  1. on the dsa device, set up all dsa user_ports with MAC learning off and create an interface with proto 'none' for each of them
  2. group them under a bridge br-dsa and create on top of it an interface with the same ipv4 address as lan (e.g. 10.13.15.35)

This way:

  • nodes can talk to each other via cable, i.e. a swconfig device reaches the 802.1ad vlans built on the dsa user_port it is connected to (e.g. lan1_17 and lan1_29)
  • clients are routed correctly to anygw via wireless, but via cable the announced gateway is instead the device-specific ipv4 address (pings and ssh to 10.13.0.1 or thisnode.info sometimes don't work and sometimes reach another swconfig device). DNS and routing to the internet or to neighbors (hosts or nodes) work.

here is the patch I'm using: https://github.com/libremesh/lime-packages/commit/202fba11df5a73c4170949a5ab3c440e41368da7.patch

To test this, one can apply the patch to a local copy of lime-packages
and then, from that directory, copy the modified files to the device.

Example script to copy the patch to the device:

#!/bin/bash
# scp without host-key checking and without touching known_hosts
uscp="scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
ip=10.13.15.35

# -O selects the legacy SCP protocol (for targets without an SFTP server)
$uscp -O packages/lime-system/files/usr/lib/lua/lime/proto/lan.lua root@$ip:/usr/lib/lua/lime/proto/
$uscp -O packages/lime-system/files/usr/lib/lua/lime/network.lua root@$ip:/usr/lib/lua/lime/
$uscp -O packages/lime-system/files/usr/lib/lua/lime/firewall.lua root@$ip:/usr/lib/lua/lime/
$uscp -O packages/lime-system/files/usr/lib/lua/lime/utils.lua root@$ip:/usr/lib/lua/lime/
# anygw.lua lives under lime/proto/ in the package, so it goes to lime/proto/ on the device too
$uscp -O packages/lime-proto-anygw/files/usr/lib/lua/lime/proto/anygw.lua root@$ip:/usr/lib/lua/lime/proto/

then apply and reboot the device

lime-config; lime-apply; /etc/init.d/network reload; reboot

I'm doing further tests on this to try to make anygw work also via cable, and/or to configure mesh protocols on a bridge containing all dsa user_ports (like in the example from @pony1k). But for now, this patch could be a dirty workaround that provides an alternative to manual configuration.

edit: 31-07
I'm getting ping dropouts (both to 1.1.1.1 and google.com) when connected via wireless to the apname of the dsa device.
Adding this route in /etc/config/network seems to help:

config route
	option interface 'lan'
	option target '10.13.0.0'
	option netmask '255.255.0.0'
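
The target/netmask pair above describes the same network as 10.13.0.0/16 in CIDR notation. As a sketch of that arithmetic: mask2cidr is a hypothetical helper that adds up the set bits per octet. It does not verify that the mask is contiguous.

```shell
#!/bin/sh
# mask2cidr NETMASK
# Prints the prefix length of a dotted-quad netmask, e.g. 255.255.0.0 -> 16.
# Only per-octet values are validated, not the ordering of the octets.
mask2cidr() {
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the netmask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case $octet in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$bits"
}
```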

@a-gave
Contributor Author

a-gave commented Aug 13, 2025

Small update on this:

I'm testing this setup which gave some good results:

  • a dsa device is connected via cable to a swconfig device
  • a host A is connected to the dsa device, another host B is connected to the swconfig device

the dsa device has this configuration:

  • remove all dsa user ports from the bridge br-lan
  • for each dsa user port, create a macvlan with mode 'passthru' (e.g. lan1 -> lan1mac0) that has the same MAC address as the bridge br-lan
  • add those macvlans as members of the bridge br-lan, e.g.
config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'bat0'
	list ports 'lan1mac0'
	list ports 'lan4mac0'

  • add an interface with proto 'none' on top of each macvlan
  • add an ipv4 route with target 10.13.0.0/16 for each macvlan interface, e.g.
config route
	option target '10.13.0.0/16'
	option interface 'lan1mac0_raw'

Here:

  • a host A connected via cable/wireless to the dsa device reaches the right mesh node (the dsa one) when pinging or ssh-ing to 10.13.0.1
  • a host B connected via cable/wireless to the swconfig device reaches the right mesh node (the swconfig one) when pinging or ssh-ing to 10.13.0.1
  • every host (A and B), connected via cable/wireless to either the dsa or the swconfig device, pings outside (1.1.1.1 or google.com) with 0 packets lost
  • a host A connected via cable/wireless to the dsa device can ping the other mesh nodes via ipv4, but not the dsa device itself via its own ip address (i.e. 10.13.15.35)
  • a host B connected via cable/wireless to the swconfig device does not receive replies to pings directed to the dsa device's own ip address (10.13.15.35), but reaches other nodes
  • no loops observed
  • in the bridge fdb, the dsa device no longer shows the stale entry like aa:aa:aa:0d:fe:aa dev lan1 self, but it still shows two entries for the anygw MAC address
root@LiMe-870f23:~# bridge fdb | grep aa
aa:aa:aa:0d:fe:aa dev lan1mac0 master br-lan 
aa:aa:aa:0d:fe:aa dev br-lan self permanent
33:33:ff:0d:fe:aa dev br-lan self permanent
33:33:ff:0d:fe:aa dev anygw self permanent
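
To see at a glance which devices hold an fdb entry for a given MAC (instead of grepping for "aa", which also matches the multicast entries), a small sketch that parses `bridge fdb` output. fdb_devs_for_mac is a hypothetical helper name.

```shell
#!/bin/sh
# fdb_devs_for_mac MAC
# Reads `bridge fdb` output on stdin and prints, one per line, the device
# of every entry whose address equals MAC (case-insensitive).
fdb_devs_for_mac() {
    awk -v mac="$1" '
        tolower($1) == tolower(mac) {
            # the device name is the word following the "dev" keyword
            for (i = 2; i < NF; i++)
                if ($i == "dev") { print $(i + 1); break }
        }
    '
}
```

Usage: `bridge fdb | fdb_devs_for_mac aa:aa:aa:0d:fe:aa`.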

@a-gave
Contributor Author

a-gave commented Aug 16, 2025

Some good updates on this.
I was trying to prevent the incorrect entry from appearing in the bridge fdb using nftables, without luck.

I found that adding an entry in the bridge forwarding database with the anygw MAC address associated to the br-lan interface prevents the malformed entry from appearing. However, this doesn't make the malformed entry disappear if it is already present: one has to wait until it is removed as stale, or run bridge fdb flush dev br-lan.

  • the problem reoccurs after an `/etc/init.d/network restart`. In this case one can remove it by running both these commands (in either order)
bridge fdb flush dev br-lan
bridge fdb add aa:aa:aa:0d:fe:aa dev br-lan
  • using this method means installing a new package, ip-bridge, on all devices:
  • the best option would probably be to include it as a dependency of lime-proto-anygw
  • it is needed only on dsa devices; on swconfig devices it is harmless, but it consumes flash memory and RAM
  • doing it only on dsa devices does not seem straightforward: openwrt doesn't expose anywhere e.g. an API with a list of labels linked to the devices/profiles (e.g. outdoor: true/false; switch_type: dsa/swconfig), so this information would have to be collected through an error-prone link via the wiki (toh.json)
  • we should probably find a way to hook those network restarts, or suggest always rebooting the device
What do you think? @G10h4ck @ilario @javierbrk
To test this, one can simply add these two lines to /etc/rc.local:

bridge fdb flush dev br-lan
bridge fdb add aa:aa:aa:0d:fe:aa dev br-lan

To configure all dsa user ports with the 802.1ad babeld and batadv protos,
one can comment out these lines https://github.com/libremesh/lime-packages/blob/94b468e91d42ec5f62e5aaf8dc09ce255b8be9bc/packages/lime-system/files/usr/lib/lua/lime/network.lua#L390C1-L395C7 in /usr/lib/lua/lime/network.lua

** semi-off-topic: related to #1170, maybe it could be useful to write the configuration of the specific ethernet interfaces down in /etc/config/lime-node. E.g. in this case a dsa device without specific configuration for its ethernet ports would create a section like this for each of lan1, lan2, lan3, lan4, wan:

config net
    option linux_name 'lan1'
    list protocols 'batadv:%N1'
    list protocols 'babeld:17'
    list protocols 'lan'
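
As an illustration of what such pre-filling could produce, here is a sketch that prints one section per port with the protocol list from the example above. emit_lime_net_sections is a hypothetical helper. Whether every port (wan in particular) should really get the same list is an open question here.

```shell
#!/bin/sh
# emit_lime_net_sections PORT...
# Prints one lime-node 'config net' section per ethernet port, using the
# protocol list from the example above. (Hypothetical helper.)
emit_lime_net_sections() {
    for port in "$@"; do
        printf "config net\n"
        printf "    option linux_name '%s'\n" "$port"
        # %% is a literal percent sign, so this prints batadv:%N1
        printf "    list protocols 'batadv:%%N1'\n"
        printf "    list protocols 'babeld:17'\n"
        printf "    list protocols 'lan'\n\n"
    done
}
```

For example, `emit_lime_net_sections lan1 lan2 lan3 lan4 wan` prints five sections that could be appended to /etc/config/lime-node.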





@ilario
Member

ilario commented Aug 21, 2025

> I found that adding an entry in the bridge forwarding database with the anygw MAC address associated to the br-lan interface prevents the malformed entry from appearing.

Nice!!

>   • the problem reoccurs after an `/etc/init.d/network restart`.

Seems that within the LibreMesh code this happens only here: @javierbrk

os.execute("lime-config && /etc/init.d/network restart")

> In this case one can remove it by running both these commands (in either order)
>
> bridge fdb flush dev br-lan
> bridge fdb add aa:aa:aa:0d:fe:aa dev br-lan
>   • using this method means installing a new package, ip-bridge, on all devices:
>   • the best option would probably be to include it as a dependency of lime-proto-anygw
>   • it is needed only on dsa devices; on swconfig devices it is harmless, but it consumes flash memory and RAM
>   • doing it only on dsa devices does not seem straightforward: openwrt doesn't expose anywhere e.g. an API with a list of labels linked to the devices/profiles (e.g. outdoor: true/false; switch_type: dsa/swconfig), so this information would have to be collected through an error-prone link via the wiki (toh.json)

I think it is ok to include it on all devices. It does not pull any additional dependency and it adds 30 kB, which is not too much.

>   • we should probably find a way to hook those network restarts, or suggest always rebooting the device

Maybe using this hotplug net thing?
https://openwrt.org/docs/guide-user/base-system/hotplug#net
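
An untested sketch of what such a hook could look like. It assumes that the net hotplug event for the bridge arrives with ACTION=add and INTERFACE=br-lan after a network restart, and that the file is installed as something like /etc/hotplug.d/net/90-anygw-fdb (a hypothetical name). The MAC is the anygw address from the examples above and differs per network. Setting RUN=echo gives a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical hotplug hook: re-add the anygw fdb entry when br-lan appears.
ANYGW_MAC="aa:aa:aa:0d:fe:aa"   # anygw MAC taken from the examples above
RUN=${RUN:-}                    # set RUN=echo to print instead of execute

fix_anygw_fdb() {
    # Only react when the br-lan device has just been (re)created.
    [ "${ACTION:-}" = "add" ] || return 0
    [ "${INTERFACE:-}" = "br-lan" ] || return 0
    $RUN bridge fdb flush dev br-lan
    $RUN bridge fdb add "$ANYGW_MAC" dev br-lan
}

fix_anygw_fdb
```

Using `return` inside a function rather than `exit` keeps the hook safe in case hotplug scripts are sourced rather than executed.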

> What do you think? @G10h4ck @ilario @javierbrk
> To test this, one can simply add these two lines to /etc/rc.local:
>
> bridge fdb flush dev br-lan
> bridge fdb add aa:aa:aa:0d:fe:aa dev br-lan

Still have to find the time for testing, sorry.

> To configure all dsa user ports with the 802.1ad babeld and batadv protos, one can comment out these lines https://github.com/libremesh/lime-packages/blob/94b468e91d42ec5f62e5aaf8dc09ce255b8be9bc/packages/lime-system/files/usr/lib/lua/lime/network.lua#L390C1-L395C7 in /usr/lib/lua/lime/network.lua

Wait, I don't understand... That line is for fixing another issue, no? Or adding this bridge entry fixes both issues?? Like, it fixes the dsa-dsa cabled router ping issue and also allows us to use the wired interfaces both for lan and for mesh?

> ** semi-off-topic: related to this #1170 maybe it could be useful to write down in /etc/config/lime-node the configuration of the specific ethernet interfaces. I.e. in this case a dsa device without specific configuration for ethernet ports must create a section like this for lan1, lan2, lan3, lan4, wan
>
> config net
>     option linux_name 'lan1'
>     list protocols 'batadv:%N1'
>     list protocols 'babeld:17'
>     list protocols 'lan'

Can you rephrase the proposal? Is it that LibreMesh should pre-fill the lime-node file with the specific configuration for all wired interfaces on dsa routers?

@a-gave
Contributor Author

a-gave commented Aug 27, 2025

replaced by #1214

@a-gave a-gave closed this Aug 27, 2025
@a-gave
Contributor Author

a-gave commented Aug 27, 2025

> To configure all dsa user ports with the 802.1ad babeld and batadv protos, one can comment out these lines https://github.com/libremesh/lime-packages/blob/94b468e91d42ec5f62e5aaf8dc09ce255b8be9bc/packages/lime-system/files/usr/lib/lua/lime/network.lua#L390C1-L395C7 in /usr/lib/lua/lime/network.lua

> Wait, I don't understand... That line is for fixing another issue, no? Or adding this bridge entry fixes both issues?? Like, it fixes the dsa-dsa cabled router ping issue and also allows us to use the wired interfaces both for lan and for mesh?

A host can ping the anygw of a dsa device if no other libremesh node is connected via cable. With the bridge fdb fix and the nftables rule, anygw works correctly again for hosts connected via cable.

I'm not sure about #1121. I think it partially got solved by discontinuing the configuration of batadv on eth0. In the tests of this fix I'm connecting dsa-dsa, dsa-swconfig and dsa-host devices via cable without noticing errors or loops. So I would say yes!

> Can you rephrase the proposal? Is it that LibreMesh should pre-fill the lime-node file with the specific configuration for all wired interfaces on dsa routers?

Yes, but maybe it could also be useful on swconfig devices, pre-separating each port with vlans eth0.2.
I think, for example, that manual configuration is still convenient as a way of fine-tuning: if only clients are connected to an ethernet port, one can decide to configure that port with only the lan protocol, to avoid unnecessary network usage from the two protocols (e.g. shared-state tries to post to /cgi-bin/shared-state or something similar also to clients).
