Commit 7aad9a0
net: dsa: properly keep track of conduit reference
Problem description
-------------------
DSA has a mumbo-jumbo of reference handling of the conduit net device
and its kobject which, sadly, is just wrong and doesn't make sense.
There are two distinct problems.
1. The OF path, which uses of_find_net_device_by_node(), never releases
the elevated refcount on the conduit's kobject. Nominally, the OF and
non-OF paths should result in objects having identical reference
counts taken, and it is already suspicious that
dsa_dev_to_net_device() has a put_device() call which is missing in
dsa_port_parse_of(), but we can actually even verify that an issue
exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run this command
"before" and "after" applying this patch:
(unbind the conduit driver for net device eno2)
echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
we see these lines in the output diff which appear only with the patch
applied:
kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000)
kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000)
2. After we find the conduit interface one way (OF) or another (non-OF),
it can get unregistered at any time, and DSA remains with a long-lived,
but in this case stale, cpu_dp->conduit pointer. Holding the net
device's underlying kobject isn't actually of much help, it just
prevents it from being freed (but we never need that kobject
directly). What helps us to prevent the net device from being
unregistered is the parallel netdev reference mechanism (dev_hold()
and dev_put()).
Actually we actually use that netdev tracker mechanism implicitly on
user ports since commit 2f1e8ea ("net: dsa: link interfaces with
the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link().
But time still passes at DSA switch probe time between the initial
of_find_net_device_by_node() code and the user port creation time, time
during which the conduit could unregister itself and DSA wouldn't know
about it.
So we have to run of_find_net_device_by_node() under rtnl_lock() to
prevent that from happening, and release the lock only with the netdev
tracker having acquired the reference.
Do we need to keep the reference until dsa_unregister_switch() /
dsa_switch_shutdown()?
1: Maybe yes. A switch device will still be registered even if all user
ports failed to probe, see commit 86f8b1c ("net: dsa: Do not
make user port errors fatal"), and the cpu_dp->conduit pointers
remain valid. I haven't audited all call paths to see whether they
will actually use the conduit in lack of any user port, but if they
do, it seems safer to not rely on user ports for that reference.
2. Definitely yes. We support changing the conduit which a user port is
associated to, and we can get into a situation where we've moved all
user ports away from a conduit, thus no longer hold any reference to
it via the net device tracker. But we shouldn't let it go nonetheless
- see the next change in relation to dsa_tree_find_first_conduit()
and LAG conduits which disappear.
We have to be prepared to return to the physical conduit, so the CPU
port must explicitly keep another reference to it. This is also to
say: the user ports and their CPU ports may not always keep a
reference to the same conduit net device, and both are needed.
As for the conduit's kobject for the /sys/class/net/ entry, we don't
care about it, we can release it as soon as we hold the net device
object itself.
History and blame attribution
-----------------------------
The code has been refactored so many times, it is very difficult to
follow and properly attribute a blame, but I'll try to make a short
history which I hope to be correct.
We have two distinct probing paths:
- one for OF, introduced in 2016 in commit 83c0afa ("net: dsa: Add
new binding implementation")
- one for non-OF, introduced in 2017 in commit 71e0bbd ("net: dsa:
Add support for platform data")
These are both complete rewrites of the original probing paths (which
used struct dsa_switch_driver and other weird stuff, instead of regular
devices on their respective buses for register access, like MDIO, SPI,
I2C etc):
- one for OF, introduced in 2013 in commit 5e95329 ("dsa: add
device tree bindings to register DSA switches")
- one for non-OF, introduced in 2008 in commit 91da11f ("net:
Distributed Switch Architecture protocol support")
except for tiny bits and pieces like dsa_dev_to_net_device() which were
seemingly carried over since the original commit, and used to this day.
The point is that the original probing paths received a fix in 2015 in
the form of commit 679fb46 ("net: dsa: Add missing master netdev
dev_put() calls"), but the fix never made it into the "new" (dsa2)
probing paths that can still be traced to today, and the fixed probing
path was later deleted in 2019 in commit 93e86b3 ("net: dsa: Remove
legacy probing support").
That is to say, the new probing paths were never quite correct in this
area.
The existence of the legacy probing support which was deleted in 2019
explains why dsa_dev_to_net_device() returns a conduit with elevated
refcount (because it was supposed to be released during
dsa_remove_dst()). After the removal of the legacy code, the only user
of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this
function returns. This pattern makes no sense today, and can only be
interpreted historically to understand why dev_hold() was there in the
first place.
Change details
--------------
Today we have a better netdev tracking infrastructure which we should
use. Logically netdev_hold() belongs in common code
(dsa_port_parse_cpu(), where dp->conduit is assigned), but there is a
tradeoff to be made with the rtnl_lock() section which would become a
bit too long if we did that - dsa_port_parse_cpu() also calls
request_module(). So we duplicate a bit of logic in order for the
callers of dsa_port_parse_cpu() to be the ones responsible of holding
the conduit reference and releasing it on error. This shortens the
rtnl_lock() section significantly.
In the dsa_switch_probe() error path, dsa_switch_release_ports() will be
called in a number of situations, one being where dsa_port_parse_cpu()
maybe didn't get the chance to run at all (a different port failed
earlier, etc). So we have to test for the conduit being NULL prior to
calling netdev_put().
There have still been so many transformations to the code since the
blamed commits (rename master -> conduit, commit 0650bf5 ("net:
dsa: be compatible with masters which unregister on shutdown")), that it
only makes sense to fix the code using the best methods available today
and see how it can be backported to stable later. I suspect the fix
cannot even be backported to kernels which lack dsa_switch_shutdown(),
and I suspect this is also maybe why the long-lived conduit reference
didn't make it into the new DSA probing paths at the time (problems
during shutdown).
Because dsa_dev_to_net_device() has a single call site and has to be
changed anyway, the logic was just absorbed into the non-OF
dsa_port_parse().
Tested on the ocelot/felix switch and on dsa_loop, both on the NXP
LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y.
Reported-by: Ma Ke <make24@iscas.ac.cn>
Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/
Fixes: 83c0afa ("net: dsa: Add new binding implementation")
Fixes: 71e0bbd ("net: dsa: Add support for platform data")
Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NipaLocal <nipa@local>1 parent 47b1c9a commit 7aad9a0
2 files changed
+35
-25
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
302 | 302 | | |
303 | 303 | | |
304 | 304 | | |
| 305 | + | |
305 | 306 | | |
306 | 307 | | |
307 | 308 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1253 | 1253 | | |
1254 | 1254 | | |
1255 | 1255 | | |
| 1256 | + | |
1256 | 1257 | | |
| 1258 | + | |
1257 | 1259 | | |
1258 | 1260 | | |
1259 | | - | |
| 1261 | + | |
| 1262 | + | |
1260 | 1263 | | |
| 1264 | + | |
| 1265 | + | |
| 1266 | + | |
| 1267 | + | |
| 1268 | + | |
1261 | 1269 | | |
1262 | 1270 | | |
1263 | | - | |
| 1271 | + | |
| 1272 | + | |
| 1273 | + | |
| 1274 | + | |
1264 | 1275 | | |
1265 | 1276 | | |
1266 | 1277 | | |
| |||
1393 | 1404 | | |
1394 | 1405 | | |
1395 | 1406 | | |
1396 | | - | |
1397 | | - | |
1398 | | - | |
1399 | | - | |
1400 | | - | |
1401 | | - | |
1402 | | - | |
1403 | | - | |
1404 | | - | |
1405 | | - | |
1406 | | - | |
1407 | | - | |
1408 | | - | |
1409 | | - | |
1410 | | - | |
1411 | | - | |
1412 | | - | |
1413 | | - | |
1414 | 1407 | | |
1415 | 1408 | | |
1416 | 1409 | | |
1417 | 1410 | | |
1418 | 1411 | | |
| 1412 | + | |
| 1413 | + | |
1419 | 1414 | | |
1420 | | - | |
1421 | | - | |
| 1415 | + | |
| 1416 | + | |
| 1417 | + | |
| 1418 | + | |
1422 | 1419 | | |
| 1420 | + | |
1423 | 1421 | | |
1424 | | - | |
| 1422 | + | |
| 1423 | + | |
| 1424 | + | |
| 1425 | + | |
1425 | 1426 | | |
1426 | | - | |
| 1427 | + | |
| 1428 | + | |
| 1429 | + | |
| 1430 | + | |
1427 | 1431 | | |
1428 | 1432 | | |
1429 | 1433 | | |
| |||
1491 | 1495 | | |
1492 | 1496 | | |
1493 | 1497 | | |
| 1498 | + | |
| 1499 | + | |
| 1500 | + | |
1494 | 1501 | | |
1495 | 1502 | | |
1496 | 1503 | | |
| |||
1635 | 1642 | | |
1636 | 1643 | | |
1637 | 1644 | | |
1638 | | - | |
| 1645 | + | |
1639 | 1646 | | |
| 1647 | + | |
| 1648 | + | |
1640 | 1649 | | |
1641 | 1650 | | |
1642 | 1651 | | |
| |||
0 commit comments