Skip to content

Commit 7aad9a0

Browse files
vladimirolteanNipaLocal
authored andcommitted
net: dsa: properly keep track of conduit reference
Problem description ------------------- DSA has a mumbo-jumbo of reference handling of the conduit net device and its kobject which, sadly, is just wrong and doesn't make sense. There are two distinct problems. 1. The OF path, which uses of_find_net_device_by_node(), never releases the elevated refcount on the conduit's kobject. Nominally, the OF and non-OF paths should result in objects having identical reference counts taken, and it is already suspicious that dsa_dev_to_net_device() has a put_device() call which is missing in dsa_port_parse_of(), but we can actually even verify that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run this command "before" and "after" applying this patch: (unbind the conduit driver for net device eno2) echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind we see these lines in the output diff which appear only with the patch applied: kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000) kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000) 2. After we find the conduit interface one way (OF) or another (non-OF), it can get unregistered at any time, and DSA remains with a long-lived, but in this case stale, cpu_dp->conduit pointer. Holding the net device's underlying kobject isn't actually of much help, it just prevents it from being freed (but we never need that kobject directly). What helps us to prevent the net device from being unregistered is the parallel netdev reference mechanism (dev_hold() and dev_put()). Actually we actually use that netdev tracker mechanism implicitly on user ports since commit 2f1e8ea ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link(). But time still passes at DSA switch probe time between the initial of_find_net_device_by_node() code and the user port creation time, time during which the conduit could unregister itself and DSA wouldn't know about it. So we have to run of_find_net_device_by_node() under rtnl_lock() to prevent that from happening, and release the lock only with the netdev tracker having acquired the reference. Do we need to keep the reference until dsa_unregister_switch() / dsa_switch_shutdown()? 1: Maybe yes. A switch device will still be registered even if all user ports failed to probe, see commit 86f8b1c ("net: dsa: Do not make user port errors fatal"), and the cpu_dp->conduit pointers remain valid. I haven't audited all call paths to see whether they will actually use the conduit in lack of any user port, but if they do, it seems safer to not rely on user ports for that reference. 2. Definitely yes. We support changing the conduit which a user port is associated to, and we can get into a situation where we've moved all user ports away from a conduit, thus no longer hold any reference to it via the net device tracker. But we shouldn't let it go nonetheless - see the next change in relation to dsa_tree_find_first_conduit() and LAG conduits which disappear. We have to be prepared to return to the physical conduit, so the CPU port must explicitly keep another reference to it. This is also to say: the user ports and their CPU ports may not always keep a reference to the same conduit net device, and both are needed. As for the conduit's kobject for the /sys/class/net/ entry, we don't care about it, we can release it as soon as we hold the net device object itself. History and blame attribution ----------------------------- The code has been refactored so many times, it is very difficult to follow and properly attribute a blame, but I'll try to make a short history which I hope to be correct. We have two distinct probing paths: - one for OF, introduced in 2016 in commit 83c0afa ("net: dsa: Add new binding implementation") - one for non-OF, introduced in 2017 in commit 71e0bbd ("net: dsa: Add support for platform data") These are both complete rewrites of the original probing paths (which used struct dsa_switch_driver and other weird stuff, instead of regular devices on their respective buses for register access, like MDIO, SPI, I2C etc): - one for OF, introduced in 2013 in commit 5e95329 ("dsa: add device tree bindings to register DSA switches") - one for non-OF, introduced in 2008 in commit 91da11f ("net: Distributed Switch Architecture protocol support") except for tiny bits and pieces like dsa_dev_to_net_device() which were seemingly carried over since the original commit, and used to this day. The point is that the original probing paths received a fix in 2015 in the form of commit 679fb46 ("net: dsa: Add missing master netdev dev_put() calls"), but the fix never made it into the "new" (dsa2) probing paths that can still be traced to today, and the fixed probing path was later deleted in 2019 in commit 93e86b3 ("net: dsa: Remove legacy probing support"). That is to say, the new probing paths were never quite correct in this area. The existence of the legacy probing support which was deleted in 2019 explains why dsa_dev_to_net_device() returns a conduit with elevated refcount (because it was supposed to be released during dsa_remove_dst()). After the removal of the legacy code, the only user of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this function returns. This pattern makes no sense today, and can only be interpreted historically to understand why dev_hold() was there in the first place. Change details -------------- Today we have a better netdev tracking infrastructure which we should use. Logically netdev_hold() belongs in common code (dsa_port_parse_cpu(), where dp->conduit is assigned), but there is a tradeoff to be made with the rtnl_lock() section which would become a bit too long if we did that - dsa_port_parse_cpu() also calls request_module(). So we duplicate a bit of logic in order for the callers of dsa_port_parse_cpu() to be the ones responsible of holding the conduit reference and releasing it on error. This shortens the rtnl_lock() section significantly. In the dsa_switch_probe() error path, dsa_switch_release_ports() will be called in a number of situations, one being where dsa_port_parse_cpu() maybe didn't get the chance to run at all (a different port failed earlier, etc). So we have to test for the conduit being NULL prior to calling netdev_put(). There have still been so many transformations to the code since the blamed commits (rename master -> conduit, commit 0650bf5 ("net: dsa: be compatible with masters which unregister on shutdown")), that it only makes sense to fix the code using the best methods available today and see how it can be backported to stable later. I suspect the fix cannot even be backported to kernels which lack dsa_switch_shutdown(), and I suspect this is also maybe why the long-lived conduit reference didn't make it into the new DSA probing paths at the time (problems during shutdown). Because dsa_dev_to_net_device() has a single call site and has to be changed anyway, the logic was just absorbed into the non-OF dsa_port_parse(). Tested on the ocelot/felix switch and on dsa_loop, both on the NXP LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y. Reported-by: Ma Ke <make24@iscas.ac.cn> Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/ Fixes: 83c0afa ("net: dsa: Add new binding implementation") Fixes: 71e0bbd ("net: dsa: Add support for platform data") Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NipaLocal <nipa@local>
1 parent 47b1c9a commit 7aad9a0

File tree

2 files changed

+35
-25
lines changed

2 files changed

+35
-25
lines changed

include/net/dsa.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -302,6 +302,7 @@ struct dsa_port {
302302
struct devlink_port devlink_port;
303303
struct phylink *pl;
304304
struct phylink_config pl_config;
305+
netdevice_tracker conduit_tracker;
305306
struct dsa_lag *lag;
306307
struct net_device *hsr_dev;
307308

net/dsa/dsa.c

Lines changed: 34 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -1253,14 +1253,25 @@ static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
12531253
if (ethernet) {
12541254
struct net_device *conduit;
12551255
const char *user_protocol;
1256+
int err;
12561257

1258+
rtnl_lock();
12571259
conduit = of_find_net_device_by_node(ethernet);
12581260
of_node_put(ethernet);
1259-
if (!conduit)
1261+
if (!conduit) {
1262+
rtnl_unlock();
12601263
return -EPROBE_DEFER;
1264+
}
1265+
1266+
netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL);
1267+
put_device(&conduit->dev);
1268+
rtnl_unlock();
12611269

12621270
user_protocol = of_get_property(dn, "dsa-tag-protocol", NULL);
1263-
return dsa_port_parse_cpu(dp, conduit, user_protocol);
1271+
err = dsa_port_parse_cpu(dp, conduit, user_protocol);
1272+
if (err)
1273+
netdev_put(conduit, &dp->conduit_tracker);
1274+
return err;
12641275
}
12651276

12661277
if (link)
@@ -1393,37 +1404,30 @@ static struct device *dev_find_class(struct device *parent, char *class)
13931404
return device_find_child(parent, class, dev_is_class);
13941405
}
13951406

1396-
static struct net_device *dsa_dev_to_net_device(struct device *dev)
1397-
{
1398-
struct device *d;
1399-
1400-
d = dev_find_class(dev, "net");
1401-
if (d != NULL) {
1402-
struct net_device *nd;
1403-
1404-
nd = to_net_dev(d);
1405-
dev_hold(nd);
1406-
put_device(d);
1407-
1408-
return nd;
1409-
}
1410-
1411-
return NULL;
1412-
}
1413-
14141407
static int dsa_port_parse(struct dsa_port *dp, const char *name,
14151408
struct device *dev)
14161409
{
14171410
if (!strcmp(name, "cpu")) {
14181411
struct net_device *conduit;
1412+
struct device *d;
1413+
int err;
14191414

1420-
conduit = dsa_dev_to_net_device(dev);
1421-
if (!conduit)
1415+
rtnl_lock();
1416+
d = dev_find_class(dev, "net");
1417+
if (!d) {
1418+
rtnl_unlock();
14221419
return -EPROBE_DEFER;
1420+
}
14231421

1424-
dev_put(conduit);
1422+
conduit = to_net_dev(d);
1423+
netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL);
1424+
put_device(d);
1425+
rtnl_unlock();
14251426

1426-
return dsa_port_parse_cpu(dp, conduit, NULL);
1427+
err = dsa_port_parse_cpu(dp, conduit, NULL);
1428+
if (err)
1429+
netdev_put(conduit, &dp->conduit_tracker);
1430+
return err;
14271431
}
14281432

14291433
if (!strcmp(name, "dsa"))
@@ -1491,6 +1495,9 @@ static void dsa_switch_release_ports(struct dsa_switch *ds)
14911495
struct dsa_vlan *v, *n;
14921496

14931497
dsa_switch_for_each_port_safe(dp, next, ds) {
1498+
if (dsa_port_is_cpu(dp) && dp->conduit)
1499+
netdev_put(dp->conduit, &dp->conduit_tracker);
1500+
14941501
/* These are either entries that upper layers lost track of
14951502
* (probably due to bugs), or installed through interfaces
14961503
* where one does not necessarily have to remove them, like
@@ -1635,8 +1642,10 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
16351642
/* Disconnect from further netdevice notifiers on the conduit,
16361643
* since netdev_uses_dsa() will now return false.
16371644
*/
1638-
dsa_switch_for_each_cpu_port(dp, ds)
1645+
dsa_switch_for_each_cpu_port(dp, ds) {
16391646
dp->conduit->dsa_ptr = NULL;
1647+
netdev_put(dp->conduit, &dp->conduit_tracker);
1648+
}
16401649

16411650
rtnl_unlock();
16421651
out:

0 commit comments

Comments
 (0)