Commit 052ac41
net: phy: transfer phy_config_inband() locking responsibility to phylink
[ Upstream commit e2a10da ]
Problem description
===================
Lockdep reports a possible circular locking dependency (AB/BA) between
&pl->state_mutex and &phy->lock, as follows.
phylink_resolve() // acquires &pl->state_mutex
-> phylink_major_config()
-> phy_config_inband() // acquires &pl->phydev->lock
whereas all the other call sites where &pl->state_mutex and
&pl->phydev->lock have the locking scheme reversed. Everywhere else,
&pl->phydev->lock is acquired at the top level, and &pl->state_mutex at
the lower level. A clear example is phylink_bringup_phy().
The outlier is the newly introduced phy_config_inband() and the existing
lock order is the correct one. To understand why it cannot be the other
way around, it is sufficient to consider phylink_phy_change(), phylink's
callback from the PHY device's phy->phy_link_change() virtual method,
invoked by the PHY state machine.
phy_link_up() and phy_link_down(), the (indirect) callers of
phylink_phy_change(), are called with &phydev->lock acquired.
Then phylink_phy_change() acquires its own &pl->state_mutex, to
serialize changes made to its pl->phy_state and pl->link_config.
So all other instances of &pl->state_mutex and &phydev->lock must be
consistent with this order.
Problem impact
==============
I think the kernel runs a serious deadlock risk if an existing
phylink_resolve() thread, which results in a phy_config_inband() call,
is concurrent with a phy_link_up() or phy_link_down() call, which will
deadlock on &pl->state_mutex in phylink_phy_change(). Practically
speaking, the impact may be limited by the slow speed of the medium
auto-negotiation protocol, which makes it unlikely for the current state
to still be unresolved when a new one is detected, but I think the
problem is there. Nonetheless, the problem was discovered using lockdep.
Proposed solution
=================
Practically speaking, the phy_config_inband() requirement of having
phydev->lock acquired must transfer to the caller (phylink is the only
caller). There, it must bubble up until immediately before
&pl->state_mutex is acquired, for the cases where that takes place.
Solution details, considerations, notes
=======================================
This is the phy_config_inband() call graph:
sfp_upstream_ops :: connect_phy()
|
v
phylink_sfp_connect_phy()
|
v
phylink_sfp_config_phy()
|
| sfp_upstream_ops :: module_insert()
| |
| v
| phylink_sfp_module_insert()
| |
| | sfp_upstream_ops :: module_start()
| | |
| | v
| | phylink_sfp_module_start()
| | |
| v v
| phylink_sfp_config_optical()
phylink_start() | |
| phylink_resume() v v
| | phylink_sfp_set_config()
| | |
v v v
phylink_mac_initial_config()
| phylink_resolve()
| | phylink_ethtool_ksettings_set()
v v v
phylink_major_config()
|
v
phy_config_inband()
phylink_major_config() caller #1, phylink_mac_initial_config(), does not
acquire &pl->state_mutex nor do its callers. It must acquire
&pl->phydev->lock prior to calling phylink_major_config().
phylink_major_config() caller #2, phylink_resolve() acquires
&pl->state_mutex, thus also needs to acquire &pl->phydev->lock.
phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is
completely uninteresting, because it only calls phylink_major_config()
if pl->phydev is NULL (otherwise it calls phy_ethtool_ksettings_set()).
We need to change nothing there.
Other solutions
===============
The lock inversion between &pl->state_mutex and &pl->phydev->lock has
occurred at least once before, as seen in commit c718af2 ("net:
phylink: fix ethtool -A with attached PHYs"). The solution there was to
simply not call phy_set_asym_pause() under the &pl->state_mutex. That
cannot be extended to our case though, where the phy_config_inband()
call is much deeper inside the &pl->state_mutex section.
Fixes: 5fd0f1a ("net: phylink: add negotiation of in-band capabilities")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250904125238.193990-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>1 parent 56fe63b commit 052ac41
2 files changed
+13
-8
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1065 | 1065 | | |
1066 | 1066 | | |
1067 | 1067 | | |
1068 | | - | |
| 1068 | + | |
1069 | 1069 | | |
1070 | 1070 | | |
1071 | 1071 | | |
1072 | 1072 | | |
1073 | 1073 | | |
1074 | 1074 | | |
1075 | | - | |
1076 | 1075 | | |
1077 | | - | |
| 1076 | + | |
1078 | 1077 | | |
1079 | | - | |
1080 | | - | |
1081 | | - | |
1082 | | - | |
| 1078 | + | |
1083 | 1079 | | |
1084 | | - | |
| 1080 | + | |
1085 | 1081 | | |
1086 | 1082 | | |
1087 | 1083 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1411 | 1411 | | |
1412 | 1412 | | |
1413 | 1413 | | |
| 1414 | + | |
1414 | 1415 | | |
1415 | 1416 | | |
1416 | 1417 | | |
| |||
1434 | 1435 | | |
1435 | 1436 | | |
1436 | 1437 | | |
| 1438 | + | |
| 1439 | + | |
1437 | 1440 | | |
| 1441 | + | |
| 1442 | + | |
1438 | 1443 | | |
1439 | 1444 | | |
1440 | 1445 | | |
| |||
1575 | 1580 | | |
1576 | 1581 | | |
1577 | 1582 | | |
| 1583 | + | |
| 1584 | + | |
1578 | 1585 | | |
1579 | 1586 | | |
1580 | 1587 | | |
| |||
1676 | 1683 | | |
1677 | 1684 | | |
1678 | 1685 | | |
| 1686 | + | |
| 1687 | + | |
1679 | 1688 | | |
1680 | 1689 | | |
1681 | 1690 | | |
| |||
0 commit comments