Skip to content

Commit cee6c9e

Browse files
committed
iavf: fix circular lock dependency with netdev_lock
JIRA: https://issues.redhat.com/browse/RHEL-87382 commit c6124f6 Author: Jacob Keller <jacob.e.keller@intel.com> Date: Mon Feb 24 11:06:44 2025 -0800 iavf: fix circular lock dependency with netdev_lock We have recently seen reports of lockdep circular lock dependency warnings when loading the iAVF driver: [ 1504.790308] ====================================================== [ 1504.790309] WARNING: possible circular locking dependency detected [ 1504.790310] 6.13.0 #net_next_rt.c2933b2befe2.el9 Not tainted [ 1504.790311] ------------------------------------------------------ [ 1504.790312] kworker/u128:0/13566 is trying to acquire lock: [ 1504.790313] ffff97d0e4738f18 (&dev->lock){+.+.}-{4:4}, at: register_netdevice+0x52c/0x710 [ 1504.790320] [ 1504.790320] but task is already holding lock: [ 1504.790321] ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf] [ 1504.790330] [ 1504.790330] which lock already depends on the new lock. [ 1504.790330] [ 1504.790330] [ 1504.790330] the existing dependency chain (in reverse order) is: [ 1504.790331] [ 1504.790331] -> #1 (&adapter->crit_lock){+.+.}-{4:4}: [ 1504.790333] __lock_acquire+0x52d/0xbb0 [ 1504.790337] lock_acquire+0xd9/0x330 [ 1504.790338] mutex_lock_nested+0x4b/0xb0 [ 1504.790341] iavf_finish_config+0x37/0x240 [iavf] [ 1504.790347] process_one_work+0x248/0x6d0 [ 1504.790350] worker_thread+0x18d/0x330 [ 1504.790352] kthread+0x10e/0x250 [ 1504.790354] ret_from_fork+0x30/0x50 [ 1504.790357] ret_from_fork_asm+0x1a/0x30 [ 1504.790361] [ 1504.790361] -> #0 (&dev->lock){+.+.}-{4:4}: [ 1504.790364] check_prev_add+0xf1/0xce0 [ 1504.790366] validate_chain+0x46a/0x570 [ 1504.790368] __lock_acquire+0x52d/0xbb0 [ 1504.790370] lock_acquire+0xd9/0x330 [ 1504.790371] mutex_lock_nested+0x4b/0xb0 [ 1504.790372] register_netdevice+0x52c/0x710 [ 1504.790374] iavf_finish_config+0xfa/0x240 [iavf] [ 1504.790379] process_one_work+0x248/0x6d0 [ 1504.790381] worker_thread+0x18d/0x330 [ 1504.790383] kthread+0x10e/0x250 [ 1504.790385] ret_from_fork+0x30/0x50 [ 1504.790387] ret_from_fork_asm+0x1a/0x30 [ 1504.790389] [ 1504.790389] other info that might help us debug this: [ 1504.790389] [ 1504.790389] Possible unsafe locking scenario: [ 1504.790389] [ 1504.790390] CPU0 CPU1 [ 1504.790391] ---- ---- [ 1504.790391] lock(&adapter->crit_lock); [ 1504.790393] lock(&dev->lock); [ 1504.790394] lock(&adapter->crit_lock); [ 1504.790395] lock(&dev->lock); [ 1504.790397] [ 1504.790397] *** DEADLOCK *** This appears to be caused by the change in commit 5fda3f3 ("net: make netdev_lock() protect netdev->reg_state"), which added a netdev_lock() in register_netdevice. The iAVF driver calls register_netdevice() from iavf_finish_config(), as a final stage of its state machine post-probe. It currently takes the RTNL lock, then the netdev lock, and then the device critical lock. This pattern is used throughout the driver. Thus there is a strong dependency that the crit_lock should not be acquired before the net device lock. The change to register_netdevice creates an ABBA lock order violation because the iAVF driver is holding the crit_lock while calling register_netdevice, which then takes the netdev_lock. It seems likely that future refactors could result in netdev APIs which hold the netdev_lock while calling into the driver. This means that we should not re-order the locks so that netdev_lock is acquired after the device private crit_lock. Instead, notice that we already release the netdev_lock prior to calling the register_netdevice. This flow only happens during the early driver initialization as we transition through the __IAVF_STARTUP, __IAVF_INIT_VERSION_CHECK, __IAVF_INIT_GET_RESOURCES, etc. Analyzing the places where we take crit_lock in the driver there are two sources: a) several of the work queue tasks including adminq_task, watchdog_task, reset_task, and the finish_config task. b) various callbacks which ultimately stem back to .ndo operations or ethtool operations. The latter cannot be triggered until after the netdevice registration is completed successfully. The iAVF driver uses alloc_ordered_workqueue, which is an unbound workqueue that has a max limit of 1, and thus guarantees that only a single work item on the queue is executing at any given time, so none of the other work threads could be executing due to the ordered workqueue guarantees. The iavf_finish_config() function also does not do anything else after register_netdevice, unless it fails. It seems unlikely that the driver private crit_lock is protecting anything that register_netdevice() itself touches. Thus, to fix this ABBA lock violation, lets simply release the adapter->crit_lock as well as netdev_lock prior to calling register_netdevice(). We do still keep holding the RTNL lock as required by the function. If we do fail to register the netdevice, then we re-acquire the adapter critical lock to finish the transition back to __IAVF_INIT_CONFIG_ADAPTER. This ensures every call where both netdev_lock and the adapter->crit_lock are acquired under the same ordering. Fixes: afc6649 ("eth: iavf: extend the netdev_lock usage") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250224190647.3601930-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohammad Heib <mheib@redhat.com>
1 parent ff35a1f commit cee6c9e

File tree

1 file changed

+8
-4
lines changed

1 file changed

+8
-4
lines changed

drivers/net/ethernet/intel/iavf/iavf_main.c

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2029,7 +2029,7 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool runni
20292029
static void iavf_finish_config(struct work_struct *work)
20302030
{
20312031
struct iavf_adapter *adapter;
2032-
bool netdev_released = false;
2032+
bool locks_released = false;
20332033
int pairs, err;
20342034

20352035
adapter = container_of(work, struct iavf_adapter, finish_config);
@@ -2058,19 +2058,22 @@ static void iavf_finish_config(struct work_struct *work)
20582058
netif_set_real_num_tx_queues(adapter->netdev, pairs);
20592059

20602060
if (adapter->netdev->reg_state != NETREG_REGISTERED) {
2061+
mutex_unlock(&adapter->crit_lock);
20612062
netdev_unlock(adapter->netdev);
2062-
netdev_released = true;
2063+
locks_released = true;
20632064
err = register_netdevice(adapter->netdev);
20642065
if (err) {
20652066
dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n",
20662067
err);
20672068

20682069
/* go back and try again.*/
2070+
mutex_lock(&adapter->crit_lock);
20692071
iavf_free_rss(adapter);
20702072
iavf_free_misc_irq(adapter);
20712073
iavf_reset_interrupt_capability(adapter);
20722074
iavf_change_state(adapter,
20732075
__IAVF_INIT_CONFIG_ADAPTER);
2076+
mutex_unlock(&adapter->crit_lock);
20742077
goto out;
20752078
}
20762079
}
@@ -2086,9 +2089,10 @@ static void iavf_finish_config(struct work_struct *work)
20862089
}
20872090

20882091
out:
2089-
mutex_unlock(&adapter->crit_lock);
2090-
if (!netdev_released)
2092+
if (!locks_released) {
2093+
mutex_unlock(&adapter->crit_lock);
20912094
netdev_unlock(adapter->netdev);
2095+
}
20922096
rtnl_unlock();
20932097
}
20942098

0 commit comments

Comments
 (0)