Be more careful with locking db.db_mtx #17418

asomers · 2025-06-03T21:13:58Z

Lock db->db_mtx in some places that access db->db_data. But don't lock it in free_children, even though it does access db->db_data, because that leads to a recurse-on-non-recursive panic.

Lock db->db_rwlock in some places that access db->db.db_data's contents.

Closes #16626
Sponsored by: ConnectWise

Motivation and Context

Fixes occasional in-memory corruption, which usually manifests as a panic with a message like "blkptr XXX has invalid XXX" or "blkptr XXX has no valid DVAs". I suspect that some on-disk corruption bugs share this root cause, too.

Description

Always lock dmu_buf_impl_t.db_mtx in places that access the value of dmu_buf_impl_t.db->db_data. And always lock dmu_buf_impl_t.db_rwlock in places that access the contents of dmu_buf_impl_t.db->db_data.
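
The rule above (one lock for the pointer's value, another for the bytes it points at) can be sketched in userland C. This is only an illustration, not code from the patch: `struct toy_dbuf` and the `toy_dbuf_*` functions are hypothetical stand-ins, pthread primitives replace the kernel's mutex and rwlock, and the acquisition order shown (db_mtx first, then db_rwlock) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical userland stand-in for dmu_buf_impl_t; not the real struct. */
struct toy_dbuf {
	pthread_mutex_t  db_mtx;    /* guards the value of db_data (the pointer) */
	pthread_rwlock_t db_rwlock; /* guards the bytes that db_data points at */
	void            *db_data;
	size_t           db_size;
};

static void
toy_dbuf_init(struct toy_dbuf *db, void *buf, size_t size)
{
	pthread_mutex_init(&db->db_mtx, NULL);
	pthread_rwlock_init(&db->db_rwlock, NULL);
	db->db_data = buf;
	db->db_size = size;
}

/* Reads only the pointer value, so db_mtx alone is enough. */
static int
toy_dbuf_has_data(struct toy_dbuf *db)
{
	int has;

	pthread_mutex_lock(&db->db_mtx);
	has = (db->db_data != NULL);
	pthread_mutex_unlock(&db->db_mtx);
	return (has);
}

/* Reads the pointer and the contents, so it takes both locks. */
static size_t
toy_dbuf_read(struct toy_dbuf *db, void *dst, size_t len)
{
	size_t n = 0;

	assert(dst != NULL);
	pthread_mutex_lock(&db->db_mtx);        /* pointer value is stable */
	pthread_rwlock_rdlock(&db->db_rwlock);  /* contents are stable */
	if (db->db_data != NULL) {
		n = len < db->db_size ? len : db->db_size;
		memcpy(dst, db->db_data, n);
	}
	pthread_rwlock_unlock(&db->db_rwlock);
	pthread_mutex_unlock(&db->db_mtx);
	return (n);
}
```

A caller that only tests whether a buffer is attached would use `toy_dbuf_has_data`, while anything that dereferences the buffer would go through a path like `toy_dbuf_read`.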

Note that free_children still violates these rules. It can't easily be fixed without causing other problems. A proper fix is left for the future.
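
For readers unfamiliar with the failure mode, a "recurse-on-non-recursive" error is what happens when a thread tries to take a non-recursive mutex it already holds. The userland sketch below shows the analogous POSIX behavior with an error-checking mutex; `toy_relock_error` is a hypothetical helper, and the sketch makes no claim about which call path into free_children already holds db_mtx.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: lock a non-recursive, error-checking mutex twice
 * from the same thread and return the result of the second attempt.
 */
static int
toy_relock_error(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t mtx;
	int err;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&mtx, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&mtx);        /* outer code already holds the lock */
	err = pthread_mutex_lock(&mtx);  /* inner code tries to take it again */

	pthread_mutex_unlock(&mtx);      /* release the single successful lock */
	pthread_mutex_destroy(&mtx);
	return (err);
}
```

In userland the second attempt fails with EDEADLK; in the kernel, per the description above, the equivalent mistake is a panic.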

How Has This Been Tested?

I cannot reproduce the bug on command, so I had to rely on statistics to validate the patch.

Since the beginning of 2025, servers running the vulnerable workload on FreeBSD 14.1 without this patch have crashed with a probability of 0.34% per server per day. The distribution of crashes fits a Poisson distribution, suggesting that each crash is random and independent. That is, a server that's already crashed once is no more likely to crash in the future than one which hasn't crashed yet.
Servers running the vulnerable workload on FreeBSD 14.2 with this patch have accumulated a total of 1301 days of uptime with no crashes. So I conclude with 98.8% confidence that the 14.2 upgrade combined with the patch is effective.
Servers running the vulnerable workload on FreeBSD 14.2 without the patch are too few to draw conclusions about. But I don't see any related changes in the diff between 14.1 and 14.2. So I think that the patch is responsible for the cessation of crashes, not the upgrade.
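
The 98.8% figure can be reproduced from the numbers above, assuming (as the Poisson fit suggests) a constant crash rate on unpatched servers. The symbols \lambda and T are introduced here only for illustration.

```latex
\begin{align*}
\lambda &= 0.0034 \ \text{crashes per server-day}, \qquad T = 1301 \ \text{server-days} \\
P(\text{no crashes in } T \text{ at rate } \lambda) &= e^{-\lambda T}
  = e^{-0.0034 \times 1301} = e^{-4.4234} \approx 0.012 \\
\text{confidence} &= 1 - e^{-\lambda T} \approx 0.988
\end{align*}
```

In other words, if the patched servers still crashed at the old rate, a crash-free run this long would occur only about 1.2% of the time.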

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Quality assurance (non-breaking change which makes the code more robust against bugs)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

alek-p

I've already reviewed this internally, and as the PR description states, we've had a good experience running with this patch for the last couple of months.

amotin · 2025-06-04T18:18:37Z

As far as I can see, in most cases (I've spotted only one exception) when you are taking db_rwlock, you also take db_mtx. It makes no sense to me, unless the few exceptions are enormously expensive or otherwise don't allow db_mtx to be taken. I feel like we need a better understanding of the locking strategy. At least I do.

snajpa · 2025-06-04T18:39:09Z

FWIW, as we're discussing here, I even think - after all the staring at the code - that the locking itself is actually fine, it seems to be a result of optimizations exactly because things don't need to be overlocked if it's guaranteed to be OK via other logical dependencies.

I think I have actually nailed where the problem is, but @asomers says he can't try it :)

asomers · 2025-06-04T20:03:42Z

> As far as I can see, in most cases (I've spotted only one exception) when you are taking db_rwlock, you also take db_mtx. It makes no sense to me, unless the few exceptions are enormously expensive or otherwise don't allow db_mtx to be taken. I feel like we need a better understanding of the locking strategy. At least I do.

That's because of this comment from @pcd1193182: "So the subtlety here is that the value of the db.db_data and db_buf fields are, I believe, still protected by the db_mtx plus the db_holds refcount. The contents of the buffers are protected by the db_rwlock." So many places need both db_mtx and db_rwlock. Some need only the former. I don't know of any cases where code would only need the latter.

snajpa · 2025-06-04T20:35:17Z

I'm sorry, I mixed it up. This is definitely needed and then there's a bug with dbuf resize. Two different things.

Lock db_mtx in some places that access db->db_data. But don't lock it in free_children, even though it does access db->db_data, because that leads to a recurse-on-non-recursive panic.

Lock db_rwlock in some places that access db->db.db_data's contents.

Closes openzfs#16626
Sponsored by: ConnectWise
Signed-off-by: Alan Somers <asomers@gmail.com>

asomers mentioned this pull request Jun 3, 2025

2.3.2 causing kernel panic and I/O hangs, 2.3.1 works on same dataset #17307

Open

asomers force-pushed the db_data branch from 05077e2 to e8c8b5a Compare June 3, 2025 21:23

behlendorf self-requested a review June 4, 2025 00:12

alek-p approved these changes Jun 4, 2025

snajpa approved these changes Jun 13, 2025

asomers force-pushed the db_data branch from e8c8b5a to a359e6c Compare June 25, 2025 18:47
