-
Notifications
You must be signed in to change notification settings - Fork 581
HDDS-13978. OMLockDetails should not be used as the object returns a ThreadLocal Object #9346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Please wait for clean CI run in fork before opening PR. |
…ThreadLocal Object Change-Id: Id6b535194ff8770b0102c43d7a44a70132142b10
50b6aa6 to
bfbc38a
Compare
|
@swamirishi Re-running only So this just needs review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug where OMLockDetails objects returned by OzoneManagerLock are ThreadLocal instances that cannot be safely reused across multiple lock lifecycles. The fix ensures that ThreadLocal objects are always copied to static empty instances (EMPTY_DETAILS_LOCK_ACQUIRED or EMPTY_DETAILS_LOCK_NOT_ACQUIRED) to prevent stale state issues.
Key changes:
- Modified
SnapshotCacheto wrap lock/unlock functions with helpers that copy ThreadLocalOMLockDetailsto static instances - Updated
MultiSnapshotLocksto store static empty instances instead of ThreadLocal objects in both acquire and release operations - Added comprehensive unit tests to verify the fix works correctly across multiple lock cycles
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| SnapshotCache.java | Introduced getEmptyOmLockDetails() helper and wrapper suppliers to copy ThreadLocal OMLockDetails to static instances |
| MultiSnapshotLocks.java | Changed to store static EMPTY_DETAILS instances instead of ThreadLocal objects in acquireLock() and releaseLock() |
| TestSnapshotCache.java | Added test verifying lock supplier works correctly after another lock is acquired and released |
| TestMultiSnapshotLocks.java | Added test for multiple lock cycles and updated existing test to mock release return values |
Comments suppressed due to low confidence (1)
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/MultiSnapshotLocks.java:79
- The local variable assignment on line 79 is unnecessary since the value is immediately reassigned on line 83. The variable
lockDetailsfrom line 79 can be replaced with a local variable or directly used inline on line 83 aslock.releaseWriteLocks(resource, this.objectLocks).isLockAcquired()to avoid confusion.
lockDetails = lock.releaseWriteLocks(resource, this.objectLocks);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| this.lockDetails = OMLockDetails.EMPTY_DETAILS_LOCK_NOT_ACQUIRED; | ||
| } | ||
| this.lockDetails = omLockDetails; | ||
| return omLockDetails; |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method returns the ThreadLocal omLockDetails object directly instead of returning the stored static instance. This defeats the purpose of the fix. Consider returning this.lockDetails instead, or returning a copy of omLockDetails via getEmptyOmLockDetails(omLockDetails) to maintain consistency with the SnapshotCache pattern.
| return omLockDetails; | |
| return this.lockDetails; |
|
@yandrey321 do you wanna look at the patch? |
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a static
| lock.acquireReadLocks(resource, keys); | ||
| if (omLockDetails.isLockAcquired()) { | ||
| objectLocks.addAll(keys); | ||
| this.lockDetails = OMLockDetails.EMPTY_DETAILS_LOCK_ACQUIRED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a static variable to replace ThreadLock object.
two locks acquired by the same thread do not interefere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot reuse the lockDetails across objects it has to be a object member
| this.lockDetails = OMLockDetails.EMPTY_DETAILS_LOCK_NOT_ACQUIRED; | ||
| } | ||
| this.lockDetails = omLockDetails; | ||
| return omLockDetails; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it return this.lockDetails instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should return the same object since it would still have the lock metrics
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. MultiSnapshotLocks cannot be used across threads, and its caller SnapshotDefragService and SnapshotDeletingService use only one thread in the thread pool.
|
Thank you @jojochuang for reviewing the patch |
…ThreadLocal Object (apache#9346) (cherry picked from commit c8bcf7f)
What changes were proposed in this pull request?
Currenlty OzoneManagerLock returns a ThreadLocal object of OmLockDetails which cannot be reused across multiplic lock lifecycles. Thus we should ensure the object is always copied used to use the empty static EmptyLock instance.
ozone/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/lock/OzoneManagerLock.java
Line 583 in 5e715aa
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13978
How was this patch tested?
Added unit test
Failure before fix in MultiSnapshotLock for the same unit test added