Skip to content

Conversation

@swamirishi
Copy link
Contributor

What changes were proposed in this pull request?

Currenlty OzoneManagerLock returns a ThreadLocal object of OmLockDetails which cannot be reused across multiplic lock lifecycles. Thus we should ensure the object is always copied used to use the empty static EmptyLock instance.

private final ThreadLocal<OMLockDetails> omLockDetails = ThreadLocal.withInitial(OMLockDetails::new);

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13978

How was this patch tested?

Added unit test
Failure before fix in MultiSnapshotLock for the same unit test added

INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: More locks cannot be acquired when locks have been already acquired. Locks acquired : []

	at org.apache.hadoop.ozone.om.snapshot.MultiSnapshotLocks.acquireLock(MultiSnapshotLocks.java:59)
	at org.apache.hadoop.ozone.om.snapshot.TestMultiSnapshotLocks.testMultiSnapshotLocksWithMultipleResourceLocksMultipleTimes(TestMultiSnapshotLocks.java:81)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)


@swamirishi swamirishi added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Nov 22, 2025
@adoroszlai
Copy link
Contributor

[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.797 s <<< FAILURE! -- in org.apache.hadoop.ozone.om.snapshot.TestMultiSnapshotLocks
[ERROR] org.apache.hadoop.ozone.om.snapshot.TestMultiSnapshotLocks.testReleaseLock -- Time elapsed: 0.006 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.ozone.om.lock.OMLockDetails.isLockAcquired()" because "this.lockDetails" is null
	at org.apache.hadoop.ozone.om.snapshot.MultiSnapshotLocks.releaseLock(MultiSnapshotLocks.java:83)
	at org.apache.hadoop.ozone.om.snapshot.TestMultiSnapshotLocks.testReleaseLock(TestMultiSnapshotLocks.java:130)

Please wait for clean CI run in fork before opening PR.

…ThreadLocal Object

Change-Id: Id6b535194ff8770b0102c43d7a44a70132142b10
@swamirishi swamirishi force-pushed the HDDS-13978 branch 3 times, most recently from 50b6aa6 to bfbc38a Compare November 22, 2025 09:11
@adoroszlai adoroszlai marked this pull request as draft November 22, 2025 09:19
@adoroszlai adoroszlai marked this pull request as ready for review November 22, 2025 17:11
@adoroszlai
Copy link
Contributor

@swamirishi Re-running only compile (8) is failing because build artifact has already expired. I don't think it's worth re-running all checks. This original failure in compile (8) can be safely ignored:

The template is not valid. apache/ozone/.github/workflows/check.yml@1aeb6b7dac9bd01866e32f18bed38cc9d4933906 (Line: 181, Col: 16): hashFiles('**/pom.xml') failed. Fail to hash files under directory '/Users/runner/work/ozone/ozone'

So this just needs review.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where OMLockDetails objects returned by OzoneManagerLock are ThreadLocal instances that cannot be safely reused across multiple lock lifecycles. The fix ensures that ThreadLocal objects are always copied to static empty instances (EMPTY_DETAILS_LOCK_ACQUIRED or EMPTY_DETAILS_LOCK_NOT_ACQUIRED) to prevent stale state issues.

Key changes:

  • Modified SnapshotCache to wrap lock/unlock functions with helpers that copy ThreadLocal OMLockDetails to static instances
  • Updated MultiSnapshotLocks to store static empty instances instead of ThreadLocal objects in both acquire and release operations
  • Added comprehensive unit tests to verify the fix works correctly across multiple lock cycles

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
SnapshotCache.java Introduced getEmptyOmLockDetails() helper and wrapper suppliers to copy ThreadLocal OMLockDetails to static instances
MultiSnapshotLocks.java Changed to store static EMPTY_DETAILS instances instead of ThreadLocal objects in acquireLock() and releaseLock()
TestSnapshotCache.java Added test verifying lock supplier works correctly after another lock is acquired and released
TestMultiSnapshotLocks.java Added test for multiple lock cycles and updated existing test to mock release return values
Comments suppressed due to low confidence (1)

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/MultiSnapshotLocks.java:79

  • The local variable assignment on line 79 is unnecessary since the value is immediately reassigned on line 83. The variable lockDetails from line 79 can be replaced with a local variable or directly used inline on line 83 as lock.releaseWriteLocks(resource, this.objectLocks).isLockAcquired() to avoid confusion.
      lockDetails = lock.releaseWriteLocks(resource, this.objectLocks);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

this.lockDetails = OMLockDetails.EMPTY_DETAILS_LOCK_NOT_ACQUIRED;
}
this.lockDetails = omLockDetails;
return omLockDetails;
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method returns the ThreadLocal omLockDetails object directly instead of returning the stored static instance. This defeats the purpose of the fix. Consider returning this.lockDetails instead, or returning a copy of omLockDetails via getEmptyOmLockDetails(omLockDetails) to maintain consistency with the SnapshotCache pattern.

Suggested change
return omLockDetails;
return this.lockDetails;

Copilot uses AI. Check for mistakes.
@swamirishi
Copy link
Contributor Author

@yandrey321 do you wanna look at the patch?

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a static

lock.acquireReadLocks(resource, keys);
if (omLockDetails.isLockAcquired()) {
objectLocks.addAll(keys);
this.lockDetails = OMLockDetails.EMPTY_DETAILS_LOCK_ACQUIRED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a static variable to replace ThreadLock object.
two locks acquired by the same thread do not interefere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot reuse the lockDetails across objects it has to be a object member

this.lockDetails = OMLockDetails.EMPTY_DETAILS_LOCK_NOT_ACQUIRED;
}
this.lockDetails = omLockDetails;
return omLockDetails;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it return this.lockDetails instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should return the same object since it would still have the lock metrics

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. MultiSnapshotLocks cannot be used across threads, and its caller SnapshotDefragService and SnapshotDeletingService use only one thread in the thread pool.

@swamirishi swamirishi merged commit c8bcf7f into apache:master Nov 24, 2025
150 of 163 checks passed
@swamirishi
Copy link
Contributor Author

Thank you @jojochuang for reviewing the patch

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Dec 5, 2025
jojochuang pushed a commit that referenced this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants