HDFS-15382. Split one FsDatasetImpl lock to block pool grain locks. #3941
Conversation
💔 -1 overall
This message was automatically generated.
Force-pushed from c37adaf to 1cfe96b
💔 -1 overall
@sodonnel Sorry to interrupt, but this patch seems very promising. Any comments on this patch?
Thanks @MingXiangLi for your work. I had a quick glance and left some nit comments inline; most of the changes look good to me. Considering that this is a big change, we should involve more people to review and evaluate this improvement.
BTW, the check annotation warnings seem unrelated to these changes. I will trigger Jenkins again; let's wait and see what it says.
@@ -504,15 +503,13 @@ private void createWorkPlan(NodePlan plan) throws DiskBalancerException {
     Map<String, String> storageIDToVolBasePathMap = new HashMap<>();
     FsDatasetSpi.FsVolumeReferences references;
     try {
       try (AutoCloseableLock lock = this.dataset.acquireDatasetReadLock()) {
I suggest keeping it here and improving it to use #dataSetLock as a separate improvement. We should create another JIRA to remove it if it is no longer necessary.
This method just takes a snapshot of the volumes, so there is no need to acquire the dataset lock.
@@ -464,42 +427,40 @@ public AutoCloseableLock acquireDatasetReadLock() {
    * Activate a volume to serve requests.
    * @throws IOException if the storage UUID already exists.
    */
-  private void activateVolume(
+  private synchronized void activateVolume(
I suggest adding a comment to document that we use #synchronized to protect the volume instances and the datasetLock to protect the block pool instances.
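The dual protection suggested above — `synchronized` guarding the volume structures, a per-block-pool lock guarding pool-scoped state — can be sketched as follows. This is a minimal stand-in, not the actual FsDatasetImpl code; `VolumeRegistry` and its fields are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: 'synchronized' protects the volume instances
// (concurrent add/activate races), while a per-block-pool lock protects
// state scoped to a single block pool.
class VolumeRegistry {
  private final Map<String, String> volumesByStorageId = new ConcurrentHashMap<>();
  private final Map<String, ReentrantReadWriteLock> bpLocks = new ConcurrentHashMap<>();

  // #synchronized protects the volume instances.
  synchronized void activateVolume(String storageUuid, String basePath) {
    if (volumesByStorageId.containsKey(storageUuid)) {
      throw new IllegalStateException("storage UUID already exists: " + storageUuid);
    }
    volumesByStorageId.put(storageUuid, basePath);
  }

  // The block-pool lock protects per-pool structures such as the replica map.
  ReentrantReadWriteLock blockPoolLock(String bpid) {
    return bpLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
  }

  synchronized int volumeCount() {
    return volumesByStorageId.size();
  }
}
```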
@@ -3654,18 +3661,21 @@ public int getVolumeCount() {
   }

   void stopAllDataxceiverThreads(FsVolumeImpl volume) {
     try (AutoCloseableLock lock = datasetWriteLock.acquire()) {
Same as comment #1.
@@ -232,118 +232,6 @@ public void setUp() throws IOException {
     assertEquals(0, dataset.getNumFailedVolumes());
   }

-  @Test(timeout=10000)
I don't see why this unit test is deleted here.
The old lock model is no longer used, so this test no longer applies.
💔 -1 overall
Thanks for working on this. Some quick questions:
@@ -3230,7 +3235,9 @@ void transferReplicaForPipelineRecovery(final ExtendedBlock b,
     final BlockConstructionStage stage;

     //get replica information
-    try (AutoCloseableLock lock = data.acquireDatasetReadLock()) {
+    try (AutoCloseableLock lock = dataSetLockManager.writeLock(
This was a read lock before. Any idea why it was made a write lock now?
This should be a read lock; I wrote it incorrectly here.
      }
      references.close();
It would be better to instantiate the references object in a try-with-resources block, to ensure it is closed properly even when an exception is thrown.
OK, I will fix this later.
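The reviewer's suggestion — holding the references in a try-with-resources block so `close()` runs even when an exception is thrown — looks roughly like this minimal sketch. `VolumeReferences` here is a stand-in for `FsDatasetSpi.FsVolumeReferences`, not the real class.

```java
// A stand-in resource: tracks whether close() was called.
class VolumeReferences implements AutoCloseable {
  boolean closed = false;
  @Override public void close() { closed = true; }
}

class WorkPlanSketch {
  // Builds a (fake) work plan; the try-with-resources block guarantees
  // the references are released on both the success and the failure path.
  static VolumeReferences buildPlan(boolean fail) {
    VolumeReferences refs = new VolumeReferences();
    try (VolumeReferences r = refs) {
      if (fail) {
        throw new RuntimeException("simulated failure while building plan");
      }
    } catch (RuntimeException e) {
      // swallowed for the sketch; refs.closed is already true here
    }
    return refs;
  }
}
```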
@@ -602,7 +563,7 @@ public void removeVolumes(
         new ArrayList<>(storageLocsToRemove);
     Map<String, List<ReplicaInfo>> blkToInvalidate = new HashMap<>();
     List<String> storageToRemove = new ArrayList<>();
-    try (AutoCloseableLock lock = datasetWriteLock.acquire()) {
+    synchronized (this) {
How about making the entire method synchronized instead?
This object lock protects the volume-related data structures (add and remove), so it simply replaces the dataset global write lock. HDFS-9445 fixed a problem that arises when the whole method is synchronized.
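The pattern described in this reply — a narrow `synchronized (this)` block around the bookkeeping rather than a synchronized method, so slow follow-up work (like block invalidation, the HDFS-9445 concern) runs outside the lock — can be sketched with hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch, not FsDatasetImpl's actual fields: only the shared
// structures are mutated under the lock; slow work happens after release.
class VolumeRemover {
  private final Set<String> volumes = new HashSet<>();

  void addVolume(String storageId) {
    synchronized (this) {
      volumes.add(storageId);
    }
  }

  List<String> removeVolumes(List<String> toRemove) {
    List<String> removed = new ArrayList<>();
    synchronized (this) { // short critical section: mutate structures only
      for (String id : toRemove) {
        if (volumes.remove(id)) {
          removed.add(id);
        }
      }
    }
    // Slow work (e.g. invalidating the removed volumes' blocks) would run
    // here, outside the lock, so other threads are not blocked meanwhile.
    return removed;
  }

  synchronized int volumeCount() {
    return volumes.size();
  }
}
```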
@@ -117,7 +121,7 @@ ReplicaInfo get(String bpid, long blockId) {
   ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
     checkBlockPool(bpid);
     checkBlock(replicaInfo);
-    try (AutoCloseableLock l = writeLock.acquire()) {
+    try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
Why is a read lock used here? LightWeightResizableGSet is not thread-safe.
In the new version I have changed LightWeightResizableGSet to make it thread-safe.
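The actual change lives in Hadoop's LightWeightResizableGSet; a simplified illustration of the idea is below. When the set's own operations are synchronized, callers holding only a shared (read) lock can still mutate it without corrupting internal state. This is a stand-in class, not the real GSet.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a thread-safe replica set: every operation is
// synchronized on the instance, so concurrent callers that hold only the
// block pool *read* lock cannot corrupt the internal structure.
class SynchronizedReplicaSet {
  private final Map<Long, String> replicas = new HashMap<>();

  synchronized String put(long blockId, String replica) {
    return replicas.put(blockId, replica);
  }

  synchronized String get(long blockId) {
    return replicas.get(blockId);
  }

  synchronized String remove(long blockId) {
    return replicas.remove(blockId);
  }

  synchronized int size() {
    return replicas.size();
  }
}
```

The trade-off is that the fine-grained contention moves from the dataset lock into the set itself, which is much shorter-lived.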
@@ -1850,7 +1828,8 @@ public ReplicaHandler createTemporary(StorageType storageType,
     ReplicaInfo lastFoundReplicaInfo = null;
     boolean isInPipeline = false;
     do {
-      try (AutoCloseableLock lock = datasetWriteLock.acquire()) {
+      try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
The write lock is changed to a read lock here; could that cause any problems?
@@ -184,7 +203,7 @@ void mergeAll(ReplicaMap other) {
   ReplicaInfo remove(String bpid, Block block) {
     checkBlockPool(bpid);
     checkBlock(block);
-    try (AutoCloseableLock l = writeLock.acquire()) {
+    try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
I agree with @Hexiaoqiao. We should keep the read/write lock here. Once the block pool lock is introduced, we can discuss in another JIRA whether the write locks need to be changed to read locks.
Now that LightWeightResizableGSet is thread-safe, most operations in ReplicaMap only need the block pool read lock.
@jojochuang @tomscut The previous PR has been merged; refer to #3900. This PR makes LightWeightResizableGSet thread-safe and introduces the LockManager model. Please take a look.
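The block-pool-grained locking model this thread discusses — one read/write lock per block pool id instead of a single dataset-wide lock, handed out as an AutoCloseable for try-with-resources — can be sketched as below. The names are illustrative and do not match the actual DataSetLockManager API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch of block-pool-grained locking: requests against different
// block pools no longer contend on one global dataset lock.
class BlockPoolLockManager {
  // close() without 'throws Exception' keeps try-with-resources tidy.
  interface AutoCloseableLock extends AutoCloseable {
    @Override void close();
  }

  private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String bpid) {
    return locks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
  }

  AutoCloseableLock readLock(String bpid) {
    ReentrantReadWriteLock l = lockFor(bpid);
    l.readLock().lock();
    return () -> l.readLock().unlock();
  }

  AutoCloseableLock writeLock(String bpid) {
    ReentrantReadWriteLock l = lockFor(bpid);
    l.writeLock().lock();
    return () -> l.writeLock().unlock();
  }

  boolean isWriteLockedByCurrentThread(String bpid) {
    return lockFor(bpid).isWriteLockedByCurrentThread();
  }
}
```

A caller would wrap per-pool work in `try (var l = manager.writeLock(bpid)) { ... }`, so the lock is released on every exit path.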
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * <p>
Please leave the license boilerplate text unchanged.
Thanks @Hexiaoqiao for your reply. I'll look at this PR later.
Force-pushed from 1cfe96b to cda5cbc
Renewed the PR to address the improvement TODOs.
🎊 +1 overall
Force-pushed from cda5cbc to 3586f00
🎊 +1 overall
Thanks @MingXiangLi. The latest version makes sense to me; +1 from my side.
cc @jojochuang @tomscut Would you mind giving it another review? Thanks.
Thanks @MingXiangLi and @Hexiaoqiao. Introducing fine-grained locking for the datanode is a great idea. In fact, we introduced the original PR in HDFS-15380, and the latest version looks good to me.
Thanks @tomscut for your reviews. I will wait two days and commit to trunk if there are no further comments. Thanks.
…3941). Contributed by limingxiang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: litao <tomleescut@gmail.com>
Committed to trunk. Thanks @MingXiangLi for your contributions! Thanks @tomscut and @jojochuang for your reviews!
…pache#3941). Contributed by limingxiang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: litao <tomleescut@gmail.com>
Design doc
https://drive.google.com/file/d/1eaE8vSEhIli0H3j2eDiPJNYuKAC0MFgu/view?usp=sharing
https://issues.apache.org/jira/browse/HDFS-15382