-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-4714][CORE]: Check block have removed or not after got info lock in dropFromMemory #3574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, this seems like the opposite of the condition we want here: if blockInfo doesn't contain an entry for blockId, then why would we want to remove that entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ThreadA into RemoveBlock(), and got info for blockId1
- ThreadB into DropFromMemory(), and got info for blockId1
now Thread A, B all want got info.sychronized
B got, and drop block(blockId) from memory, and this block is use memory only, so it remove from blockinfo. and then release the lock.
then A got, but info is already not in blockinfo.
Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock can't happen at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this extra check is necessary; check out my comment on the main pull request and see if you agree.
Even if we did want to add a check here, I think we want to check for if(blockInfo.get(blockId).nonEmpty), since this branch handles the case where blocks have not been removed already.
|
Can you elaborate on the problem that this PR addresses? Have you observed an actual error that this fixes? |
|
@JoshRosen B got, and drop block(blockId) from memory, and this block is use memory only, so it remove from blockinfo. and then release the lock. then A got, but info is already not in blockinfo. Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock can't happen at the same time? |
|
Ah, I think I see your concern: let's say that we a block and there are two threads that are racing to perform operations on it: to use your example, thread A wants to call If we consider the case where all of thread A's steps execute before any of thread B's, then things work okay: thread A will have removed the entry from In another execution, though, both threads could find the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I don't think that we strictly need a check here since the remove(id) operations are idempotent. It might be nice to log a warning if the block has already been removed, but that might not be necessary since this is a background cleanup call.
0c1a249 to
55fa4ba
Compare
|
@JoshRosen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style nit: this needs a space after the if and before the open paren: if (!info....
|
Jenkins, this is ok to test. |
|
Test build #24225 has started for PR 3574 at commit
|
|
Test build #24225 has finished for PR 3574 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment actually refers to the !info.waitForReady() case, so I'd like to either move the comment or swap the order of these checks so that we check for blockInfo.get(blockId).isEmpty in the else if clause instead.
|
Left one minor code organization comment; aside from that, this looks good to me and should be ready to merge after you fix that up (I can do it if you don't have time, though; just let me know). There are a couple of edits that I'd like to make to the commit title / description before merging this, but I can do it myself on merge. Thanks for the careful analysis and for catching this issue! |
|
@JoshRosen I refine the code according your comments. |
|
Test build #24234 has started for PR 3574 at commit
|
|
Test build #24234 has finished for PR 3574 at commit
|
|
Test PASSed. |
|
LGTM. I like how this ended up evolving into a very small, focused fix. I'm going to fix up the title / commit message and merge this into Thanks! |
… has been removed after synchronizing on BlockInfo instance.
After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already removed.
The three methods have the same logic to get the `info` lock:
```
info = blockInfo.get(id)
if (info != null) {
info.synchronized {
// do something
}
}
```
So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first.
The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed.
But in `dropFromMemory` it may be problematic since it may drop block data which already removed into the diskstore, and this calls data store operations that are not designed to handle missing blocks.
This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread.
Author: hushan[胡珊] <hushan@xiaomi.com>
Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits:
edb989d [hushan[胡珊]] Refine code style and comments position
55fa4ba [hushan[胡珊]] refine code
e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn
(cherry picked from commit 30dca92)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
… has been removed after synchronizing on BlockInfo instance.
After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already removed.
The three methods have the same logic to get the `info` lock:
```
info = blockInfo.get(id)
if (info != null) {
info.synchronized {
// do something
}
}
```
So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first.
The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed.
But in `dropFromMemory` it may be problematic since it may drop block data which already removed into the diskstore, and this calls data store operations that are not designed to handle missing blocks.
This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread.
Author: hushan[胡珊] <hushan@xiaomi.com>
Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits:
edb989d [hushan[胡珊]] Refine code style and comments position
55fa4ba [hushan[胡珊]] refine code
e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn
(cherry picked from commit 30dca92)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
|
I've merged this into |
… has been removed after synchronizing on BlockInfo instance.
After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already removed.
The three methods have the same logic to get the `info` lock:
```
info = blockInfo.get(id)
if (info != null) {
info.synchronized {
// do something
}
}
```
So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first.
The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed.
But in `dropFromMemory` it may be problematic since it may drop block data which already removed into the diskstore, and this calls data store operations that are not designed to handle missing blocks.
This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread.
Author: hushan[胡珊] <hushan@xiaomi.com>
Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits:
edb989d [hushan[胡珊]] Refine code style and comments position
55fa4ba [hushan[胡珊]] refine code
e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn
(cherry picked from commit 30dca92)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
[SPARK-4714][CORE]: Check block have removed or not after got info lock in dropFromMemory
After got info lock in one of them:
removeBlock/dropOldBlocks/dropFromMemorymethod in BlockManager, the block thatinforepresented may have already removed.The Three method have the same logic to got info lock:
So it is chanced that when one thread got
info.synchronizedbut it already removed from other threads who gotinfo.synchronizedbefore him.In
removeBlockanddropOldBlocksis idempotent if they not haveblockinfo.get(blockId) != nullBut in
dropFromMemoryit may be problematic for it may drop block data(which already removed) into diskstore, this method calls data store operations that should not handle missing blocks.