-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-4873][Streaming] Use Future.zip instead of Future.flatMap(for-loop) in WriteAheadLogBasedBlockHandler
#3721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #24542 has started for PR 3721 at commit
|
|
Test build #24543 has started for PR 3721 at commit
|
|
Test build #24542 has finished for PR 3721 at commit
|
|
Test PASSed. |
|
Test build #24543 has finished for PR 3721 at commit
|
|
Test PASSed. |
|
The flaky test fix here looks like it addresses https://issues.apache.org/jira/browse/SPARK-4790. /cc @harishreedharan, who is investigating that flaky test. |
|
This would not address the test issue as we are still returning from cleanupOldLogs before the deletion is completed. This |
|
The eventually ensures that the block is retried - but the failure is happening before the eventually, so the test would still throw. |
|
Hmm, looking at it again - this would fix the test as well, though I think the approach in #3726 is cleaner. |
|
My first trial was making |
|
Used |
|
Test build #24566 has started for PR 3721 at commit
|
|
Test build #24566 has finished for PR 3721 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be exposing this future. That is internal implementation detail and we'd have to stick with this implementation. At some point, we might want to delete the files synchronously - at which point returning this Future might not make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, I prefer to also return Future for an asynchronous action. Returning Unit hides the asynchronous feature and such method will be misused easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that deleting asynchronously is an implementation detail - which we choose to do now, which can be changed later, don't you think? If we change it later, we'd have to change this method's signature - which can cause pain if there is code that uses Await.result(*) on this future.
If we expose it via a parameter, we can choose to ignore the param and still the calling code will not have to change. Since it is unlikely that the calling code will actually depend on the async nature, it is unlikely to see any difference in functionality and no change in code is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that deleting asynchronously is an implementation detail - which we choose to do now, which can be changed later, don't you think? If we change it later, we'd have to change this method's signature - which can cause pain if there is code that uses Await.result(*) on this future.
I doubt if it will be changed. However, asynchronously is an important implementation detail that the caller should know it, or they may misuse it.
If we expose it via a parameter, we can choose to ignore the param and still the calling code will not have to change. Since it is unlikely that the calling code will actually depend on the async nature, it is unlikely to see any difference in functionality and no change in code is required.
I don't think a parameter is enough. At least, it needs a more parameter, a timeout parameter. In your PR, you used 1 second which may not be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's more, if we really want to change it to a synchronously deleting. Returning Future does still work. Just simply writing something like:
deleteFiles()
return Promise[Unit]().success(null).futureThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good catch. I wasnt aware that for-yield is flatMap which is obviously ordered.
|
High level comment, I think I completely agree on the Future.zip, but not sure if I find the other changed related to other two valid.
So it would be great if you can reduce the scope of this PR to the |
Future.zip instead of Future.flatMap(for-loop) in WriteAheadLogBasedBlockHandler
|
Now this PR only contains |
|
Test build #24799 has started for PR 3721 at commit
|
|
LGTM. Will merge if tests pass. |
|
Test build #24799 has finished for PR 3721 at commit
|
|
Test PASSed. |
|
Merged this, thanks very much! |
…for-loop) in WriteAheadLogBasedBlockHandler Use `Future.zip` instead of `Future.flatMap`(for-loop). `zip` implies these two Futures will run concurrently, while `flatMap` usually means one Future depends on the other one. Author: zsxwing <zsxwing@gmail.com> Closes #3721 from zsxwing/SPARK-4873 and squashes the following commits: 46a2cd9 [zsxwing] Use Future.zip instead of Future.flatMap(for-loop) (cherry picked from commit b4d0db8) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Use
Future.zipinstead ofFuture.flatMap(for-loop).zipimplies these two Futures will run concurrently, whileflatMapusually means one Future depends on the other one.