-
Notifications
You must be signed in to change notification settings - Fork 28.7k
SPARK-1407 drain event queue before stopping event logger #366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can one of the admins verify this patch? |
@@ -36,6 +36,7 @@ private[spark] class LiveListenerBus extends SparkListenerBus with Logging { | |||
private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](EVENT_QUEUE_CAPACITY) | |||
private var queueFullErrorMessageLogged = false | |||
private var started = false | |||
private var sparkListenerBus: Option[Thread] = _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't have to be an option, you can move the declaration of the thread up here, and do thread.start() in start()
. Also, a better name for this would be something like listenerThread
Hi @kanzhang, we have a long-running discussion about this at #251. Joining on the thread is a simple alternative to what is proposed in the other PR, but at the same time preserves clear semantics (i.e. all events before STOP will be processed to completion, and no events after STOP will be processed). Both @pwendell and I are inclined towards this approach. Could you add a test for this? #251 provided an excellent example of this. @concretevitamin could we use your test case here? We'll be sure to credit both of you. |
Jenkins, test this please. |
This does look much simpler. Also, free free to use the test there ;) |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13935/ |
Jenkins, test this please. |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
thanks for your feedback. Made changes based on reviewer comments and merged test from #251. |
Thanks - I've merged this. I think SPARK-1407 is an incorrect label btw. I also re-wrote the unit test with @andrewor14 on merge to make it a bit simpler. |
Thanks for getting rid of the 100ms wait. I think this patch addresses SPARK-1407. Without this patch, even if sc.stop() is called, it will only close the logger and flush whatever is already in its write buffer. If, at this point, some events are yet to be drained from event queue, when listenerThread tries to post them to logger later on, the same Filesystem is closed exception will be encountered. Consequently event log file will be incomplete (same as in SPARK-1407). |
Or I can open a separate JIRA to track it? |
Yes, could you open one that describes the listener bus draining part of this PR specifically? Thanks. |
I opened SPARK-1475 to track this PR. |
@pwendell @andrewor14 , I think the unit test depends on a race condition that the stopper thread is run before the listener thread does (but we don't know the actual order). Otherwise, the queue would have drained before stopper started and the test wouldn't be effective. I tried to fix in the following PR. pls take a look. thx. |
More yarn code refactor Try to retrive common code in yarn alpha/stable for client and workerRunnable to reduce duplicated codes. By put them into a trait in common dir and extends with them. Same works could be done for the remaining files in alpha/stable , while the remainning files have much more overlapping codes with different API call here and there within functions, and will need much more close review , aslo it might divide functions into too small trifle ones, thus might not deserve to be done in this way. So just make it run for these two files firstly.
Author: Kan Zhang <kzhang@apache.org> Closes apache#366 from kanzhang/SPARK-1407 and squashes the following commits: cd0629f [Kan Zhang] code refactoring and adding test b073ee6 [Kan Zhang] SPARK-1407 drain event queue before stopping event logger
This reverts commit a3c2539.
apache#366) Add periodic jobs to integrate terraform-provider-openstack with public clouds: telefonica, orange, opentelekomcloud and huaweicloud. Related-Bug: theopenlab/openlab#125
No description provided.