
SPARK-1407 drain event queue before stopping event logger #366


Closed
kanzhang wants to merge 2 commits from kanzhang:SPARK-1407

Conversation

@kanzhang (Contributor) commented Apr 9, 2014

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

```diff
@@ -36,6 +36,7 @@ private[spark] class LiveListenerBus extends SparkListenerBus with Logging {
   private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](EVENT_QUEUE_CAPACITY)
   private var queueFullErrorMessageLogged = false
   private var started = false
+  private var sparkListenerBus: Option[Thread] = _
```
Contributor

This doesn't have to be an Option; you can move the declaration of the thread up here and call thread.start() in start(). Also, a better name for this would be something like listenerThread.
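A minimal sketch of what this suggestion amounts to, with illustrative names (not the actual LiveListenerBus code): the consumer is a plain val named listenerThread rather than an Option[Thread], and it is only started from start().

```scala
class ListenerBusSketch {
  private val listenerThread = new Thread("SparkListenerBus") {
    setDaemon(true)
    override def run(): Unit = {
      // consume events from the queue here
    }
  }

  // The thread exists from construction but only runs once start() is called.
  def start(): Unit = listenerThread.start()
}
```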

@andrewor14 (Contributor)

Hi @kanzhang, we have a long-running discussion about this at #251. Joining on the thread is a simpler alternative to what is proposed in the other PR, while still preserving clear semantics (i.e. all events posted before STOP are processed to completion, and no events posted after STOP are processed); a sketch of these semantics follows this comment. Both @pwendell and I are inclined towards this approach.

Could you add a test for this? #251 provided an excellent example of this. @concretevitamin could we use your test case here? We'll be sure to credit both of you.
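A self-contained sketch of the join-based semantics described above, with illustrative names rather than the real LiveListenerBus code: stop() enqueues a sentinel and then joins on the listener thread, so everything posted before stop() is processed and nothing posted after it is.

```scala
import java.util.concurrent.LinkedBlockingQueue

class DrainOnStopBusSketch {
  private sealed trait Msg
  private case class Event(name: String) extends Msg
  private case object Stop extends Msg

  private val queue = new LinkedBlockingQueue[Msg]()
  @volatile private var stopped = false

  private val listenerThread = new Thread("listener") {
    setDaemon(true)
    override def run(): Unit = {
      var done = false
      while (!done) {
        queue.take() match {
          case Event(name) => println(s"processed $name") // deliver to listeners
          case Stop        => done = true                 // sentinel: queue drained
        }
      }
    }
  }

  def start(): Unit = listenerThread.start()

  // Events posted after stop() are ignored.
  def post(name: String): Unit = if (!stopped) queue.put(Event(name))

  // Everything already queued is processed before stop() returns.
  def stop(): Unit = {
    stopped = true
    queue.put(Stop)
    listenerThread.join()
  }
}
```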

@pwendell (Contributor) commented Apr 9, 2014

Jenkins, test this please.

@concretevitamin (Contributor)

This does look much simpler. Also, feel free to use the test there ;)

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13935/

@pwendell (Contributor) commented Apr 9, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13940/

@kanzhang (Contributor, Author) commented Apr 9, 2014

Thanks for your feedback. I made changes based on the reviewer comments and merged the test from #251.

@pwendell (Contributor) commented Apr 9, 2014

Thanks - I've merged this. By the way, I think SPARK-1407 is an incorrect label. I also rewrote the unit test with @andrewor14 on merge to make it a bit simpler.

@asfgit asfgit closed this in eb5f2b6 Apr 9, 2014
@kanzhang (Contributor, Author)

Thanks for getting rid of the 100ms wait.

I think this patch addresses SPARK-1407. Without it, even if sc.stop() is called, it only closes the logger and flushes whatever is already in its write buffer. If at that point some events have yet to be drained from the event queue, then when listenerThread tries to post them to the logger later on, the same "Filesystem closed" exception is encountered. Consequently the event log file ends up incomplete (the same symptom as in SPARK-1407).
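To make the ordering concrete, here is a self-contained sketch with plain Java I/O standing in for the event logger (illustrative only, not the Spark code): draining the queue before closing the writer keeps every event, whereas closing first is exactly the incomplete-log failure described above.

```scala
import java.io.{PrintWriter, StringWriter}
import java.util.concurrent.LinkedBlockingQueue

object StopOrderingSketch {
  def main(args: Array[String]): Unit = {
    val queue = new LinkedBlockingQueue[String]()
    (1 to 5).foreach(i => queue.put(s"event-$i"))

    val sink = new StringWriter()
    val writer = new PrintWriter(sink)

    // Safe ordering: drain whatever is still queued into the writer, then close.
    // Reversing these two steps would lose (or fail on) the queued events.
    while (!queue.isEmpty) writer.println(queue.take())
    writer.close()

    println(sink.toString) // all five events made it into the "log"
  }
}
```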

@kanzhang (Contributor, Author)

Or should I open a separate JIRA to track it?

@andrewor14 (Contributor)

Yes, could you open one that describes the listener bus draining part of this PR specifically? Thanks.

@kanzhang (Contributor, Author)

I opened SPARK-1475 to track this PR.

@kanzhang (Contributor, Author)

@pwendell @andrewor14, I think the unit test depends on a race condition: it assumes the stopper thread runs before the listener thread does, but we don't know the actual order. Otherwise, the queue would have drained before the stopper started and the test wouldn't be effective. I tried to fix this in the following PR (a rough sketch of the idea follows the link); please take a look. Thanks.

#401
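A self-contained sketch of one way to remove that ordering assumption, using a semaphore handshake so the listener cannot drain the queue before the stopper is running; this only approximates the idea and is not the code in #401.

```scala
import java.util.concurrent.{LinkedBlockingQueue, Semaphore}

object BlockingListenerSketch {
  def main(args: Array[String]): Unit = {
    val queue = new LinkedBlockingQueue[String]()
    (1 to 3).foreach(i => queue.put(s"event-$i"))

    val stopperStarted = new Semaphore(0)

    val listener = new Thread("listener") {
      override def run(): Unit = {
        stopperStarted.acquire() // hold off draining until the stopper is running
        while (!queue.isEmpty) println(s"drained ${queue.take()}")
      }
    }
    listener.start()

    // The "stopper": release the listener, then wait for the drain to finish.
    stopperStarted.release()
    listener.join()
    assert(queue.isEmpty, "all events should be drained before stop() returns")
  }
}
```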

@kanzhang kanzhang deleted the SPARK-1407 branch April 17, 2014 02:26
pwendell pushed a commit to pwendell/spark that referenced this pull request May 12, 2014
More yarn code refactor

Try to extract the common code in yarn alpha/stable for client and workerRunnable to reduce duplication, by putting it into a trait in the common dir and extending it.

The same could be done for the remaining files in alpha/stable, but those files have much more overlapping code with different API calls here and there within functions, and would need a much closer review; it might also split functions into overly small pieces, so it may not be worth doing this way.

So just do it for these two files first.
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Author: Kan Zhang <kzhang@apache.org>

Closes apache#366 from kanzhang/SPARK-1407 and squashes the following commits:

cd0629f [Kan Zhang] code refactoring and adding test
b073ee6 [Kan Zhang] SPARK-1407 drain event queue before stopping event logger
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 3, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Add periodic jobs to integrate terraform-provider-openstack with public clouds: telefonica, orange, opentelekomcloud and huaweicloud. (apache#366)

Related-Bug: theopenlab/openlab#125