[SPARK-1768] History server enhancements. #718

vanzin · 2014-05-09T21:42:14Z

Two improvements to the history server:

Separate the HTTP handling from history fetching, so that it's easy to add
new backends later (thinking about SPARK-1537 in the long run)
Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
memory for faster access. This allows the app limit to go away, since holding
just the listing in memory shouldn't be too expensive unless the user has millions
of completed apps in the history (at which point I'd expect other issues to arise
aside from history server memory usage, such as FileSystem.listStatus()
starting to become ridiculously expensive).

I also fixed a few minor things along the way which aren't really worth mentioning.
I also removed the app's log path from the UI since that information may not even
exist depending on which backend is used (even though there is only one now).

AmplabJenkins · 2014-05-09T21:42:58Z

Can one of the admins verify this patch?

vanzin · 2014-05-28T18:33:43Z

Update is just a rebase (with just minor cleanups).

vanzin · 2014-06-03T21:30:48Z

BTW, I have some changes on top of this one I've been working on, but don't want to pile on top of this same PR. If you'd like to take a look and comment, here's the first:

vanzin#1

(I don't know how to make github send an incremental pr so that only the new commits from the pr show up here, if anyone has any hints that would be appreciated.)

pwendell · 2014-06-03T23:55:40Z

core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

+  private var lastLogCheckTimeMs = -1L
+
+  // List of applications, in order from newest to oldest.
+  private val appList = new AtomicReference[Seq[ApplicationHistoryInfo]](Nil)


Is there a reason not to just use an ArrayBuffer here and add some synchronization around resetting and reading it? This AtomicReference thing is a little bit clunky and I don't see any performance argument for it.

It's not a performance thing, but a correctness thing, since the variable is accessed by different threads (one of which might be writing to it). You really don't two threads, one adding things to an ArrayBuffer while the other is trying to iterate through it, for example. That will end up in tears.

That being said, since the code is only reading and writing to the variable (and not making other kinds of non-atomic changes), volatile is probably enough. A lock is way overkill here.

pwendell · 2014-06-04T01:04:32Z

Hey @vanzin - I just did a pass on this. From what I can tell this adds four somewhat independent changes:

Caching applications that we know haven't changed.
Adding pagination to the list of applications.
Abstracting the history provider.
Replacing the way arguments are specified to the server.

Of these, I feel that (1) and (2) are great improvements. It might be nice to pull those out into a separate PR, because here they are somewhat conflated with (3) and (4).

For (3) - I think it might make more sense to wait until we have two concrete implementations here before we decided to make an interface. Otherwise I think it's very likely we'd end up changing the interface anyways down the road. And this isn't a public API - we can easily just generalize it whenever we want. @andrewor14 can comment on this more next week when he's back, since he made this original design.

For (4) the proposal here, I wasn't sure exactly what problem it was trying to solve. It would be good to have more information on that.

vanzin · 2014-06-04T17:36:33Z

Hi @pwendell,

First, thanks for taking the time to look into this.

While it's true that this change mixes a few arguably unrelated things, I'm a little worried about breaking it up further. Mainly because github's pull request feature doesn't seem to handle incremental changes well (see my previous comment, #718 (comment)). If there's a way I'm not aware of that allows me to send multiple, related PRs through github, I'll gladly make the changes. Otherwise, I'd rather keep this PR as is since I already have more work that depends on it, and breaking it down would just slow down the whole process even more.

About (3), note that one of the goals of the enhancements, as I mentioned in the Jira issue, is to prepare the field for SPARK-1537. Also, the refactoring allows for easier unit testing (which I added as part of my related change mentioned in my above comment). I'm not worried about having to change the interface later - these are internal APIs, after all. But I really would like to have the UI handling separate from the history backend.

About (4), the main thing is to allow arbitrary options to be set more easily. I'm not wedded to the idea of a new command line argument (see other comments), but I think we should at least add support for reading config files (not just in the History Server, but also other daemons). I think that more closely matches how people generally deploy daemons.

On a side note, just in case github doesn't support the "related PRs" thing I tried to explain, I'd suggest you guys take a look at something like ReviewBoard. RB makes that pretty painless, since it's not necessarily tied to github branches. It also handles things like renaming files more cleanly (when tied to a git repo). It would probably make merging changes a little more difficult, but I'm sure that could be made better with some scripts. In any case, I'm really looking for suggestions about how to handle these related PRs - sending them one at a time and having to wait until they are committed before sending the next one really slows things considerably, and prevents people from seeing the "big picture" when work spans several changes.

pwendell · 2014-06-04T18:11:34Z

Okay - we can keep the changes together. I think the test-ability argument is a good one for abstracting away the interface. We typically try not to add interfaces until we have two concrete implementations... but again, for internal code it's not a big deal since we can always refactor it later.

Re: reviewboard, we have a bunch of project automation around github, but it's possible reveiwboard could augment this somehow (not sure). Would be interesting to look into. There is no way to do an incremental diff in github, which is indeed super annoying, however in the past that feature hasn't been seen as outweighing the other benefits. I'd guess people are open to experimentation!

andrewor14 · 2014-06-09T20:51:54Z

core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala

    // Wait until the end of the world... or if the HistoryServer process is manually stopped
    while(true) { Thread.sleep(Int.MaxValue) }
-    server.stop()


Good catch. This was dead code and we never did actually stop the server.

andrewor14 · 2014-06-18T00:19:58Z

Jenkins, test this please

AmplabJenkins · 2014-06-18T00:24:49Z

Merged build triggered.

AmplabJenkins · 2014-06-18T00:24:55Z

Merged build started.

AmplabJenkins · 2014-06-18T00:59:12Z

Merged build finished.

AmplabJenkins · 2014-06-18T00:59:13Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15861/

vanzin · 2014-06-18T17:59:40Z

Same error as before - the test passes for me locally, on top of current master.

vanzin · 2014-06-19T18:15:47Z

Looks like other PRs are running into the same test issue, so definitely not an issue with my code.

pwendell · 2014-06-21T01:50:52Z

I think I fixed it. Jenkins, test this please.

AmplabJenkins · 2014-06-21T01:55:02Z

Merged build triggered.

AmplabJenkins · 2014-06-21T01:55:07Z

Merged build started.

AmplabJenkins · 2014-06-21T03:03:47Z

Merged build finished.

AmplabJenkins · 2014-06-21T03:03:47Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15978/

pwendell · 2014-06-21T03:03:55Z

Jenkins, retest this please.

AmplabJenkins · 2014-06-21T03:05:02Z

Merged build triggered.

AmplabJenkins · 2014-06-21T03:05:08Z

Merged build started.

AmplabJenkins · 2014-06-21T03:44:06Z

Merged build finished.

AmplabJenkins · 2014-06-21T03:44:06Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15980/

pwendell · 2014-06-21T23:15:45Z

Jenkins, retest this please

AmplabJenkins · 2014-06-21T23:20:05Z

Merged build triggered.

AmplabJenkins · 2014-06-21T23:20:15Z

Merged build started.

AmplabJenkins · 2014-06-21T23:59:25Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-21T23:59:26Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15999/

pwendell · 2014-06-23T20:53:25Z

Thanks @vanzin for this feature. I'm going to merge this!

Two improvements to the history server: - Separate the HTTP handling from history fetching, so that it's easy to add new backends later (thinking about SPARK-1537 in the long run) - Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in memory for faster access. This allows the app limit to go away, since holding just the listing in memory shouldn't be too expensive unless the user has millions of completed apps in the history (at which point I'd expect other issues to arise aside from history server memory usage, such as FileSystem.listStatus() starting to become ridiculously expensive). I also fixed a few minor things along the way which aren't really worth mentioning. I also removed the app's log path from the UI since that information may not even exist depending on which backend is used (even though there is only one now). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#718 from vanzin/hist-server and squashes the following commits: 53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording. c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs. dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration. 4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo. 2a7f68d [Marcelo Vanzin] Address review feedback. 4e72c77 [Marcelo Vanzin] Remove comment about ordering. 249bcea [Marcelo Vanzin] Remove offset / count from provider interface. ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps. 6e2432f [Marcelo Vanzin] Second round of feedback. b2c570a [Marcelo Vanzin] Make class package-private. 4406f61 [Marcelo Vanzin] Cosmetic change to listing header. e852149 [Marcelo Vanzin] Initialize new app array to expected size. e8026f4 [Marcelo Vanzin] Review feedback. 49d2fd3 [Marcelo Vanzin] Fix a comment. 91e96ca [Marcelo Vanzin] Fix scalastyle issues. 6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info. eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down. bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI. b284478 [Marcelo Vanzin] Separate history server from history backend.

pwendell reviewed Jun 3, 2014
View reviewed changes

andrewor14 reviewed Jun 9, 2014
View reviewed changes

asfgit closed this in 21ddd7d Jun 23, 2014

vanzin deleted the hist-server branch June 23, 2014 23:42

[SPARK-1768] History server enhancements. #718

[SPARK-1768] History server enhancements. #718

Uh oh!

Conversation

vanzin commented May 9, 2014

Uh oh!

AmplabJenkins commented May 9, 2014

Uh oh!

vanzin commented May 28, 2014

Uh oh!

vanzin commented Jun 3, 2014

Uh oh!

pwendell Jun 3, 2014

Choose a reason for hiding this comment

Uh oh!

vanzin Jun 4, 2014

Choose a reason for hiding this comment

Uh oh!

pwendell commented Jun 4, 2014

Uh oh!

vanzin commented Jun 4, 2014

Uh oh!

pwendell commented Jun 4, 2014

Uh oh!

andrewor14 Jun 9, 2014

Choose a reason for hiding this comment

Uh oh!

andrewor14 commented Jun 18, 2014

Uh oh!

AmplabJenkins commented Jun 18, 2014

Uh oh!

AmplabJenkins commented Jun 18, 2014

Uh oh!

AmplabJenkins commented Jun 18, 2014

Uh oh!

AmplabJenkins commented Jun 18, 2014

Uh oh!

vanzin commented Jun 18, 2014

Uh oh!

vanzin commented Jun 19, 2014

Uh oh!

pwendell commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

pwendell commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

pwendell commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

AmplabJenkins commented Jun 21, 2014

Uh oh!

pwendell commented Jun 23, 2014

Uh oh!

Uh oh!