Skip to content

[SPARK-1768] History server enhancements. #718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented May 9, 2014

Two improvements to the history server:

  • Separate the HTTP handling from history fetching, so that it's easy to add
    new backends later (thinking about SPARK-1537 in the long run)
  • Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
    memory for faster access. This allows the app limit to go away, since holding
    just the listing in memory shouldn't be too expensive unless the user has millions
    of completed apps in the history (at which point I'd expect other issues to arise
    aside from history server memory usage, such as FileSystem.listStatus()
    starting to become ridiculously expensive).

I also fixed a few minor things along the way which aren't really worth mentioning.
I also removed the app's log path from the UI since that information may not even
exist depending on which backend is used (even though there is only one now).

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@vanzin
Copy link
Contributor Author

vanzin commented May 28, 2014

Update is just a rebase (with just minor cleanups).

@vanzin
Copy link
Contributor Author

vanzin commented Jun 3, 2014

BTW, I have some changes on top of this one I've been working on, but don't want to pile on top of this same PR. If you'd like to take a look and comment, here's the first:

vanzin#1

(I don't know how to make github send an incremental pr so that only the new commits from the pr show up here, if anyone has any hints that would be appreciated.)

private var lastLogCheckTimeMs = -1L

// List of applications, in order from newest to oldest.
private val appList = new AtomicReference[Seq[ApplicationHistoryInfo]](Nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to just use an ArrayBuffer here and add some synchronization around resetting and reading it? This AtomicReference thing is a little bit clunky and I don't see any performance argument for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a performance thing, but a correctness thing, since the variable is accessed by different threads (one of which might be writing to it). You really don't two threads, one adding things to an ArrayBuffer while the other is trying to iterate through it, for example. That will end up in tears.

That being said, since the code is only reading and writing to the variable (and not making other kinds of non-atomic changes), volatile is probably enough. A lock is way overkill here.

@pwendell
Copy link
Contributor

pwendell commented Jun 4, 2014

Hey @vanzin - I just did a pass on this. From what I can tell this adds four somewhat independent changes:

  1. Caching applications that we know haven't changed.
  2. Adding pagination to the list of applications.
  3. Abstracting the history provider.
  4. Replacing the way arguments are specified to the server.

Of these, I feel that (1) and (2) are great improvements. It might be nice to pull those out into a separate PR, because here they are somewhat conflated with (3) and (4).

For (3) - I think it might make more sense to wait until we have two concrete implementations here before we decided to make an interface. Otherwise I think it's very likely we'd end up changing the interface anyways down the road. And this isn't a public API - we can easily just generalize it whenever we want. @andrewor14 can comment on this more next week when he's back, since he made this original design.

For (4) the proposal here, I wasn't sure exactly what problem it was trying to solve. It would be good to have more information on that.

@vanzin
Copy link
Contributor Author

vanzin commented Jun 4, 2014

Hi @pwendell,

First, thanks for taking the time to look into this.

While it's true that this change mixes a few arguably unrelated things, I'm a little worried about breaking it up further. Mainly because github's pull request feature doesn't seem to handle incremental changes well (see my previous comment, #718 (comment)). If there's a way I'm not aware of that allows me to send multiple, related PRs through github, I'll gladly make the changes. Otherwise, I'd rather keep this PR as is since I already have more work that depends on it, and breaking it down would just slow down the whole process even more.

About (3), note that one of the goals of the enhancements, as I mentioned in the Jira issue, is to prepare the field for SPARK-1537. Also, the refactoring allows for easier unit testing (which I added as part of my related change mentioned in my above comment). I'm not worried about having to change the interface later - these are internal APIs, after all. But I really would like to have the UI handling separate from the history backend.

About (4), the main thing is to allow arbitrary options to be set more easily. I'm not wedded to the idea of a new command line argument (see other comments), but I think we should at least add support for reading config files (not just in the History Server, but also other daemons). I think that more closely matches how people generally deploy daemons.

On a side note, just in case github doesn't support the "related PRs" thing I tried to explain, I'd suggest you guys take a look at something like ReviewBoard. RB makes that pretty painless, since it's not necessarily tied to github branches. It also handles things like renaming files more cleanly (when tied to a git repo). It would probably make merging changes a little more difficult, but I'm sure that could be made better with some scripts. In any case, I'm really looking for suggestions about how to handle these related PRs - sending them one at a time and having to wait until they are committed before sending the next one really slows things considerably, and prevents people from seeing the "big picture" when work spans several changes.

@pwendell
Copy link
Contributor

pwendell commented Jun 4, 2014

Okay - we can keep the changes together. I think the test-ability argument is a good one for abstracting away the interface. We typically try not to add interfaces until we have two concrete implementations... but again, for internal code it's not a big deal since we can always refactor it later.

Re: reviewboard, we have a bunch of project automation around github, but it's possible reveiwboard could augment this somehow (not sure). Would be interesting to look into. There is no way to do an incremental diff in github, which is indeed super annoying, however in the past that feature hasn't been seen as outweighing the other benefits. I'd guess people are open to experimentation!

// Wait until the end of the world... or if the HistoryServer process is manually stopped
while(true) { Thread.sleep(Int.MaxValue) }
server.stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This was dead code and we never did actually stop the server.

@andrewor14
Copy link
Contributor

Jenkins, test this please

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15861/

@vanzin
Copy link
Contributor Author

vanzin commented Jun 18, 2014

Same error as before - the test passes for me locally, on top of current master.

@vanzin
Copy link
Contributor Author

vanzin commented Jun 19, 2014

Looks like other PRs are running into the same test issue, so definitely not an issue with my code.

@pwendell
Copy link
Contributor

I think I fixed it. Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15978/

@pwendell
Copy link
Contributor

Jenkins, retest this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15980/

@pwendell
Copy link
Contributor

Jenkins, retest this please

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15999/

@pwendell
Copy link
Contributor

Thanks @vanzin for this feature. I'm going to merge this!

@asfgit asfgit closed this in 21ddd7d Jun 23, 2014
@vanzin vanzin deleted the hist-server branch June 23, 2014 23:42
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Two improvements to the history server:

- Separate the HTTP handling from history fetching, so that it's easy to add
  new backends later (thinking about SPARK-1537 in the long run)

- Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
  memory for faster access. This allows the app limit to go away, since holding
  just the listing in memory shouldn't be too expensive unless the user has millions
  of completed apps in the history (at which point I'd expect other issues to arise
  aside from history server memory usage, such as FileSystem.listStatus()
  starting to become ridiculously expensive).

I also fixed a few minor things along the way which aren't really worth mentioning.
I also removed the app's log path from the UI since that information may not even
exist depending on which backend is used (even though there is only one now).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#718 from vanzin/hist-server and squashes the following commits:

53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording.
c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs.
dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration.
4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo.
2a7f68d [Marcelo Vanzin] Address review feedback.
4e72c77 [Marcelo Vanzin] Remove comment about ordering.
249bcea [Marcelo Vanzin] Remove offset / count from provider interface.
ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps.
6e2432f [Marcelo Vanzin] Second round of feedback.
b2c570a [Marcelo Vanzin] Make class package-private.
4406f61 [Marcelo Vanzin] Cosmetic change to listing header.
e852149 [Marcelo Vanzin] Initialize new app array to expected size.
e8026f4 [Marcelo Vanzin] Review feedback.
49d2fd3 [Marcelo Vanzin] Fix a comment.
91e96ca [Marcelo Vanzin] Fix scalastyle issues.
6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info.
eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down.
bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI.
b284478 [Marcelo Vanzin] Separate history server from history backend.
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Two improvements to the history server:

- Separate the HTTP handling from history fetching, so that it's easy to add
  new backends later (thinking about SPARK-1537 in the long run)

- Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
  memory for faster access. This allows the app limit to go away, since holding
  just the listing in memory shouldn't be too expensive unless the user has millions
  of completed apps in the history (at which point I'd expect other issues to arise
  aside from history server memory usage, such as FileSystem.listStatus()
  starting to become ridiculously expensive).

I also fixed a few minor things along the way which aren't really worth mentioning.
I also removed the app's log path from the UI since that information may not even
exist depending on which backend is used (even though there is only one now).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#718 from vanzin/hist-server and squashes the following commits:

53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording.
c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs.
dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration.
4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo.
2a7f68d [Marcelo Vanzin] Address review feedback.
4e72c77 [Marcelo Vanzin] Remove comment about ordering.
249bcea [Marcelo Vanzin] Remove offset / count from provider interface.
ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps.
6e2432f [Marcelo Vanzin] Second round of feedback.
b2c570a [Marcelo Vanzin] Make class package-private.
4406f61 [Marcelo Vanzin] Cosmetic change to listing header.
e852149 [Marcelo Vanzin] Initialize new app array to expected size.
e8026f4 [Marcelo Vanzin] Review feedback.
49d2fd3 [Marcelo Vanzin] Fix a comment.
91e96ca [Marcelo Vanzin] Fix scalastyle issues.
6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info.
eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down.
bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI.
b284478 [Marcelo Vanzin] Separate history server from history backend.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants