-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-1768] History server enhancements. #718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can one of the admins verify this patch? |
Update is just a rebase (with just minor cleanups). |
BTW, I have some changes on top of this one I've been working on, but don't want to pile on top of this same PR. If you'd like to take a look and comment, here's the first: (I don't know how to make github send an incremental pr so that only the new commits from the pr show up here, if anyone has any hints that would be appreciated.) |
private var lastLogCheckTimeMs = -1L | ||
|
||
// List of applications, in order from newest to oldest. | ||
private val appList = new AtomicReference[Seq[ApplicationHistoryInfo]](Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to just use an ArrayBuffer
here and add some synchronization around resetting and reading it? This AtomicReference
thing is a little bit clunky and I don't see any performance argument for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a performance thing, but a correctness thing, since the variable is accessed by different threads (one of which might be writing to it). You really don't two threads, one adding things to an ArrayBuffer while the other is trying to iterate through it, for example. That will end up in tears.
That being said, since the code is only reading and writing to the variable (and not making other kinds of non-atomic changes), volatile is probably enough. A lock is way overkill here.
Hey @vanzin - I just did a pass on this. From what I can tell this adds four somewhat independent changes:
Of these, I feel that (1) and (2) are great improvements. It might be nice to pull those out into a separate PR, because here they are somewhat conflated with (3) and (4). For (3) - I think it might make more sense to wait until we have two concrete implementations here before we decided to make an interface. Otherwise I think it's very likely we'd end up changing the interface anyways down the road. And this isn't a public API - we can easily just generalize it whenever we want. @andrewor14 can comment on this more next week when he's back, since he made this original design. For (4) the proposal here, I wasn't sure exactly what problem it was trying to solve. It would be good to have more information on that. |
Hi @pwendell, First, thanks for taking the time to look into this. While it's true that this change mixes a few arguably unrelated things, I'm a little worried about breaking it up further. Mainly because github's pull request feature doesn't seem to handle incremental changes well (see my previous comment, #718 (comment)). If there's a way I'm not aware of that allows me to send multiple, related PRs through github, I'll gladly make the changes. Otherwise, I'd rather keep this PR as is since I already have more work that depends on it, and breaking it down would just slow down the whole process even more. About (3), note that one of the goals of the enhancements, as I mentioned in the Jira issue, is to prepare the field for SPARK-1537. Also, the refactoring allows for easier unit testing (which I added as part of my related change mentioned in my above comment). I'm not worried about having to change the interface later - these are internal APIs, after all. But I really would like to have the UI handling separate from the history backend. About (4), the main thing is to allow arbitrary options to be set more easily. I'm not wedded to the idea of a new command line argument (see other comments), but I think we should at least add support for reading config files (not just in the History Server, but also other daemons). I think that more closely matches how people generally deploy daemons. On a side note, just in case github doesn't support the "related PRs" thing I tried to explain, I'd suggest you guys take a look at something like ReviewBoard. RB makes that pretty painless, since it's not necessarily tied to github branches. It also handles things like renaming files more cleanly (when tied to a git repo). It would probably make merging changes a little more difficult, but I'm sure that could be made better with some scripts. In any case, I'm really looking for suggestions about how to handle these related PRs - sending them one at a time and having to wait until they are committed before sending the next one really slows things considerably, and prevents people from seeing the "big picture" when work spans several changes. |
Okay - we can keep the changes together. I think the test-ability argument is a good one for abstracting away the interface. We typically try not to add interfaces until we have two concrete implementations... but again, for internal code it's not a big deal since we can always refactor it later. Re: reviewboard, we have a bunch of project automation around github, but it's possible reveiwboard could augment this somehow (not sure). Would be interesting to look into. There is no way to do an incremental diff in github, which is indeed super annoying, however in the past that feature hasn't been seen as outweighing the other benefits. I'd guess people are open to experimentation! |
// Wait until the end of the world... or if the HistoryServer process is manually stopped | ||
while(true) { Thread.sleep(Int.MaxValue) } | ||
server.stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This was dead code and we never did actually stop the server.
Jenkins, test this please |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15861/ |
Same error as before - the test passes for me locally, on top of current master. |
Looks like other PRs are running into the same test issue, so definitely not an issue with my code. |
I think I fixed it. Jenkins, test this please. |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15978/ |
Jenkins, retest this please. |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15980/ |
Jenkins, retest this please |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Thanks @vanzin for this feature. I'm going to merge this! |
Two improvements to the history server: - Separate the HTTP handling from history fetching, so that it's easy to add new backends later (thinking about SPARK-1537 in the long run) - Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in memory for faster access. This allows the app limit to go away, since holding just the listing in memory shouldn't be too expensive unless the user has millions of completed apps in the history (at which point I'd expect other issues to arise aside from history server memory usage, such as FileSystem.listStatus() starting to become ridiculously expensive). I also fixed a few minor things along the way which aren't really worth mentioning. I also removed the app's log path from the UI since that information may not even exist depending on which backend is used (even though there is only one now). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#718 from vanzin/hist-server and squashes the following commits: 53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording. c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs. dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration. 4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo. 2a7f68d [Marcelo Vanzin] Address review feedback. 4e72c77 [Marcelo Vanzin] Remove comment about ordering. 249bcea [Marcelo Vanzin] Remove offset / count from provider interface. ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps. 6e2432f [Marcelo Vanzin] Second round of feedback. b2c570a [Marcelo Vanzin] Make class package-private. 4406f61 [Marcelo Vanzin] Cosmetic change to listing header. e852149 [Marcelo Vanzin] Initialize new app array to expected size. e8026f4 [Marcelo Vanzin] Review feedback. 49d2fd3 [Marcelo Vanzin] Fix a comment. 91e96ca [Marcelo Vanzin] Fix scalastyle issues. 6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info. eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down. bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI. b284478 [Marcelo Vanzin] Separate history server from history backend.
Two improvements to the history server: - Separate the HTTP handling from history fetching, so that it's easy to add new backends later (thinking about SPARK-1537 in the long run) - Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in memory for faster access. This allows the app limit to go away, since holding just the listing in memory shouldn't be too expensive unless the user has millions of completed apps in the history (at which point I'd expect other issues to arise aside from history server memory usage, such as FileSystem.listStatus() starting to become ridiculously expensive). I also fixed a few minor things along the way which aren't really worth mentioning. I also removed the app's log path from the UI since that information may not even exist depending on which backend is used (even though there is only one now). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#718 from vanzin/hist-server and squashes the following commits: 53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording. c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs. dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration. 4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo. 2a7f68d [Marcelo Vanzin] Address review feedback. 4e72c77 [Marcelo Vanzin] Remove comment about ordering. 249bcea [Marcelo Vanzin] Remove offset / count from provider interface. ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps. 6e2432f [Marcelo Vanzin] Second round of feedback. b2c570a [Marcelo Vanzin] Make class package-private. 4406f61 [Marcelo Vanzin] Cosmetic change to listing header. e852149 [Marcelo Vanzin] Initialize new app array to expected size. e8026f4 [Marcelo Vanzin] Review feedback. 49d2fd3 [Marcelo Vanzin] Fix a comment. 91e96ca [Marcelo Vanzin] Fix scalastyle issues. 6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info. eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down. bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI. b284478 [Marcelo Vanzin] Separate history server from history backend.
Two improvements to the history server:
new backends later (thinking about SPARK-1537 in the long run)
memory for faster access. This allows the app limit to go away, since holding
just the listing in memory shouldn't be too expensive unless the user has millions
of completed apps in the history (at which point I'd expect other issues to arise
aside from history server memory usage, such as FileSystem.listStatus()
starting to become ridiculously expensive).
I also fixed a few minor things along the way which aren't really worth mentioning.
I also removed the app's log path from the UI since that information may not even
exist depending on which backend is used (even though there is only one now).