-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-17243] [Web UI] Spark 2.0 History Server won't load with very large application history #14835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@srowen @steveloughran can you take a look at this, you two were the ones who commented on the JIRA, thanks! |
@@ -101,3 +101,4 @@ org.apache.spark.scheduler.ExternalClusterManager | |||
.*\.sql | |||
.Rbuildignore | |||
org.apache.spark.deploy.yarn.security.ServiceCredentialProvider | |||
spark-warehouse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh whoops, I must of committed that by accident, dev/run-tests won't pass without it so I've been adding/removing it every time
I'll let other people review the source code in detail, except note that people calling the REST API may want to ask for all entries, even if the web view asks for less.
Test-wise, I'd add something to the end of |
@steveloughran Some follow up and clarifications: Alternate implementation that only limits history server and not the api:
Thanks |
So I decided to switch to my alternate implementation after thinking hard about how detrimental limiting the api would be. I also updated the docs with my new conf and api param. I will push these changes once I double check my tests. |
Test build #64502 has finished for PR 14835 at commit
|
Test build #64508 has finished for PR 14835 at commit
|
I may have a question ,if I just have a application,and it is very large .It also can't load it by this way. How to solve this kind of scene? |
@QQshu1 I'm not sure what problem you're having but this pr is to solve the specific problem that the call to get the application list json when loading the history server takes too long when the application list is too long |
<td>Int.MaxValue</td> | ||
<td> | ||
The number of applications to display on the history summary page. Application UIs are | ||
still available even if they are not displayed on the history summary page. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to clarify this, I assume its if you put url to the application directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I'll reword it to use "URL"
I need to look at in more details, but at one point (and it looks like this has changed) spark.history.retainedApplications I thought affected number being displayed as well. So in the very least I think we should update the docs to clarify what it is vs this. |
@@ -100,6 +100,7 @@ class HistoryServerSuite extends SparkFunSuite with BeforeAndAfter with Matchers | |||
"minDate app list json" -> "applications?minDate=2015-02-10", | |||
"maxDate app list json" -> "applications?maxDate=2015-02-10", | |||
"maxDate2 app list json" -> "applications?maxDate=2015-02-03T16:42:40.000GMT", | |||
"limit app list json" -> "applications?limit=3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens with limit=-1? I assume the take will just return empty map but good to try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit=-1 will actually return the whole list, I just pass the int to take
and that's how that function works, I though this was a good default value (which is why it's the default value in the js file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, sounds fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was worried about what would happen if limit was unset —but the existing tests (lines 180-182) all test that
@tgravescs I'll double check but I believe |
yep, I believe you are correct, and if so I think we should just update the docs for it to describe that. If you don't mind doing it along with this that would be great. |
@tgravescs pushed those doc updates if you want to take a look, I'm not 100% sure on my wording |
Test build #64595 has finished for PR 14835 at commit
|
retest this please |
Test build #64601 has finished for PR 14835 at commit
|
The number of application UIs to retain. If this cap is exceeded, then the oldest | ||
applications will be removed. | ||
The number of application UIs to retain in the cache. If this cap is exceeded, then the oldest | ||
applications will be removed from the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of a nit here but perhaps we should change this to "The number of applications to retain in the cache". I'm wondering if UIs could still be confusing with the number displayed. And perhaps after the last sentence add something like: If an application is not in the cache, it will have to be loaded from disk if its accessed from the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that'll help me reword it a bit
@tgravescs I edited the wording, but I feel that the word "UI" needs to be there to make sure they understand that it's the UI being stored not some copy of the application. I though the phrase "UI data" got this across and I choose to move the sentence around to clear any misunderstandings |
Test build #64664 has finished for PR 14835 at commit
|
changes look good. thanks! One interesting thing I found while trying this out is that the number displayed in the table can actually be greater then the maxApplication config due to some applications possibly having multiple attempts. The rest api will return the correct number of applications, it just the datatable may split them based on sorting order. Not a big deal for this, just an observation. |
Merged into 2.1 branch. I can see this being either defect or feature. If you or others want it in 2.0.1 will need a separate pr as there were conflicts with the change due to executor page going to datatables. |
Thanks @tgravescs I'll quickly open a pr for 2.0.1 and ping you on it |
…arge application history With the new History Server the summary page loads the application list via the the REST API, this makes it very slow to impossible to load with large (10K+) application history. This pr fixes this by adding the `spark.history.ui.maxApplications` conf to limit the number of applications the History Server displays. This is accomplished using a new optional `limit` param for the `applications` api. (Note this only applies to what the summary page displays, all the Application UI's are still accessible if the user knows the App ID and goes to the Application UI directly.) I've also added a new test for the `limit` param in `HistoryServerSuite.scala` Manual testing and dev/run-tests Author: Alex Bozarth <ajbozart@us.ibm.com> Closes apache#14835 from ajbozarth/spark17243.
…arge application history ## What changes were proposed in this pull request? back port of #14835 addressing merge conflicts With the new History Server the summary page loads the application list via the the REST API, this makes it very slow to impossible to load with large (10K+) application history. This pr fixes this by adding the `spark.history.ui.maxApplications` conf to limit the number of applications the History Server displays. This is accomplished using a new optional `limit` param for the `applications` api. (Note this only applies to what the summary page displays, all the Application UI's are still accessible if the user knows the App ID and goes to the Application UI directly.) I've also added a new test for the `limit` param in `HistoryServerSuite.scala` ## How was this patch tested? Manual testing and dev/run-tests Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #14886 from ajbozarth/spark17243-branch-2.0.
What changes were proposed in this pull request?
With the new History Server the summary page loads the application list via the the REST API, this makes it very slow to impossible to load with large (10K+) application history. This pr fixes this by adding the
spark.history.ui.maxApplications
conf to limit the number of applications the History Server displays. This is accomplished using a new optionallimit
param for theapplications
api. (Note this only applies to what the summary page displays, all the Application UI's are still accessible if the user knows the App ID and goes to the Application UI directly.)I've also added a new test for the
limit
param inHistoryServerSuite.scala
How was this patch tested?
Manual testing and dev/run-tests