[SPARK-17243] [Web UI] Spark 2.0 History Server won't load with very large application history #14835

Closed
ajbozarth wants to merge 7 commits into apache:master from ajbozarth:spark17243

Conversation

@ajbozarth (Member) commented Aug 26, 2016

What changes were proposed in this pull request?

With the new History Server the summary page loads the application list via the REST API, which makes it very slow, or impossible, to load with a large (10K+) application history. This PR fixes that by adding a `spark.history.ui.maxApplications` conf to limit the number of applications the History Server displays. This is accomplished using a new optional `limit` param for the `applications` API. (Note this only applies to what the summary page displays; all the Application UIs are still accessible if the user knows the App ID and goes to the Application UI directly.)
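
For illustration (this sketch is not part of this pr's code), the new param could be exercised like this once a History Server is running on its default port; the limit of 50 is an arbitrary example:

```scala
// Sketch: fetch a capped application list from the History Server REST API.
// localhost:18080 is the default History Server address; adjust as needed.
import scala.io.Source

object LimitParamDemo {
  def main(args: Array[String]): Unit = {
    val url = "http://localhost:18080/api/v1/applications?limit=50"
    // The response is a JSON array with at most 50 application entries.
    println(Source.fromURL(url).mkString)
  }
}
```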

I've also added a new test for the `limit` param in `HistoryServerSuite.scala`.

How was this patch tested?

Manual testing and dev/run-tests

@ajbozarth (Member Author) commented Aug 26, 2016

I ended up not using the limit param in my solution, since getting a conf value to the js function that calls the api was not exactly simple (my biggest issue with switching to js-based pages for the ui), but given its overall usefulness I decided to leave it in this pr as a new feature. If you think I should open a separate JIRA/pr for it, I can.

@ajbozarth (Member Author)

@srowen @steveloughran can you take a look at this? You two were the ones who commented on the JIRA, thanks!

@@ -101,3 +101,4 @@ org.apache.spark.scheduler.ExternalClusterManager
.*\.sql
.Rbuildignore
org.apache.spark.deploy.yarn.security.ServiceCredentialProvider
spark-warehouse

Contributor:

not sure about this change

@ajbozarth (Member Author):

Oh whoops, I must have committed that by accident; dev/run-tests won't pass without it, so I've been adding/removing it every time.

@steveloughran (Contributor)

I'll let other people review the source code in detail, except to note that people calling the REST API may want to ask for all entries, even if the web view asks for fewer.

  1. The REST API is documented in docs/monitoring.md; changes need to go in there.
  2. Tests go in HistoryServerSuite.

Test-wise, I'd add something to the end of `test("incomplete apps get refreshed")`, making another API call on the app list through `getContentAndCode` and setting `?limit=1` to verify 0 comes back.
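
Something like this sketch (untested; it assumes `getContentAndCode` keeps the (status, body, error) shape it has elsewhere in the suite):

```scala
// At the end of test("incomplete apps get refreshed"); HttpServletResponse
// and the Matchers syntax are already available in HistoryServerSuite.
val (code, jsonOpt, errOpt) = getContentAndCode("applications?limit=1")
code should be (HttpServletResponse.SC_OK)
jsonOpt should not be None
// then parse jsonOpt.get and assert on the number of applications returned
```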

@ajbozarth (Member Author)

@steveloughran Some follow-up and clarifications:
Regarding your comment about people calling the REST API: you are correct that my change limits the applications available through the api on the history server. This was my intended behavior, but if you think it'd be better, I can switch to my fallback implementation outlined below.

Alternate implementation that only limits the history server and not the api:
I pass the conf value to the js file via a global js var with a getter and setter (I've done this previously with `threadDumpEnabled` in `ExecutorsPage.scala` going to `executors.js`); see the sketch below.
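
As a sketch (illustrative names, modeled on the `threadDumpEnabled` wiring rather than the exact code in this pr):

```scala
// Server side (e.g. HistoryPage.scala): read the conf and emit a script tag
// that hands the value to the page's JS. `parent.conf` and `setAppLimit`
// are illustrative names, not necessarily what the final code uses.
val maxApplications =
  parent.conf.getInt("spark.history.ui.maxApplications", Int.MaxValue)
val content = <script>setAppLimit({maxApplications})</script>
// JS side (historypage.js) would keep a global with a getter/setter, e.g.:
//   var appLimit = -1;
//   function setAppLimit(num) { appLimit = num; }
```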

  1. I will add this.
  2. I added the one test already. For the test you referenced, I'm a bit confused about what you're recommending: why would `?limit=1` return 0? Or are you saying to call it after the last line, where it deletes the log? If so, how would that test the limit param?

Thanks

@ajbozarth (Member Author)

So I decided to switch to my alternate implementation after thinking hard about how detrimental limiting the api would be. I also updated the docs with my new conf and api param. I will push these changes once I double-check my tests.

@SparkQA commented Aug 26, 2016

Test build #64502 has finished for PR 14835 at commit 6163f13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 27, 2016

Test build #64508 has finished for PR 14835 at commit 172dc56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@QQshu1 commented Aug 27, 2016

I have a question: if I have just one application and it is very large, it still can't be loaded this way. How can that scenario be handled?

@ajbozarth (Member Author)

@QQshu1 I'm not sure what problem you're having, but this pr solves the specific problem that the call to fetch the application-list JSON when loading the history server takes too long when the application list is too long.

<td>Int.MaxValue</td>
<td>
The number of applications to display on the history summary page. Application UIs are
still available even if they are not displayed on the history summary page.

Contributor:

We might want to clarify this; I assume it's if you put in the URL to the application directly.

@ajbozarth (Member Author):

You are correct, I'll reword it to use "URL"

@tgravescs (Contributor)

I need to look at it in more detail, but at one point (and it looks like this has changed) I thought spark.history.retainedApplications affected the number being displayed as well. So at the very least I think we should update the docs to clarify what it does vs. this.

@@ -100,6 +100,7 @@ class HistoryServerSuite extends SparkFunSuite with BeforeAndAfter with Matchers
"minDate app list json" -> "applications?minDate=2015-02-10",
"maxDate app list json" -> "applications?maxDate=2015-02-10",
"maxDate2 app list json" -> "applications?maxDate=2015-02-03T16:42:40.000GMT",
"limit app list json" -> "applications?limit=3",

Contributor:

What happens with limit=-1? I assume the take will just return an empty map, but good to try it.

@ajbozarth (Member Author):

limit=-1 will actually return the whole list; I just pass the int to take, and that's how that function works. I thought this was a good default value (which is why it's the default value in the js file).


Contributor:

ok, sounds fine.


Contributor:

I was worried about what would happen if limit was unset, but the existing tests (lines 180-182) all cover that.

@ajbozarth (Member Author)

@tgravescs I'll double-check, but I believe spark.history.retainedApplications now affects the max number of cached applications, not the total listed. I was under the same assumption previously and had suggested that workaround in the JIRA.
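
To illustrate the distinction (a sketch with arbitrary values; only the second conf is new in this pr):

```scala
import org.apache.spark.SparkConf

// Illustration only: the two settings govern different things.
val conf = new SparkConf()
  // how many loaded application UIs the History Server caches in memory
  .set("spark.history.retainedApplications", "50")
  // how many applications the summary page lists (new in this pr)
  .set("spark.history.ui.maxApplications", "5000")
```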

@tgravescs (Contributor)

Yep, I believe you are correct, and if so I think we should just update the docs for it to describe that. If you don't mind doing it along with this, that would be great.

@ajbozarth (Member Author)

@tgravescs pushed those doc updates if you want to take a look; I'm not 100% sure on my wording.

@SparkQA commented Aug 29, 2016

Test build #64595 has finished for PR 14835 at commit bbb9718.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajbozarth (Member Author)

retest this please

@SparkQA commented Aug 29, 2016

Test build #64601 has finished for PR 14835 at commit bbb9718.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- The number of application UIs to retain. If this cap is exceeded, then the oldest
- applications will be removed.
+ The number of application UIs to retain in the cache. If this cap is exceeded, then the oldest
+ applications will be removed from the cache.

Contributor:

Kind of a nit here, but perhaps we should change this to "The number of applications to retain in the cache". I'm wondering if "UIs" could still be confused with the number displayed. And perhaps after the last sentence add something like: "If an application is not in the cache, it will have to be loaded from disk if it's accessed from the UI."

@ajbozarth (Member Author):

Thanks, that'll help me reword it a bit

@ajbozarth (Member Author)

@tgravescs I edited the wording, but I feel that the word "UI" needs to be there to make sure they understand that it's the UI being stored, not some copy of the application. I thought the phrase "UI data" got this across, and I chose to move the sentence around to clear up any misunderstandings.

@SparkQA commented Aug 30, 2016

Test build #64664 has finished for PR 14835 at commit 8da22e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor)

changes look good. thanks!

One interesting thing I found while trying this out is that the number displayed in the table can actually be greater than the maxApplications config, due to some applications possibly having multiple attempts. The REST API will return the correct number of applications; it's just that the datatable may split them based on sorting order. Not a big deal for this, just an observation.

@asfgit closed this in f7beae6 Aug 30, 2016
@tgravescs (Contributor)

Merged into the 2.1 branch. I can see this being either a defect or a feature. If you or others want it in 2.0.1, it will need a separate pr, as there were conflicts with the change due to the executor page going to datatables.

@ajbozarth (Member Author)

Thanks @tgravescs, I'll quickly open a pr for 2.0.1 and ping you on it.

@ajbozarth deleted the spark17243 branch August 30, 2016 21:42
ajbozarth added a commit to ajbozarth/spark that referenced this pull request Aug 30, 2016
[SPARK-17243] [Web UI] Spark 2.0 History Server won't load with very large application history

With the new History Server the summary page loads the application list via the REST API, which makes it very slow, or impossible, to load with a large (10K+) application history. This PR fixes that by adding the `spark.history.ui.maxApplications` conf to limit the number of applications the History Server displays. This is accomplished using a new optional `limit` param for the `applications` api. (Note this only applies to what the summary page displays; all the Application UIs are still accessible if the user knows the App ID and goes to the Application UI directly.)

I've also added a new test for the `limit` param in `HistoryServerSuite.scala`

Manual testing and dev/run-tests

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes apache#14835 from ajbozarth/spark17243.
asfgit pushed a commit that referenced this pull request Aug 31, 2016
[SPARK-17243] [Web UI] Spark 2.0 History Server won't load with very large application history

## What changes were proposed in this pull request?

Backport of #14835, addressing merge conflicts.

With the new History Server the summary page loads the application list via the REST API, which makes it very slow, or impossible, to load with a large (10K+) application history. This PR fixes that by adding the `spark.history.ui.maxApplications` conf to limit the number of applications the History Server displays. This is accomplished using a new optional `limit` param for the `applications` api. (Note this only applies to what the summary page displays; all the Application UIs are still accessible if the user knows the App ID and goes to the Application UI directly.)

I've also added a new test for the `limit` param in `HistoryServerSuite.scala`

## How was this patch tested?

Manual testing and dev/run-tests

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #14886 from ajbozarth/spark17243-branch-2.0.