-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33223][SS][UI]Structured Streaming Web UI state information #30151
Conversation
Could you please paste the screenshots here as this PR addresses the UI change? That would help to see the proposal easier. Thanks in advance! |
Sure, just wanted to check the jenkins tests. When the PR is polished I'll add the UI snapshot. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #130285 has finished for PR 30151 at commit
|
@HeartSaVioR here it is. The operator ID looks a bit ugly on the UI but operators don't have names. |
Thanks for taking the screenshot. General comments:
|
I've just created a sample in-memory app to show the snapshot but going forward we need a more sophisticated app (like the suggested stream-stream join) with longer execution time.
Will create such an app instead of the actual in-memory one and make tests w/ it. How do you exactly mean separate? |
I meant having "accumulated" graphs across multiple state stores vs having graphs per state store. If you use stream-stream join, 12 (6 * 2) graphs will come up, and to see the overall memory usage end users have to accumulate these values by theirselves. Having graphs per state store may be helpful on stream-stream join when there's a skew between left side and right side (either volume of the inputs or difference on evict condition), but probably can be hidden by default and shown on demand of "details". (separate page?) Btw I guess loadedMapCacheHitCount graph can be dropped unless on demand, as if things are working without crash or Spark's bug it will always increment properly. |
Let me create a stream-stream join app to test and we can discuss the details what/how/where to aggregate.
I agree, it would be good to show a summary but independent graph also needed to see which one is problematic
Yeah, having 3-4 operator would make the UI horror. I'll start to experiment w/ separate page per operator approach.
|
If we would like to enable/disable graphs, probably checkbox would be better option. Some of Spark UI pages already work like the way. |
Are these checkboxes dynamically generated? Custom metrics are custom so anything can be added there. So unless one know the potential values in advance it's hard to put a checkbox. |
Ah you're right. That said, we may need to drop graphs for custom metrics as other state store providers wouldn't have them. |
Based on our discussion I've created SPARK-33287 since we need some kind of exclude mechanism to implement it. |
I've just created an example stream-stream join test app to ease the testing: https://github.com/gaborgsomogyi/spark-stream-stream-join-test |
Test build #130409 has finished for PR 30151 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Switched to aggregated values. I think the approach is more useful from user perspective so we can go on w/ this. WDYT? |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #130412 has finished for PR 30151 at commit
|
The UI change looks OK to me. I'll probably apply the PR to my end and play with. Btw, do you have some TODOs for getting rid of WIP? Just to determine the time to start review in codebase. |
Would like to do a review on my code and execute some extra tests (1-2 hours endurance, etc). Hope I can remove the WIP today. |
I've executed
applications for an hour. The response time of the UI is stable. |
Kubernetes integration test starting |
Test build #130455 has finished for PR 30151 at commit
|
Kubernetes integration test status success |
Test build #130459 has finished for PR 30151 at commit
|
cc. @sarutak as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me, only minors. I manually verified with queries:
- no stateful operator
- time window query
- query in SPARK-33259, which demonstrate the correctness issue on multiple stream-stream joins - dropped rows are presented in the graph.
For the case of 3), probably better to have details page which show these graphs per stateful operator so that which operator has dropped rows, but that can be done as separate JIRA issue (and separate PR).
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
Btw, thinking out loud, would it be helpful if we have a table showing General monitoring would be done with SS UI page, and in this page we probably figure out the problematic batches which we would like to look into details. The connection is lost here and as of now we need to find relevant information manually. |
Without super deep consideration it makes sense. This page adds only high level information about the states. When it looks bad from high level then users are interested in more granular information to find out the root cause. One step on this road could be to show state information per state basis (not yet created a jira for this). When we added custom metrics + watermark info we can do experiments what is the most valuable information to users in finding out unhealthy query root causes. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Looks OK to me. If you could come up with some sort of testing even better, but I'm not sure that's something we could require on the UI change, as the comment said we don't have effective UI tests for SS - #30151 (comment) |
Test build #130563 has finished for PR 30151 at commit
|
Thanks for adding this. This is pretty useful. Could you add some simple tests in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/UISeleniumSuite.scala#L125 to make sure we at least touch these codes in our tests? |
Sure, started to have a look what tests are possible to add. Doing some other experiments so will take some time. |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #130789 has finished for PR 30151 at commit
|
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #130795 has finished for PR 30151 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
As I see simple test is added, merging to master. If someone wants to have detailed tests please volunteer and submit a follow-up PR.
Thanks all for reviewing and thanks @gaborgsomogyi for the contribution. Merged to master. |
Thanks all for taking care! |
<tr> | ||
<td style="vertical-align: middle;"> | ||
<div style="width: 160px;"> | ||
<div><strong>Aggregated Number Of State Rows Dropped By Watermark {SparkUIUtils.tooltip("Aggregated number of state rows dropped by watermark.", "right")}</strong></div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured out during work on mine - the rows dropped by watermark are not from state. It's a bit confusing, but they're input rows for "stateful operators". I'll make a follow-up PR to correct this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HeartSaVioR @gaborgsomogyi can you please clarify why Structured Streaming UI is rendered as a static page instead of the data being exposed as an API like we do for DStream monitoring? |
What changes were proposed in this pull request?
Structured Streaming UI is not containing state information. In this PR I've added it.
Why are the changes needed?
Missing state information.
Does this PR introduce any user-facing change?
Additional UI elements appear.
How was this patch tested?
Existing unit tests + manual test.