@yaooqinn yaooqinn commented Jan 12, 2022

What changes were proposed in this pull request?

This PR adds a config to control whether Thrift Server operations clear their shuffle dependencies eagerly after execution.

In long-running applications such as Thrift Server, with large driver JVMs and little memory pressure on the driver, driver GC may occur only occasionally, or not at all. Never cleaning up may lead to executors running out of disk space after a while.

Why are the changes needed?

The Thrift Server currently relies on a periodic System.gc() to release shuffle metadata, shuffle data, etc., which is not efficient for its usage scenario, since GC does not always occur promptly. Uncleaned data may cause memory issues on the Thrift Server side, such as OOM, or disk issues on the worker side, such as "no space left on device".
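
As a plain-JVM sketch (not Spark code; Spark's ContextCleaner tracks shuffles with weak references in a similar way), the snippet below shows why GC-driven cleanup is lazy: a weakly referenced object is only enqueued for cleanup after a GC cycle actually runs, so on a large, low-pressure driver heap cleanup can be delayed indefinitely.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}

object GcCleanupDemo {
  def main(args: Array[String]): Unit = {
    val queue = new ReferenceQueue[AnyRef]()
    // The byte array stands in for shuffle metadata held by the driver.
    var payload: AnyRef = new Array[Byte](1024)
    val ref = new WeakReference[AnyRef](payload, queue)

    payload = null // drop the last strong reference
    // Nothing is enqueued yet: cleanup has to wait for a GC cycle.
    var enqueued: AnyRef = null
    var attempts = 0
    while (enqueued == null && attempts < 50) {
      System.gc()                   // request a collection
      enqueued = queue.remove(100L) // wait up to 100 ms for the reference
      attempts += 1
    }
    println(enqueued eq ref)
  }
}
```

Until a collection actually runs, the reference stays out of the queue; an eager, explicit cleanup removes that dependence on GC timing.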

Does this PR introduce any user-facing change?

Yes, a new configuration is added.

How was this patch tested?

@github-actions github-actions bot added the SQL label Jan 12, 2022
@yaooqinn yaooqinn self-assigned this Jan 12, 2022
@yaooqinn yaooqinn changed the title [SPARK-37877][SQL] Support clear shuffle dependencies eagerly for thrift server [WIP][SPARK-37877][SQL] Support clear shuffle dependencies eagerly for thrift server Jan 13, 2022
@yaooqinn yaooqinn marked this pull request as draft January 13, 2022 03:35
@HyukjinKwon
Member

FWIW, there's a bug in updating the status in the check (#35179 (comment)). We should check the actual tests, e.g. at https://github.com/yaooqinn/spark/runs/4798556788

}
if (cleanShuffleDeps && rdd != null) {
rdd.cleanShuffleDependencies()
rdd = null
Contributor

@mridulm mridulm Jan 13, 2022

When THRIFTSERVER_INCREMENTAL_COLLECT is true, we will end up cleaning the dependencies before we have consumed all the data from df.
Wrap iter (an IterableFetchIterator) in a CompletionIterator and move this cleanup there?
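
A simplified, self-contained sketch of the pattern suggested here (Spark's real CompletionIterator lives in org.apache.spark.util and has a slightly different signature; the class name below is made up for illustration): run a cleanup callback only once the wrapped iterator is exhausted, so shuffle files are not removed while an incremental-collect client is still fetching rows.

```scala
// Runs onComplete exactly once, after the wrapped iterator is drained.
class CleanupOnCompletionIterator[A](sub: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {
  private var completed = false
  override def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) {
      completed = true
      onComplete() // e.g. rdd.cleanShuffleDependencies() in the PR's setting
    }
    more
  }
  override def next(): A = sub.next()
}

object CompletionDemo {
  def main(args: Array[String]): Unit = {
    var cleaned = false
    val it = new CleanupOnCompletionIterator(Iterator(1, 2, 3), () => cleaned = true)
    it.next()          // consume one element
    println(cleaned)   // cleanup has not run mid-iteration
    it.toList          // drain the rest
    println(cleaned)   // cleanup ran once the iterator was exhausted
  }
}
```

One caveat of this pattern: if the client abandons iteration early, the callback never fires, so a fallback cleanup path is still needed.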

Member Author


Thanks for the input.

HiveThriftServer2.eventManager.onStatementParsed(statementId,
result.queryExecution.toString())
df.queryExecution.toString())
rdd = df.rdd
Contributor

@mridulm mridulm Jan 13, 2022


df.rdd has tripped me up in the past - the fact that it actually ends up executing the prefix DAG was surprising (one team was relying on df.rdd.partitions.length to make some decisions).

Member Author


The changes here cannot trigger the eager cleanup in all cases. For example, for CTAS, where the SELECT part contains a shuffle, the dependencies will not be cleared. A possible and proper place may be SparkPlan.

@yaooqinn
Member Author

Still a POC; any input is welcome :)

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 25, 2022
@github-actions github-actions bot closed this Apr 26, 2022