Set spark.sql.execution.topKSortFallbackThreshold to a reasonable value #1018

Closed
Tracked by #832
yaooqinn opened this issue Sep 3, 2021 · 3 comments
Labels
good first issue beginner skills required help wanted kind:feature Feature request
@yaooqinn
Member

yaooqinn commented Sep 3, 2021

1. Describe the feature

In apache/spark#33904, I ran into a performance issue in the top-K scenario when K is very large. It can be avoided by setting spark.sql.execution.topKSortFallbackThreshold to a value less than K.

Kyuubi users who only run SQL may submit improper top-K queries that hold engines for quite a long time, and they won't know why.

A proper value for spark.sql.execution.topKSortFallbackThreshold would suit Kyuubi's use cases better.

I'd suggest this value be set to 10000
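
To make the effect of the threshold concrete, here is a simplified model of the choice it controls. It is a sketch of my understanding of the setting, not Spark's actual planner code: the names `TopKPlanChoice`, `choose`, `InMemoryTopK` and `GlobalSortThenLimit` are hypothetical, and it assumes that an `ORDER BY ... LIMIT k` query uses an in-memory top-K only when `k` is strictly below the threshold, falling back to a global sort otherwise.

```scala
// Simplified, hypothetical model of the top-K fallback decision.
// This is NOT Spark source code; names and structure are illustrative only.
object TopKPlanChoice {
  sealed trait Plan
  case object InMemoryTopK extends Plan        // keep the top k rows in memory
  case object GlobalSortThenLimit extends Plan // full sort (can spill), then limit

  // Assumption: top-K is used only when the limit is strictly below the threshold.
  def choose(limit: Int, threshold: Int): Plan =
    if (limit < threshold) InMemoryTopK else GlobalSortThenLimit

  def main(args: Array[String]): Unit = {
    val suggestedThreshold = 10000 // the value proposed in this issue
    println(choose(100, suggestedThreshold))     // small LIMIT stays on the top-K path
    println(choose(5000000, suggestedThreshold)) // very large LIMIT falls back to a sort
  }
}
```

Under this model, a threshold of 10000 keeps small LIMIT queries on the fast path and moves only very large ones to the sort-based plan.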

2. Motivation

3. Describe the solution

4. Additional context

@yaooqinn yaooqinn added help wanted kind:feature Feature request good first issue beginner skills required labels Sep 3, 2021
@yaooqinn yaooqinn added this to the v1.4.0 milestone Sep 3, 2021
@byyue
Contributor

byyue commented Sep 7, 2021

Hi, could you please elaborate a little bit more on how to implement this?

@yaooqinn
Member Author

yaooqinn commented Sep 8, 2021

diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala
index 6968e07c..9c2832c5 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala
@@ -96,6 +96,7 @@ object SparkSQLEngine extends Logging {

   def createSpark(): SparkSession = {
     val sparkConf = new SparkConf()
+    sparkConf.setIfMissing("spark.sql.execution.topKSortFallbackThreshold", "10000")
     sparkConf.setIfMissing("spark.sql.legacy.castComplexTypesToString.enabled", "true")
     sparkConf.setIfMissing("spark.master", "local")
     sparkConf.setIfMissing("spark.ui.port", "0")
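
The reason for `setIfMissing` rather than `set` is that the engine only supplies a default, so a value the user has already configured still wins. A minimal sketch of that behavior, assuming Spark is on the classpath (the object name `TopKThresholdDefault` and the helper `withDefault` are hypothetical and not part of Kyuubi):

```scala
import org.apache.spark.SparkConf

// Hypothetical helper illustrating the setIfMissing behavior used in the diff.
object TopKThresholdDefault {
  val Key = "spark.sql.execution.topKSortFallbackThreshold"

  // Applies the engine-side default only when the key is not already set.
  def withDefault(conf: SparkConf): SparkConf = conf.setIfMissing(Key, "10000")

  def main(args: Array[String]): Unit = {
    // loadDefaults = false keeps system properties out of this example.
    val untouched = withDefault(new SparkConf(false))
    val userSet = withDefault(new SparkConf(false).set(Key, "500"))

    println(untouched.get(Key)) // 10000: the default was applied
    println(userSet.get(Key))   // 500: the user's value is preserved
  }
}
```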

@byyue
Contributor

byyue commented Sep 8, 2021

Thanks! Is spark.sql.execution.topKSortFallbackThreshold now a property under Spark's Runtime SQL Configuration?
