-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KYLIN-4925 Use Spark3 for Kylin 4.0 #1601
Conversation
1908be9
to
79766d1
Compare
@xiacongling Thank you for your contribution, I will review this PR ASAP. |
...park-common/src/main/java/org/apache/kylin/engine/spark/common/util/KylinDateTimeUtils.scala
Outdated
Show resolved
Hide resolved
...park-common/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimestampDiffImpl.scala
Outdated
Show resolved
Hide resolved
...ark-project/kylin-spark-common/src/main/scala/org/apache/spark/utils/KylinReflectUtils.scala
Outdated
Show resolved
Hide resolved
...ark-project/kylin-spark-common/src/main/scala/org/apache/spark/utils/KylinReflectUtils.scala
Outdated
Show resolved
Hide resolved
.../kylin-spark-common/src/main/spark24/org/apache/spark/monitor/MonitorExecutorExtension.scala
Outdated
Show resolved
Hide resolved
.../kylin-spark-common/src/main/spark24/org/apache/spark/monitor/MonitorExecutorExtension.scala
Outdated
Show resolved
Hide resolved
...n-spark-common/src/main/spark31/org/apache/spark/sql/execution/KylinFileSourceScanExec.scala
Show resolved
Hide resolved
...n-spark-common/src/main/spark31/org/apache/spark/sql/execution/KylinFileSourceScanExec.scala
Show resolved
Hide resolved
.../kylin-spark-common/src/main/spark31/org/apache/spark/sql/execution/KylinJoinSelection.scala
Outdated
Show resolved
Hide resolved
.../kylin-spark-common/src/main/spark31/org/apache/spark/sql/execution/KylinJoinSelection.scala
Outdated
Show resolved
Hide resolved
...t/kylin-spark-common/src/main/spark31/org/apache/spark/sql/hive/utils/QueryMetricUtils.scala
Outdated
Show resolved
Hide resolved
...t/kylin-spark-common/src/main/spark31/org/apache/spark/sql/hive/utils/QueryMetricUtils.scala
Outdated
Show resolved
Hide resolved
...t/kylin-spark-engine/src/main/scala/org/apache/kylin/engine/spark/job/CuboidAggregator.scala
Outdated
Show resolved
Hide resolved
...t/kylin-spark-engine/src/main/scala/org/apache/kylin/query/runtime/ExpressionConverter.scala
Outdated
Show resolved
Hide resolved
...t/kylin-spark-engine/src/main/scala/org/apache/kylin/query/runtime/ExpressionConverter.scala
Outdated
Show resolved
Hide resolved
...spark-metadata/src/main/spark24/org/apache/kylin/engine/spark/cross/CrossDateTimeUtils.scala
Show resolved
Hide resolved
...spark-metadata/src/main/spark31/org/apache/kylin/engine/spark/cross/CrossDateTimeUtils.scala
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## kylin-on-parquet-v2 #1601 +/- ##
======================================================
Coverage ? 24.29%
Complexity ? 4649
======================================================
Files ? 1152
Lines ? 65208
Branches ? 9363
======================================================
Hits ? 15842
Misses ? 47717
Partials ? 1649 Continue to review full report at Codecov.
|
...ylin-spark-common/src/main/spark24/org/apache/spark/sql/execution/datasource/FilterExt.scala
Show resolved
Hide resolved
...-spark-engine/src/test/scala/org/apache/kylin/engine/spark/builder/TestCreateFlatTable.scala
Outdated
Show resolved
Hide resolved
broadcastSide(buildLeft, buildRight, left, right) | ||
} | ||
|
||
def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to implementation this strategy according to the JoinSelection
in Spark 3.1, the implementation in Spark 3.1 is very different from the one in Spark 2.4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzcclp Hi, I've done the rewrite, but a little different from the 2.4 implementation. I've found in 2.4 implementation that it acquires memory twice for both left and right side. For example:
case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right)
if canBroadcastBySizes(joinType, left, right) =>
val buildSide = broadcastSideBySizes(joinType, left, right)
Seq(joins.BroadcastHashJoinExec(
leftKeys, rightKeys, joinType, buildSide, condition, planLater(left), planLater(right)))
canBroadcastBySizes
and broadcastSideBySizes
both tries to acquire memory for two sides. Is that a mistake? Should we add a method like tryAcquireMemory
in JoinMemoryManager
for checks and acquire memory only if we decide to create BHJ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a mistake. Thanks for your reporting about this issue, I will check this later.
9850eb1
to
6a4b2d8
Compare
...-spark-engine/src/test/scala/org/apache/kylin/engine/spark/builder/TestCreateFlatTable.scala
Show resolved
Hide resolved
5e655c8
to
e0a128a
Compare
e0a128a
to
624b100
Compare
LGTM, thank you for your contribution, this feature is great. |
Why put classes of |
@pan3793 Thanks for your mentioning, I will check later. |
Proposed changes
Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue.
Types of changes
What types of changes does your code introduce to Kylin?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.document
branchFurther comments
Tested for Spark 2.4.7 and Spark 3.1.1. We make it compatible to use either Spark 2.4 (default) or Spark 3.1. When build with spark 3.1, active profile with
mvn -Pspark3
.