Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KYLIN-4925 Use Spark3 for Kylin 4.0 #1601

Merged
merged 1 commit into from
Jun 26, 2021

Conversation

xiacongling
Copy link

@xiacongling xiacongling commented Mar 4, 2021

Proposed changes

Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue.

Types of changes

What types of changes does your code introduce to Kylin?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have create an issue on Kylin's jira, and have described the bug/feature there in detail
  • Commit messages in my PR start with the related jira ID, like "KYLIN-0000 Make Kylin project open-source"
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If this change need a document change, I will prepare another pr against the document branch
  • Any dependent changes have been merged

Further comments

Tested for Spark 2.4.7 and Spark 3.1.1. We make it compatible to use either Spark 2.4 (default) or Spark 3.1. When build with spark 3.1, active profile with mvn -Pspark3.

@hit-lacus hit-lacus requested a review from zzcclp March 11, 2021 06:15
@hit-lacus hit-lacus added the Kylin 4.X Patch to Parquet Storgae(kylin-on-parquet-v2 branch). label Mar 11, 2021
@hit-lacus hit-lacus added the Not 3.X Not applicable for HBase Storage label Apr 21, 2021
@zzcclp
Copy link
Contributor

zzcclp commented May 12, 2021

@xiacongling Thank you for your contribution, I will review this PR ASAP.

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2021

Codecov Report

❗ No coverage uploaded for pull request base (kylin-on-parquet-v2@25080df). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@                  Coverage Diff                   @@
##             kylin-on-parquet-v2    #1601   +/-   ##
======================================================
  Coverage                       ?   24.29%           
  Complexity                     ?     4649           
======================================================
  Files                          ?     1152           
  Lines                          ?    65208           
  Branches                       ?     9363           
======================================================
  Hits                           ?    15842           
  Misses                         ?    47717           
  Partials                       ?     1649           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 25080df...624b100. Read the comment docs.

broadcastSide(buildLeft, buildRight, left, right)
}

def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to implementation this strategy according to the JoinSelection in Spark 3.1, the implementation in Spark 3.1 is very different from the one in Spark 2.4.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzcclp Hi, I've done the rewrite, but a little different from the 2.4 implementation. I've found in 2.4 implementation that it acquires memory twice for both left and right side. For example:

case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right)
  if canBroadcastBySizes(joinType, left, right) =>
  val buildSide = broadcastSideBySizes(joinType, left, right)
  Seq(joins.BroadcastHashJoinExec(
    leftKeys, rightKeys, joinType, buildSide, condition, planLater(left), planLater(right)))

canBroadcastBySizes and broadcastSideBySizes both tries to acquire memory for two sides. Is that a mistake? Should we add a method like tryAcquireMemory in JoinMemoryManager for checks and acquire memory only if we decide to create BHJ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a mistake. Thanks for your reporting about this issue, I will check this later.

@zzcclp
Copy link
Contributor

zzcclp commented Jun 26, 2021

LGTM, thank you for your contribution, this feature is great.

@zzcclp zzcclp merged commit ed241cd into apache:kylin-on-parquet-v2 Jun 26, 2021
@pan3793
Copy link
Member

pan3793 commented Jun 27, 2021

Why put classes of package org.apache.spark.memory into folder org/apache/spark/monitor?

@zzcclp
Copy link
Contributor

zzcclp commented Jun 27, 2021

@pan3793 Thanks for your mentioning, I will check later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Kylin 4.X Patch to Parquet Storgae(kylin-on-parquet-v2 branch). Not 3.X Not applicable for HBase Storage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants