Skip to content

[SPARK-30765][SQL] Refine base operator abstraction code style #27511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

Eric5553
Copy link
Contributor

@Eric5553 Eric5553 commented Feb 9, 2020

What changes were proposed in this pull request?

When doing base operator abstraction work, we found there are still some code snippet is inconsistent with other abstraction code style. This PR addressed following two code refactor cases.

Case 1 Override keyword missed for some fields in derived classes. The compiler will not capture it if we rename some fields in the future.

Case 2 Inconsistent abstract class field definition. The updated style will simplify derived class definition, e.g. EvalPythonExec WindowExecBase

Why are the changes needed?

Improve the code style consistency and code quality

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

@Eric5553
Copy link
Contributor Author

Eric5553 commented Feb 9, 2020

cc @gatorsmile @maropu

@Eric5553 Eric5553 changed the title Refine base operator abstraction code style [SPARK-30765][SQL] Refine base operator abstraction code style Feb 9, 2020
@maropu
Copy link
Member

maropu commented Feb 9, 2020

ok to test

@maropu
Copy link
Member

maropu commented Feb 9, 2020

cc: @cloud-fan @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented Feb 10, 2020

Test build #118101 has finished for PR 27511 at commit 87c2a32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SubqueryExec(override val name: String, override val child: SparkPlan)
  • case class ReusedSubqueryExec(override val child: BaseSubqueryExec)
  • case class CollectLimitExec(override val limit: Int, child: SparkPlan) extends LimitExec
  • case class CollectTailExec(override val limit: Int, child: SparkPlan) extends LimitExec
  • case class LocalLimitExec(override val limit: Int, child: SparkPlan) extends BaseLimitExec
  • case class GlobalLimitExec(override val limit: Int, child: SparkPlan) extends BaseLimitExec
  • case class ArrowEvalPythonExec(
  • case class BatchEvalPythonExec(
  • abstract class EvalPythonExec extends UnaryExecNode
  • case class StreamingLocalLimitExec(override val limit: Int, child: SparkPlan)
  • abstract class WindowExecBase extends UnaryExecNode

partitionSpec: Seq[Expression],
orderSpec: Seq[SortOrder],
override val windowExpression: Seq[NamedExpression],
override val partitionSpec: Seq[Expression],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is override val required here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question. case class means everything is val so this is just adding override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon @cloud-fan Thanks! I dropped most changes based on this comment #27368 (comment). So current refactor rules are:

  1. If abstract class or trait defined with def, then no override needed in derived class.
  2. Change HashJoin definition to def to avoid override in derived classes.
  3. Remove abstract class constructer part of EvalPythonExec WindowExecBase to sync with other operator code style.

@SparkQA
Copy link

SparkQA commented Feb 11, 2020

Test build #118240 has finished for PR 27511 at commit cc709a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ReusedSubqueryExec(child: BaseSubqueryExec)
  • case class CollectTailExec(limit: Int, child: SparkPlan) extends LimitExec
  • case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec
  • case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec

@SparkQA
Copy link

SparkQA commented Feb 11, 2020

Test build #118245 has finished for PR 27511 at commit e8a6a4e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 11, 2020

Test build #118239 has finished for PR 27511 at commit 7ad8d83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SubqueryExec(name: String, child: SparkPlan)
  • case class CollectLimitExec(limit: Int, child: SparkPlan) extends LimitExec
  • case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], resultAttrs: Seq[Attribute], child: SparkPlan,
  • case class BatchEvalPythonExec(udfs: Seq[PythonUDF], resultAttrs: Seq[Attribute], child: SparkPlan)
  • case class StreamingLocalLimitExec(limit: Int, child: SparkPlan)

@SparkQA
Copy link

SparkQA commented Feb 11, 2020

Test build #118242 has finished for PR 27511 at commit 9511b34.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Feb 12, 2020

Looks fine now and I'll leave this to the others. @gatorsmile @cloud-fan @HyukjinKwon

@Eric5553 Eric5553 force-pushed the BaseClassAbstraction branch from e8a6a4e to d07cc1b Compare February 12, 2020 02:43
@SparkQA
Copy link

SparkQA commented Feb 12, 2020

Test build #118272 has finished for PR 27511 at commit d07cc1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

partitionSpec: Seq[Expression],
orderSpec: Seq[SortOrder],
child: SparkPlan) extends UnaryExecNode {
abstract class WindowExecBase extends UnaryExecNode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can switch it to trait now.

Copy link
Contributor Author

@Eric5553 Eric5553 Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, updated in 364f5a7. Thanks!

@@ -57,8 +57,9 @@ import org.apache.spark.util.Utils
* there should be always some rows buffered in the socket or Python process, so the pulling from
* RowQueue ALWAYS happened after pushing into it.
*/
abstract class EvalPythonExec(udfs: Seq[PythonUDF], resultAttrs: Seq[Attribute], child: SparkPlan)
extends UnaryExecNode {
abstract class EvalPythonExec extends UnaryExecNode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one too. seems we can switch to trait.

Copy link
Contributor Author

@Eric5553 Eric5553 Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 364f5a7, thanks!

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too otherwise.

@@ -102,7 +102,7 @@ case class RowDataSourceScanExec(
filters: Set[Filter],
handledFilters: Set[Filter],
rdd: RDD[InternalRow],
@transient relation: BaseRelation,
@transient override val relation: BaseRelation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need override?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I forgot to update here. Per previous discussion, we should replace base definition to 'def' instead of adding these overrides.

@@ -158,7 +158,7 @@ case class RowDataSourceScanExec(
* @param tableIdentifier identifier for the table in the metastore.
*/
case class FileSourceScanExec(
@transient relation: HadoopFsRelation,
@transient override val relation: HadoopFsRelation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@SparkQA
Copy link

SparkQA commented Feb 25, 2020

Test build #118919 has finished for PR 27511 at commit 364f5a7.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait EvalPythonExec extends UnaryExecNode
  • trait WindowExecBase extends UnaryExecNode

@SparkQA
Copy link

SparkQA commented Feb 25, 2020

Test build #118925 has finished for PR 27511 at commit 839f7b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@Eric5553
Copy link
Contributor Author

Thanks a lot! @HyukjinKwon @cloud-fan @maropu

@Eric5553 Eric5553 deleted the BaseClassAbstraction branch March 13, 2020 06:51
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?
When doing base operator abstraction work, we found there are still some code snippet is  inconsistent with other abstraction code style. This PR addressed following two code refactor cases.

**Case 1** Override keyword missed for some fields in derived classes. The compiler will not capture it if we rename some fields in the future.

**Case 2** Inconsistent abstract class field definition. The updated style will simplify derived class definition, e.g. `EvalPythonExec` `WindowExecBase`

### Why are the changes needed?
Improve the code style consistency and code quality

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests

Closes apache#27511 from Eric5553/BaseClassAbstraction.

Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants