[SPARK-36183][SQL][FOLLOWUP] Fix push down limit 1 through Aggregate #35286

Closed · wants to merge 1 commit into master from wangyum:SPARK-36183-2

Conversation

@wangyum (Member) commented Jan 23, 2022

### What changes were proposed in this pull request?

Use `Aggregate.aggregateExpressions` instead of `Aggregate.output` when pushing down limit 1 through Aggregate; a toy sketch of the difference follows the example plans below.

For example:

```scala
spark.range(10).selectExpr("id % 5 AS a", "id % 5 AS b").write.saveAsTable("t1")
spark.sql("SELECT a, b, a AS alias FROM t1 GROUP BY a, b LIMIT 1").explain(true)
```

Before this PR:

```
== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- !Project [a#227L, b#228L, alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet
```

After this PR:

```
== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- Project [a#227L, b#228L, a#227L AS alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet
```
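
To see why the two attribute lists differ, here is a minimal toy model (plain Scala case classes, not Spark's Catalyst API; names mirror the real ones but the structures are simplified for illustration): `output` exposes only the attributes the aggregate produces, so `a AS alias` degenerates into a bare `alias` column, while `aggregateExpressions` keeps the full alias expression that a pushed-down `Project` can still evaluate against the relation's columns.

```scala
// Toy model, NOT Spark's Catalyst API: simplified for illustration only.

sealed trait NamedExpr { def name: String }
case class Attr(name: String) extends NamedExpr                // a bare column
case class Alias(child: Attr, name: String) extends NamedExpr  // child AS name

case class Aggregate(groupBy: Seq[Attr], aggregateExpressions: Seq[NamedExpr]) {
  // `output` keeps only the produced attribute names, so `a AS alias`
  // degenerates into a bare `alias` column that no child can supply.
  def output: Seq[Attr] = aggregateExpressions.map(e => Attr(e.name))
}

object LimitOnePushDownSketch extends App {
  val (a, b) = (Attr("a"), Attr("b"))
  val agg = Aggregate(Seq(a, b), Seq(a, b, Alias(a, "alias")))

  // Before the fix: building Project(agg.output, LocalLimit(1, child)) makes the
  // Project reference `alias`, which the child relation (columns a, b) cannot supply.
  println(agg.output)               // List(Attr(a), Attr(b), Attr(alias))

  // After the fix: Project(agg.aggregateExpressions, LocalLimit(1, child)) keeps
  // `a AS alias`, which is computable from columns a and b.
  println(agg.aggregateExpressions) // List(Attr(a), Attr(b), Alias(Attr(a),alias))
}
```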

### Why are the changes needed?

Fix a bug: before this change, the pushed-down `Project` was built from `Aggregate.output`, so it references the `alias#226L` attribute, which the child relation (columns `a`, `b`) does not produce; Spark marks such an invalid node with a `!` prefix, visible as `!Project` in the before plan.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.
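
For an interactive sanity check (a hypothetical spark-shell snippet, not the unit test this PR actually adds), one can assert that the optimized plan no longer contains the invalid `!Project` node shown in the before plan:

```scala
// Hypothetical spark-shell check, not the PR's actual unit test.
spark.range(10).selectExpr("id % 5 AS a", "id % 5 AS b").write.saveAsTable("t1")

val plan = spark.sql("SELECT a, b, a AS alias FROM t1 GROUP BY a, b LIMIT 1")
  .queryExecution.optimizedPlan

// Spark prefixes invalid plan nodes with "!", as in the before plan above.
assert(!plan.treeString.contains("!Project"))
```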

@github-actions bot added the SQL label Jan 23, 2022
@wangyum (Member, Author) commented Jan 24, 2022

cc @cloud-fan

@cloud-fan (Contributor) commented:

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9b12571 Jan 25, 2022
@wangyum wangyum deleted the SPARK-36183-2 branch January 25, 2022 02:04
wangyum added a commit that referenced this pull request May 26, 2023
…861)

* [SPARK-36183][SQL][FOLLOWUP] Fix push down limit 1 through Aggregate


Closes #35286 from wangyum/SPARK-36183-2.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

(cherry picked from commit 9b12571)