Commit 26ae9e9
committed
[SPARK-36559][SQL][PYTHON] Create plans dedicated to distributed-sequence index for optimization
### What changes were proposed in this pull request?
This PR proposes to move distributed-sequence index implementation to SQL plan to leverage optimizations such as column pruning.
```python
import pyspark.pandas as ps
ps.set_option('compute.default_index_type', 'distributed-sequence')
ps.range(10).id.value_counts().to_frame().spark.explain()
```
**Before:**
```bash
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Sort [count#51L DESC NULLS LAST], true, 0
+- Exchange rangepartitioning(count#51L DESC NULLS LAST, 200), ENSURE_REQUIREMENTS, [id=#70]
+- HashAggregate(keys=[id#37L], functions=[count(1)], output=[__index_level_0__#48L, count#51L])
+- Exchange hashpartitioning(id#37L, 200), ENSURE_REQUIREMENTS, [id=#67]
+- HashAggregate(keys=[id#37L], functions=[partial_count(1)], output=[id#37L, count#63L])
+- Project [id#37L]
+- Filter atleastnnonnulls(1, id#37L)
+- Scan ExistingRDD[__index_level_0__#36L,id#37L]
# ^^^ Base DataFrame created by the output RDD from zipWithIndex (and checkpointed)
```
**After:**
```bash
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Sort [count#275L DESC NULLS LAST], true, 0
+- Exchange rangepartitioning(count#275L DESC NULLS LAST, 200), ENSURE_REQUIREMENTS, [id=#174]
+- HashAggregate(keys=[id#258L], functions=[count(1)])
+- HashAggregate(keys=[id#258L], functions=[partial_count(1)])
+- Filter atleastnnonnulls(1, id#258L)
+- Range (0, 10, step=1, splits=16)
# ^^^ Removed the Spark job execution for `zipWithIndex`
```
### Why are the changes needed?
To leverage optimization of SQL engine and avoid unnecessary shuffle to create default index.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unittests were added. Also, this PR will test all unittests in pandas API on Spark after switching the default index implementation to `distributed-sequence`.
Closes #33807 from HyukjinKwon/SPARK-36559.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 93cec49)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>1 parent 5463caa commit 26ae9e9
File tree
8 files changed
+121
-39
lines changed- python/pyspark/pandas/tests
- sql
- catalyst/src
- main/scala/org/apache/spark/sql/catalyst
- analysis
- optimizer
- plans/logical
- test/scala/org/apache/spark/sql/catalyst/optimizer
- core/src/main/scala/org/apache/spark/sql
- execution
- python
8 files changed
+121
-39
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5160 | 5160 | | |
5161 | 5161 | | |
5162 | 5162 | | |
5163 | | - | |
5164 | | - | |
5165 | | - | |
5166 | | - | |
5167 | | - | |
5168 | | - | |
5169 | | - | |
5170 | | - | |
5171 | | - | |
5172 | | - | |
5173 | | - | |
5174 | | - | |
5175 | | - | |
5176 | | - | |
5177 | | - | |
5178 | | - | |
5179 | | - | |
5180 | | - | |
5181 | | - | |
5182 | | - | |
| 5163 | + | |
| 5164 | + | |
| 5165 | + | |
| 5166 | + | |
| 5167 | + | |
| 5168 | + | |
| 5169 | + | |
| 5170 | + | |
| 5171 | + | |
| 5172 | + | |
| 5173 | + | |
| 5174 | + | |
| 5175 | + | |
| 5176 | + | |
| 5177 | + | |
| 5178 | + | |
| 5179 | + | |
| 5180 | + | |
| 5181 | + | |
5183 | 5182 | | |
5184 | 5183 | | |
5185 | 5184 | | |
| |||
Lines changed: 4 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
225 | 225 | | |
226 | 226 | | |
227 | 227 | | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
228 | 232 | | |
229 | 233 | | |
230 | 234 | | |
| |||
Lines changed: 5 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
794 | 794 | | |
795 | 795 | | |
796 | 796 | | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
797 | 802 | | |
798 | 803 | | |
799 | 804 | | |
| |||
Lines changed: 17 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
115 | 115 | | |
116 | 116 | | |
117 | 117 | | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
Lines changed: 7 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
452 | 452 | | |
453 | 453 | | |
454 | 454 | | |
455 | | - | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
456 | 462 | | |
Lines changed: 5 additions & 18 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3514 | 3514 | | |
3515 | 3515 | | |
3516 | 3516 | | |
3517 | | - | |
3518 | | - | |
3519 | | - | |
3520 | | - | |
3521 | | - | |
3522 | | - | |
3523 | | - | |
3524 | | - | |
3525 | | - | |
3526 | | - | |
3527 | | - | |
3528 | | - | |
3529 | | - | |
3530 | | - | |
3531 | | - | |
3532 | | - | |
3533 | | - | |
3534 | | - | |
| 3517 | + | |
| 3518 | + | |
| 3519 | + | |
| 3520 | + | |
| 3521 | + | |
3535 | 3522 | | |
3536 | 3523 | | |
3537 | 3524 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
712 | 712 | | |
713 | 713 | | |
714 | 714 | | |
| 715 | + | |
| 716 | + | |
715 | 717 | | |
716 | 718 | | |
717 | 719 | | |
| |||
Lines changed: 62 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
0 commit comments