Commit dce8908
[SPARK-26549][PYSPARK] Fix for python worker reuse take no effect for parallelize lazy iterable range
## What changes were proposed in this pull request?
During the follow-up work(apache#23435) for PySpark worker reuse scenario, we found that the worker reuse takes no effect for `sc.parallelize(xrange(...))`. It happened because of the specialize rdd.parallelize logic for xrange(introduced in apache#3264) generated data by lazy iterable range, which don't need to use the passed-in iterator. But this will break the end of stream checking in python worker and finally cause worker reuse takes no effect. See more details in [SPARK-26549](https://issues.apache.org/jira/browse/SPARK-26549) description.
We fix this by force using the passed-in iterator.
## How was this patch tested?
New UT in test_worker.py.
Closes apache#23470 from xuanyuanking/SPARK-26549.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>1 parent 1a794bf commit dce8908
2 files changed
+19
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
498 | 498 | | |
499 | 499 | | |
500 | 500 | | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
501 | 509 | | |
502 | 510 | | |
503 | 511 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| |||
145 | 145 | | |
146 | 146 | | |
147 | 147 | | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
148 | 158 | | |
149 | 159 | | |
150 | 160 | | |
| |||
0 commit comments