Commit 22eb6c4
[SPARK-48567][SS][FOLLOWUP] StreamingQuery.lastProgress should return the actual StreamingQueryProgress
This reverts commit d067fc6, which reverted 042804a, essentially brings it back. 042804a failed the 3.5 client <> 4.0 server test, but the test was decided to turned off for cross-version test in #47468
### What changes were proposed in this pull request?
This PR is created after discussion in this closed one: #46886
I was trying to fix a bug (in connect, query.lastProgress doesn't have `numInputRows`, `inputRowsPerSecond`, and `processedRowsPerSecond`), and we reached the conclusion that what purposed in this PR should be the ultimate fix.
In python, for both classic spark and spark connect, the return type of `lastProgress` is `Dict` (and `recentProgress` is `List[Dict]`), but in scala it's the actual `StreamingQueryProgress` object:
https://github.com/apache/spark/blob/1a5d22aa2ffe769435be4aa6102ef961c55b9593/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala#L94-L101
This API discrepancy brings some confusion, like in Scala, users can do `query.lastProgress.batchId`, while in Python they have to do `query.lastProgress["batchId"]`.
This PR makes `StreamingQuery.lastProgress` to return the actual `StreamingQueryProgress` (and `StreamingQuery.recentProgress` to return `List[StreamingQueryProgress]`).
To prevent breaking change, we extend `StreamingQueryProgress` to be a subclass of `dict`, so existing code accessing using dictionary method (e.g. `query.lastProgress["id"]`) is still functional.
### Why are the changes needed?
API parity
### Does this PR introduce _any_ user-facing change?
Yes, now `StreamingQuery.lastProgress` returns the actual `StreamingQueryProgress` (and `StreamingQuery.recentProgress` returns `List[StreamingQueryProgress]`).
### How was this patch tested?
Added unit test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #47470 from WweiL/bring-back-lastProgress.
Authored-by: Wei Liu <wei.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>1 parent 239d77b commit 22eb6c4
File tree
5 files changed
+227
-99
lines changed- python/pyspark/sql
- connect/streaming
- streaming
- tests/streaming
5 files changed
+227
-99
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| 36 | + | |
36 | 37 | | |
37 | 38 | | |
38 | 39 | | |
| |||
110 | 111 | | |
111 | 112 | | |
112 | 113 | | |
113 | | - | |
| 114 | + | |
114 | 115 | | |
115 | 116 | | |
116 | 117 | | |
117 | | - | |
| 118 | + | |
118 | 119 | | |
119 | 120 | | |
120 | 121 | | |
121 | 122 | | |
122 | | - | |
| 123 | + | |
123 | 124 | | |
124 | 125 | | |
125 | 126 | | |
126 | 127 | | |
127 | | - | |
| 128 | + | |
128 | 129 | | |
129 | 130 | | |
130 | 131 | | |
| |||
0 commit comments