forked from apache/spark
Commit
[SPARK-44839][SS][CONNECT] Better Error Logging when user tries to serialize spark session

### What changes were proposed in this pull request?

Add a new error with detailed message when a user tries to access spark session and dataframe created using local spark session, in streaming spark connect `foreach`, `foreachBatch` and `StreamingQueryListener`.

Update: per reviewer's request, added a new error class `PySparkPicklingError`. Also move `UDTF_SERIALIZATION_ERROR` to the new class

### Why are the changes needed?

Better error logging for the breaking change introduced in streaming spark connect.

### Does this PR introduce _any_ user-facing change?

Yes, before users can only see this non-informative error when they access a local spark session in their streaming connect related functions:

```
Traceback (most recent call last):
  File "/home/wei.liu/oss-spark/python/pyspark/serializers.py", line 459, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/home/wei.liu/oss-spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/wei.liu/oss-spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread._local' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wei.liu/oss-spark/python/pyspark/sql/connect/streaming/readwriter.py", line 508, in foreachBatch
    self._write_proto.foreach_batch.python_function.command = CloudPickleSerializer().dumps(
  File "/home/wei.liu/oss-spark/python/pyspark/serializers.py", line 469, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: cannot pickle '_thread._local' object
```

Now it is replaced with:

```
pyspark.errors.exceptions.base.PySparkPicklingError: [STREAMING_CONNECT_SERIALIZATION_ERROR] Cannot serialize the function `foreachBatch`.
If you accessed the spark session, or a dataframe defined outside of the function, please be aware that they are not allowed in Spark Connect. For foreachBatch, please access the spark session using `df.sparkSession`, where `df` is the first parameter in your foreachBatch function. For StreamingQueryListener, please access the spark session using `self.spark`
```

### How was this patch tested?

Add unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#42594 from WweiL/SPARK-44839-spark-session-error.

Lead-authored-by: Wei Liu <wei.liu@databricks.com>
Co-authored-by: Wei Liu <z920631580@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
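The error-wrapping behavior described in the commit message can be illustrated with a minimal standard-library sketch. It is not the PySpark implementation: `CallbackSerializationError` and `serialize_captured_state` are hypothetical names, plain `pickle` stands in for `CloudPickleSerializer`, and a dict holding a `threading.local()` stands in for a captured local spark session. It only shows the general pattern of catching a low-level pickling failure and re-raising it with an actionable message.

```python
import pickle
import threading


class CallbackSerializationError(Exception):
    """Hypothetical stand-in for the new PySparkPicklingError class."""


def serialize_captured_state(func_name, captured):
    """Pickle `captured`; on failure, re-raise with a more helpful message.

    Assumption: plain `pickle` is used here in place of cloudpickle.
    """
    try:
        return pickle.dumps(captured)
    except (TypeError, pickle.PicklingError) as exc:
        raise CallbackSerializationError(
            "[STREAMING_CONNECT_SERIALIZATION_ERROR] "
            f"Cannot serialize the function `{func_name}`. "
            "If you accessed the spark session, or a dataframe defined "
            "outside of the function, please be aware that they are not "
            "allowed in Spark Connect."
        ) from exc


# A thread-local object cannot be pickled, which mirrors the
# "cannot pickle '_thread._local' object" failure in the traceback above.
unpicklable_state = {"thread_local": threading.local()}

try:
    serialize_captured_state("foreachBatch", unpicklable_state)
except CallbackSerializationError as err:
    message = str(err)
    cause = err.__cause__  # the original TypeError is kept for debugging
```

Chaining with `raise ... from exc` keeps the original `TypeError` attached as `__cause__`, so the low-level detail is still available while the user sees the actionable message first.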
1 parent bf3bef1, commit 0b3a582

Showing 10 changed files with 130 additions and 16 deletions.