Is it possible to support Spark Connect? #964

Enochack · 2024-01-31T03:27:34Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As Spark Connect was introduced since Spark 3.4, it could also be a chance for query engines like Ballista. If Ballista could support the Spark Connect protocol, it will be much more smooth for Spark users to migrate their codebase. I know it's a very chanlleging task to complete. Just brainstorm.

Describe the solution you'd like
The Spark Connect client passes an unresolved logical plan to the server. So we will need to transform the logical plan into an optimized Ballista/Datafusion plan.

Describe alternatives you've considered

Additional context

andygrove · 2024-02-08T15:01:16Z

Yes, it could be supported, at least to some degree. I looked at the docs, and it looks like Spark Connect is essentially a gRPC service that accepts protobuf describing Spark logical plans and then executes the plans and sends results back.

This seems achievable for anything that Spark can represent in a logical plan that Ballista also supports. I haven't read enough of the documentation to understand how UDFs would work.

milenkovicm · 2024-09-13T14:06:21Z

I have implemented part of the spark-connect protocol on top of datafusion, the gap is not too big, at least for batch part.

It is quite cool running pyspark scripts on top of datafusion without their knowledge :) One of the benefits, is that spark-connect supports serialising UDFs (I've tried with pyarrow udfs) which currently ballista protocol does not support.

from pyspark.sql import SparkSession
spark = SparkSession.builder.remote("sc://localhost:15005").appName("Connect Application Demo").getOrCreate()
print(spark.version)

prints

41.0.0 (Datafusion)

apart from fancy print statement backend process takes 30MB instead of 300MB when running single driver

Enochack added the enhancement New feature or request label Jan 31, 2024

Enochack changed the title ~~Is is possible to support Spark Connect?~~ Is it possible to support Spark Connect? Jan 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is it possible to support Spark Connect? #964

Is it possible to support Spark Connect? #964

Enochack commented Jan 31, 2024 •

edited

Loading

andygrove commented Feb 8, 2024

milenkovicm commented Sep 13, 2024

Is it possible to support Spark Connect? #964

Is it possible to support Spark Connect? #964

Comments

Enochack commented Jan 31, 2024 • edited Loading

andygrove commented Feb 8, 2024

milenkovicm commented Sep 13, 2024

Enochack commented Jan 31, 2024 •

edited

Loading