[SPARK-48763][CONNECT][BUILD] Move connect server and common to builtin module #47157
Conversation
Force-pushed from a61206e to fb4e543.
spark/dev/sparktestsupport/modules.py, lines 323 to 334 (at 0487d78):
This place may need to be corrected.
Force-pushed from 326739a to 366d76a.
qq: what about the
Force-pushed from a4c8b27 to 44f87df.
That stays as a non-builtin library for now because they cannot coexist with the Spark Classic jars.
Force-pushed from 8818ee4 to 9677b9b.
@@ -62,7 +62,6 @@ object CheckConnectJvmClientCompatibility {
   private val clientJar = {
     val path = Paths.get(
       sparkHome,
-      "connector",
should not delete this line
@HyukjinKwon I have a question. If we move the dependencies defined in the
Yeah, we may unpack the shaded dependencies, but I was thinking that could be done separately. Or if you meant something else, we could try.
Given the size of this change (2253 files) and the diff of `spark-deps-hadoop-3-hive-2.3`, I believe we need to be more careful about this change. Maybe a discussion thread on dev@spark, and possibly a vote to get the community's blessing, @HyukjinKwon.
Sure, will do.
@@ -74,6 +74,11 @@
       <artifactId>spark-repl_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
So the end users do not have to specify `--packages` when they start the Spark Connect server.
What I mean is, if we are just trying to address this issue, it would be sufficient to just modify this file; there appears to be no need to move the connect server module's code to a new directory.
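For illustration, here is a minimal sketch of what "just modify this file" could look like in assembly/pom.xml: the connect server artifact added as a regular dependency so it ships with the distribution by default. The exact artifact id and placement are assumptions for illustration, not the actual diff.

```xml
<!-- Sketch only (assumed coordinates): include the Spark Connect server jar
     in the assembly so it is available without passing extra packages. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-connect_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
</dependency>
```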
I will reply in the dev mailing list
Maybe we can add the following exclusions to `spark-connect_${scala.binary.version}`, which would prevent any changes to `spark-deps-hadoop-3-hive-2.3`, since the current `spark-connect` module is still a shaded jar and does not require adding extra third-party dependencies to the project:
<exclusions>
  <exclusion>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-connect-common_${scala.binary.version}</artifactId>
  </exclusion>
  <exclusion>
    <groupId>io.grpc</groupId>
    <artifactId>*</artifactId>
  </exclusion>
  <exclusion>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
  </exclusion>
  <exclusion>
    <groupId>com.google.guava</groupId>
    <artifactId>failureaccess</artifactId>
  </exclusion>
</exclusions>
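For context, a hedged sketch of how these exclusions would nest inside the dependency declaration; the surrounding coordinates are assumed from this discussion rather than copied from the actual POM:

```xml
<!-- Sketch only: the exclusions sit inside the spark-connect dependency, so the
     (already shaded) server jar is pulled in without adding new entries to
     spark-deps-hadoop-3-hive-2.3. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-connect_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>io.grpc</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <!-- ...plus the remaining exclusions listed above... -->
  </exclusions>
</dependency>
```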
FYI: we are now able to unify the Guava version across the whole Spark project; see #42493.
Force-pushed from 91f7f2c to 5b3b80a.
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-avro_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <scope>provided</scope>
Since the `scope` is configured as `provided`, these two jars should not be included in the packaging result.
Leaving the connect client in a separate folder looks weird. I suppose there is no contract to leave all optional modules under
I actually plan to put the connect client at the top level too, but that should go together with the SPIP, so it stays there for now.
Force-pushed from 3887184 to 5ace07f.
Vote passed at https://lists.apache.org/thread/7pyfy90nbs8ltqyorpgzpl3cp24xrx8s. Thanks all!
Merged to master.
…rate

### What changes were proposed in this pull request?
The PR is a followup of #47157, to make the `dev/lint-scala` error message more accurate.

### Why are the changes needed?
After the move from `connector/connect/server` and `connector/connect/common` to `connect/server` and `connect/common`, the error message in `dev/lint-scala` should be updated accordingly.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passed GA. Manually tested.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47291 from panbingkun/SPARK-48763_FOLLOWUP.
Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
It looks like
Lemme take a look.
Hm, it works for me. Can you share the exact command and the full error message?
Oh, my fault. There was something else that did not work as intended on my build system.
…rver into sql directory

### What changes were proposed in this pull request?
This PR is a followup of #47157 that moves `connect` into `sql/connect`.

### Why are the changes needed?
The reasons are as follows:
- There was a bit of a question about moving `connect` to a standalone top-level directory (#47157 (comment)).
- Technically, all Spark Connect related code has to be placed under `sql`, just like the Hive Thrift server.
- The Spark Connect server is 99% SQL-dedicated code for now.
- The Spark Connect server already uses many `spark.sql` configurations, e.g., `spark.sql.connect.serverStacktrace.enabled`.
- Spark Connect common is only for the SQL module. If other components have to be implemented, that common code has to be placed within that directory.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI in this PR should verify it.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47579 from HyukjinKwon/SPARK-48763-followup.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…rate

### What changes were proposed in this pull request?
The PR is a followup of #47157, to make the `dev/lint-scala` error message more accurate.

### Why are the changes needed?
After the move from `connector/connect/server` and `connector/connect/common` to `sql/connect/server` and `sql/connect/common`, the error message in `dev/lint-scala` should be updated accordingly.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passed GA. Manually tested.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47585 from panbingkun/SPARK-48763_FOLLOWUP_2.
Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This PR proposes to move the connect server to a builtin module.
From:
connector/connect/server
connector/connect/common
To:
connect/server
connect/common
Why are the changes needed?
So the end users do not have to specify `--packages` when they start the Spark Connect server. Spark Connect client remains as a separate module. This was also pointed out in #39928 (comment).
Does this PR introduce any user-facing change?
Yes, users don't have to specify `--packages` anymore.
How was this patch tested?
CI in this PR should verify them.
Also manually tested several basic commands such as:
- Maven build
- SBT build
- Running basic Scala client commands:
  cd connector/connect
  bin/spark-connect
  bin/spark-connect-scala-client
- Running basic PySpark client commands:
  bin/pyspark --remote local
- Connecting to the server launched by ./sbin/start-connect-server.sh:
  ./sbin/start-connect-server.sh
  bin/pyspark --remote "sc://localhost"
Was this patch authored or co-authored using generative AI tooling?
No.