-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use spark-proto
uber jar to test the connect
module
#42236
Conversation
spark-proto uber
jar to test the connect module
spark-proto uber
jar to test the connect modulespark-proto uber
jar to test the connect
module
spark-proto uber
jar to test the connect
modulespark-proto uber
jar to test the connet
module
spark-proto uber
jar to test the connet
modulespark-proto uber
jar to test the connet
module
spark-proto uber
jar to test the connet
modulespark-proto uber
jar to test the connet
module
spark-proto uber
jar to test the connet
modulespark-proto uber
jar to test the connect
module
spark-proto uber
jar to test the connect
modulespark-proto
uber jar to test the connect
module
@@ -3233,14 +3233,15 @@ class PlanGenerationTestSuite | |||
"connect/common/src/test/resources/protobuf-tests/common.desc" | |||
|
|||
test("from_protobuf messageClassName") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we deem this plan too intricate, we could delete or ignore test cases from_protobuf messageClassName
and from_protobuf messageClassName options
to temporarily abandon this testing scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thank you.
* * <p> | ||
* * TODO(SPARK-44606): Generate this file and replace the package names in the file when testing. | ||
*/ | ||
public final class TestProto { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task of generating Java files during testing and replacing all instances of com.google.protobuf.
with org.sparkproject.spark_protobuf.protobuf.
in the file will be carried forward as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the doc comment to say why this is included (the current comment talks about a fix, but not the problem). Looks like this uses shaded version of protobuf library.
I like this but cc @grundprinzip and @zhenlineo too |
Would it be easier if we change maven to use the unshaded jar? |
spark/connector/protobuf/pom.xml Lines 95 to 100 in 5c36c58
Since |
While I'm not certain if it's reasonable, I still want to point out that relocating the content of the |
What suggestions does everyone have on how to proceed with fixing this issue? Should we have it fixed before the next 3.5.0 RC release? |
What are the standing comments? Otherwise, I would guess its good to go? |
@HyukjinKwon Hmm... I think the fix should be fine. However, as Spark 3.5 no longer needs to support protobuf 2.5, I hope to make some changes to the The related viewpoints include: #42236 (comment), Of course, we could also choose to fix this issue for Spark 3.5 using this pr first and have more discussions or changes for this in Spark 4.0. |
4ba45e0 has automated the process of generating |
@HyukjinKwon @rangadi Since there are new code changes in 4ba45e0, could you please take another look? If there are no new issues, I will merge this pr if GA passes. |
GA passed and I have manually verified with Maven, the tests can pass. |
@LuciferYang BTW, is this a real blocker? or test-only issue? |
hmm... If we accept that Maven tests inevitably fail, then this is not a blocker |
@HyukjinKwon So I don't think this is a real blocker, but we should at least make the maven test pass in master/branch-3.5, even if it means removing these two cases. |
I will merge this pr today If there are no more comments @rangadi @HyukjinKwon @hvanhovell @zhengruifeng @rangadi @pan3793 @grundprinzip @zhenlineo |
…o` uber jar to test the `connect` module ### What changes were proposed in this pull request? Before this pr, when we tested the `connect` module, Maven used the shaded `spark-protobuf` jar for testing, while SBT used the original jar for testing, which also led to inconsistent testing behavior. So some tests passed when using SBT, but failed when using Maven: run ``` build/mvn clean install -DskipTests build/mvn test -pl connector/connect/server ``` there will be two test failed as follows: ``` - from_protobuf_messageClassName *** FAILED *** org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] Could not load Protobuf class with name org.apache.spark.connect.proto.StorageLevel. org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with Protobuf classes needs to be shaded (com.google.protobuf.* --> org.sparkproject.spark_protobuf.protobuf.*). at org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417) at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193) at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42) at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194) at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) - from_protobuf_messageClassName_options *** FAILED *** org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] Could not load Protobuf class with name org.apache.spark.connect.proto.StorageLevel. org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with Protobuf classes needs to be shaded (com.google.protobuf.* --> org.sparkproject.spark_protobuf.protobuf.*). at org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417) at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193) at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43) at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42) at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194) at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) ``` So this pr make SBT also use `spark-proto` uber jar(`spark-protobuf-assembly-**-SNAPSHOT.jar`) for the above tests and refactor the test cases to make them pass both SBT and Maven after this pr. ### Why are the changes needed? Make connect server module can test pass using both SBT and maven. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass Github Actions - Manual check ``` build/mvn clean install -DskipTests build/mvn test -pl connector/connect/server ``` all test passed after this pr. Closes #42236 from LuciferYang/protobuf-test. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit df63adf) Signed-off-by: yangjie01 <yangjie01@baidu.com>
Merged into master and branch-3.5 to fix the maven test for the connect-server module |
Thanks all |
@MaxGekk I thought for a while, I have no better way to solve this issue, indeed, it's impossible to be perfect. I have decided to revert this pr and ignore the corresponding two test cases. |
…ark-proto` uber jar to test the `connect` module" ### What changes were proposed in this pull request? This reverts commit df63adf. ### Why are the changes needed? As [reported](#42236 (comment)) by MaxGekk , the solution for #42236 is not perfect, and it breaks the usability of importing Spark as a Maven project into idea. On the other hand, if `mvn clean test` is executed, test failures will also occur like ``` [ERROR] [Error] /tmp/spark-3.5.0/connector/connect/server/target/generated-test-sources/protobuf/java/org/apache/spark/sql/protobuf/protos/TestProto.java:9:46: error: package org.sparkproject.spark_protobuf.protobuf does not exist ``` Therefore, this pr will revert the change of SPARK-43646, and `from_protobuf messageClassName` and `from_protobuf messageClassName options` in `PlanGenerationTestSuite` will be ignored in a follow-up. At present, it is difficult to make the maven testing of the `spark-protobuf` function in the `connect` module as good as possible. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #42746 from LuciferYang/Revert-SPARK-43646. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 723a0aa) Signed-off-by: yangjie01 <yangjie01@baidu.com>
…ark-proto` uber jar to test the `connect` module" ### What changes were proposed in this pull request? This reverts commit df63adf. ### Why are the changes needed? As [reported](#42236 (comment)) by MaxGekk , the solution for #42236 is not perfect, and it breaks the usability of importing Spark as a Maven project into idea. On the other hand, if `mvn clean test` is executed, test failures will also occur like ``` [ERROR] [Error] /tmp/spark-3.5.0/connector/connect/server/target/generated-test-sources/protobuf/java/org/apache/spark/sql/protobuf/protos/TestProto.java:9:46: error: package org.sparkproject.spark_protobuf.protobuf does not exist ``` Therefore, this pr will revert the change of SPARK-43646, and `from_protobuf messageClassName` and `from_protobuf messageClassName options` in `PlanGenerationTestSuite` will be ignored in a follow-up. At present, it is difficult to make the maven testing of the `spark-protobuf` function in the `connect` module as good as possible. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #42746 from LuciferYang/Revert-SPARK-43646. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com>
…from_protobuf messageClassName options` in `PlanGenerationTestSuite` ### What changes were proposed in this pull request? This pr aims ignore `from_protobuf messageClassName` and `from_protobuf messageClassName options` in `PlanGenerationTestSuite` and remove the related golden files, after this change `from_protobuf_messageClassName` and `from_protobuf_messageClassName_options` in `ProtoToParsedPlanTestSuite` be ignored too. ### Why are the changes needed? SPARK-43646 | (#42236) makes both Maven and SBT use the shaded `spark-protobuf` module when testing the connect module, this allows `mvn clean install` and `mvn package test` to successfully pass tests. But if `mvn clean test` is executed directly, an error `package org.sparkproject.spark_protobuf.protobuf does not exist` will occur. This is because `mvn clean test` directly uses the classes file of the `spark-protobuf` module for testing, without the 'package', hence it does not `shade` and `relocate` protobuf. On the other hand, the change of SPARK-43646 breaks the usability of importing Spark as a Maven project into IDEA(#42236 (comment)). So #42746 revert the change of [SPARK-43646](https://issues.apache.org/jira/browse/SPARK-43646). It's difficult to find a perfect solution to solve this maven test issues now, as in certain scenarios tests would use the `shaded spark-protobuf jar`, like `mvn package test`, while in some other scenarios it will use the `unshaded classes directory`, such as `mvn clean test`. so this pr ignores the relevant tests first and leaves a TODO(SPARK-45030), to re-enable these tests when we find a better solution. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass Github Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #42751 from LuciferYang/SPARK-45029. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com>
…from_protobuf messageClassName options` in `PlanGenerationTestSuite` ### What changes were proposed in this pull request? This pr aims ignore `from_protobuf messageClassName` and `from_protobuf messageClassName options` in `PlanGenerationTestSuite` and remove the related golden files, after this change `from_protobuf_messageClassName` and `from_protobuf_messageClassName_options` in `ProtoToParsedPlanTestSuite` be ignored too. ### Why are the changes needed? SPARK-43646 | (#42236) makes both Maven and SBT use the shaded `spark-protobuf` module when testing the connect module, this allows `mvn clean install` and `mvn package test` to successfully pass tests. But if `mvn clean test` is executed directly, an error `package org.sparkproject.spark_protobuf.protobuf does not exist` will occur. This is because `mvn clean test` directly uses the classes file of the `spark-protobuf` module for testing, without the 'package', hence it does not `shade` and `relocate` protobuf. On the other hand, the change of SPARK-43646 breaks the usability of importing Spark as a Maven project into IDEA(#42236 (comment)). So #42746 revert the change of [SPARK-43646](https://issues.apache.org/jira/browse/SPARK-43646). It's difficult to find a perfect solution to solve this maven test issues now, as in certain scenarios tests would use the `shaded spark-protobuf jar`, like `mvn package test`, while in some other scenarios it will use the `unshaded classes directory`, such as `mvn clean test`. so this pr ignores the relevant tests first and leaves a TODO(SPARK-45030), to re-enable these tests when we find a better solution. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass Github Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #42751 from LuciferYang/SPARK-45029. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 10f3190) Signed-off-by: yangjie01 <yangjie01@baidu.com>
What changes were proposed in this pull request?
Before this pr, when we tested the
connect
module, Maven used the shadedspark-protobuf
jar for testing, while SBT used the original jar for testing, which also led to inconsistent testing behavior. So some tests passed when using SBT, but failed when using Maven:run
there will be two test failed as follows:
So this pr make SBT also use
spark-proto
uber jar(spark-protobuf-assembly-**-SNAPSHOT.jar
) for the above tests and refactor the test cases to make them pass both SBT and Maven after this pr.Why are the changes needed?
Make connect server module can test pass using both SBT and maven.
Does this PR introduce any user-facing change?
No
How was this patch tested?
all test passed after this pr.