Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-43646][PROTOBUF][BUILD] Split protobuf-assembly module from protobuf module #41466

Closed
wants to merge 12 commits into from

Conversation

LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Jun 5, 2023

What changes were proposed in this pull request?

There will be maven test failed of connect server module before this pr:

run

build/mvn clean install -DskipTests
build/mvn test -pl connector/connect/server

there will be two test failed as follows:

- from_protobuf_messageClassName *** FAILED ***
  org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] Could not load Protobuf class with name org.apache.spark.connect.proto.StorageLevel. org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with Protobuf classes needs to be shaded (com.google.protobuf.* --> org.sparkproject.spark_protobuf.protobuf.*).
  at org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417)
  at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193)
  at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42)
  at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194)
  at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)


- from_protobuf_messageClassName_options *** FAILED ***
  org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] Could not load Protobuf class with name org.apache.spark.connect.proto.StorageLevel. org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with Protobuf classes needs to be shaded (com.google.protobuf.* --> org.sparkproject.spark_protobuf.protobuf.*).
  at org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417)
  at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193)
  at org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43)
  at org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42)
  at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194)
  at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 

The reason for the test failure is that the Maven test used shaded spark-protobuf jar, and the Message type StorageLevel used for testing inherited from com.google.protobuf.Message instead of org.sparkproject.spark_protobuf.protobuf.Message.

GitHub Actions can pass due to sbt test always used non-assembly park-protobuf jar, the com.google.protobuf.Message class has not been relocated.

So this pr references the patterns of kafka-0-10/kafka-0-10-assembly and kinesis-asl/kinesis-asl-assembly, splitting spark-protobuf module into spark-protobuf and spark-protobuf-assembly, and make connect server module always use spark-protobuf for testing to maintain the same behavior as sbt testing.

With this pr, the above maven test commands will pass.

Why are the changes needed?

Make connect server module can test pass using maven.

Does this PR introduce any user-facing change?

Yes, user needs to use spark-protobuf-assembly jar instead of the previous spark-protobuf jar.

How was this patch tested?

  • Pass Github Actions
  • Manual check
build/mvn clean install -DskipTests
build/mvn test -pl connector/connect/server

all test passed after this pr.

  • Manual check the the contents of the jar, the content of spark-protobuf-assembly and spark-protobuf(without this pr) is the same, and spark-protobuf(with this pr) is no longer shaded

@LuciferYang LuciferYang marked this pull request as draft June 5, 2023 14:19
@LuciferYang LuciferYang changed the title [SPARK-43646][PROTOBUF] Split protobuf-assembly module from protobuf module [SPARK-43646][PROTOBUF][BUILD] Split protobuf-assembly module from protobuf module Jun 5, 2023
@LuciferYang LuciferYang marked this pull request as ready for review June 7, 2023 05:28
@pan3793
Copy link
Member

pan3793 commented Jun 7, 2023

Test w/ vanilla jar but deliver assembly jar w/ relocated classes has potential issues, some reflection may work well w/ vanilla classes but not after class relocation.

Ideally, we should test the assembly jar if possible, to make sure the artifact that the end user consumes works as expected.

But unfortunately, IDEA and Maven have limited support for such cases, not sure about SBT.

@LuciferYang
Copy link
Contributor Author

also cc @rangadi Do you have a better solution to this issue?

@LuciferYang
Copy link
Contributor Author

I have adopted a different approach to solve the issue: #42236, close this PR first.

@LuciferYang LuciferYang closed this Aug 1, 2023
@LuciferYang
Copy link
Contributor Author

re open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants