Revert "[SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `sp…
Browse files Browse the repository at this point in the history
…ark-proto` uber jar to test the `connect` module"

### What changes were proposed in this pull request?
This reverts commit df63adf.

### Why are the changes needed?
As [reported](apache#42236 (comment)) by MaxGekk, the solution for apache#42236 is not perfect: it breaks importing Spark as a Maven project into IntelliJ IDEA, and running `mvn clean test` also produces test failures such as
```
[ERROR] [Error] /tmp/spark-3.5.0/connector/connect/server/target/generated-test-sources/protobuf/java/org/apache/spark/sql/protobuf/protos/TestProto.java:9:46:  error: package org.sparkproject.spark_protobuf.protobuf does not exist
```
The generated Java test sources had been rewritten to reference the shaded package `org.sparkproject.spark_protobuf.protobuf`, which exists only inside the `spark-protobuf` uber jar, so compilation fails when that jar is not the one on the test classpath. Therefore, this PR reverts the change of SPARK-43646, and the `from_protobuf messageClassName` and `from_protobuf messageClassName options` tests in `PlanGenerationTestSuite` will be ignored in a follow-up.

At present, there is no good way to make Maven testing of the `spark-protobuf` functions in the `connect` module work reliably, so the affected tests will be disabled instead, as sketched below.
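As an illustration, the follow-up could disable the two affected tests along these lines — a minimal sketch assuming `PlanGenerationTestSuite` uses ScalaTest's funsuite style, so `test` can simply become `ignore`; the actual follow-up PR may choose a different mechanism:

```scala
// Hypothetical sketch: switching `test` to `ignore` keeps the bodies
// compiling but skips them at runtime, so neither SBT nor Maven runs
// the spark-protobuf code paths that need the shaded uber jar.
ignore("from_protobuf messageClassName") {
  binary.select(pbFn.from_protobuf(fn.col("bytes"), classOf[StorageLevel].getName))
}

ignore("from_protobuf messageClassName options") {
  binary.select(
    pbFn.from_protobuf(
      fn.col("bytes"),
      classOf[StorageLevel].getName,
      Map("recursive.fields.max.depth" -> "2").asJava))
}
```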

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#42746 from LuciferYang/Revert-SPARK-43646.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
LuciferYang committed Aug 31, 2023
1 parent 9d84369 commit 723a0aa
Showing 10 changed files with 8 additions and 175 deletions.
@@ -3236,15 +3236,14 @@ class PlanGenerationTestSuite
     "connect/common/src/test/resources/protobuf-tests/common.desc"
 
   test("from_protobuf messageClassName") {
-    binary.select(
-      pbFn.from_protobuf(fn.col("bytes"), "org.apache.spark.sql.protobuf.protos.TestProtoObj"))
+    binary.select(pbFn.from_protobuf(fn.col("bytes"), classOf[StorageLevel].getName))
  }
 
   test("from_protobuf messageClassName options") {
     binary.select(
       pbFn.from_protobuf(
         fn.col("bytes"),
-        "org.apache.spark.sql.protobuf.protos.TestProtoObj",
+        classOf[StorageLevel].getName,
         Map("recursive.fields.max.depth" -> "2").asJava))
   }
 
@@ -1,2 +1,2 @@
-Project [from_protobuf(bytes#0, org.apache.spark.sql.protobuf.protos.TestProtoObj, None) AS from_protobuf(bytes)#0]
+Project [from_protobuf(bytes#0, org.apache.spark.connect.proto.StorageLevel, None) AS from_protobuf(bytes)#0]
 +- LocalRelation <empty>, [id#0L, bytes#0]
@@ -1,2 +1,2 @@
-Project [from_protobuf(bytes#0, org.apache.spark.sql.protobuf.protos.TestProtoObj, None, (recursive.fields.max.depth,2)) AS from_protobuf(bytes)#0]
+Project [from_protobuf(bytes#0, org.apache.spark.connect.proto.StorageLevel, None, (recursive.fields.max.depth,2)) AS from_protobuf(bytes)#0]
 +- LocalRelation <empty>, [id#0L, bytes#0]
@@ -20,7 +20,7 @@
       }
     }, {
       "literal": {
-        "string": "org.apache.spark.sql.protobuf.protos.TestProtoObj"
+        "string": "org.apache.spark.connect.proto.StorageLevel"
       }
     }]
   }
Binary file not shown.
@@ -20,7 +20,7 @@
       }
     }, {
       "literal": {
-        "string": "org.apache.spark.sql.protobuf.protos.TestProtoObj"
+        "string": "org.apache.spark.connect.proto.StorageLevel"
       }
     }, {
       "unresolvedFunction": {
Binary file not shown.
88 changes: 0 additions & 88 deletions connector/connect/server/pom.xml
@@ -248,13 +248,6 @@
     </dependency>
   </dependencies>
   <build>
-    <extensions>
-      <extension>
-        <groupId>kr.motd.maven</groupId>
-        <artifactId>os-maven-plugin</artifactId>
-        <version>1.6.2</version>
-      </extension>
-    </extensions>
     <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
     <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
     <plugins>
@@ -410,87 +403,6 @@
           </transformers>
         </configuration>
       </plugin>
-      <!--
-        SPARK-43646: In order for `ProtoToParsedPlanTestSuite` to successfully test `from_protobuf_messageClassName`
-        and `from_protobuf_messageClassName_options`, `maven-antrun-plugin` is used to replace all
-        `com.google.protobuf.` with `org.sparkproject.spark_protobuf.protobuf.` in the Java files
-        generated by `protobuf-maven-plugin`.
-      -->
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-antrun-plugin</artifactId>
-        <executions>
-          <execution>
-            <phase>process-test-sources</phase>
-            <configuration>
-              <target>
-                <replace dir="${project.basedir}/target/generated-test-sources"
-                         includes="**/*.java"
-                         token="com.google.protobuf."
-                         value="org.sparkproject.spark_protobuf.protobuf."/>
-              </target>
-            </configuration>
-            <goals>
-              <goal>run</goal>
-            </goals>
-          </execution>
-        </executions>
-      </plugin>
     </plugins>
   </build>
-  <profiles>
-    <profile>
-      <id>default-protoc</id>
-      <activation>
-        <activeByDefault>true</activeByDefault>
-      </activation>
-      <build>
-        <plugins>
-          <!-- Add protobuf-maven-plugin and provide ScalaPB as a code generation plugin -->
-          <plugin>
-            <groupId>org.xolstice.maven.plugins</groupId>
-            <artifactId>protobuf-maven-plugin</artifactId>
-            <version>0.6.1</version>
-            <configuration>
-              <protocArtifact>com.google.protobuf:protoc:${protobuf.version}:exe:${os.detected.classifier}</protocArtifact>
-              <protoTestSourceRoot>src/test/protobuf</protoTestSourceRoot>
-            </configuration>
-            <executions>
-              <execution>
-                <goals>
-                  <goal>test-compile</goal>
-                </goals>
-              </execution>
-            </executions>
-          </plugin>
-        </plugins>
-      </build>
-    </profile>
-    <profile>
-      <id>user-defined-protoc</id>
-      <properties>
-        <spark.protoc.executable.path>${env.SPARK_PROTOC_EXEC_PATH}</spark.protoc.executable.path>
-      </properties>
-      <build>
-        <plugins>
-          <plugin>
-            <groupId>org.xolstice.maven.plugins</groupId>
-            <artifactId>protobuf-maven-plugin</artifactId>
-            <version>0.6.1</version>
-            <configuration>
-              <protocExecutable>${spark.protoc.executable.path}</protocExecutable>
-              <protoTestSourceRoot>src/test/protobuf</protoTestSourceRoot>
-            </configuration>
-            <executions>
-              <execution>
-                <goals>
-                  <goal>test-compile</goal>
-                </goals>
-              </execution>
-            </executions>
-          </plugin>
-        </plugins>
-      </build>
-    </profile>
-  </profiles>
 </project>
27 changes: 0 additions & 27 deletions connector/connect/server/src/test/protobuf/test.proto

This file was deleted.

55 changes: 2 additions & 53 deletions project/SparkBuild.scala
@@ -17,7 +17,7 @@
 
 import java.io._
 import java.nio.charset.StandardCharsets.UTF_8
-import java.nio.file.{Files, StandardOpenOption}
+import java.nio.file.Files
 import java.util.Locale
 
 import scala.io.Source
@@ -765,13 +765,7 @@ object SparkConnectCommon {
 object SparkConnect {
   import BuildCommons.protoVersion
 
-  val rewriteJavaFile = TaskKey[Unit]("rewriteJavaFile",
-    "Rewrite the generated Java PB files.")
-  val genPBAndRewriteJavaFile = TaskKey[Unit]("genPBAndRewriteJavaFile",
-    "Generate Java PB files and overwrite their contents.")
-
   lazy val settings = Seq(
-    PB.protocVersion := BuildCommons.protoVersion,
     // For some reason the resolution from the imported Maven build does not work for some
     // of these dependendencies that we need to shade later on.
     libraryDependencies ++= {
@@ -802,42 +796,6 @@ object SparkConnect {
       )
     },
 
-    // SPARK-43646: The following 3 statements are used to make the connect module use the
-    // Spark-proto assembly jar when compiling and testing using SBT, which can be keep same
-    // behavior of Maven testing.
-    (Test / unmanagedJars) += (LocalProject("protobuf") / assembly).value,
-    (Test / fullClasspath) :=
-      (Test / fullClasspath).value.filterNot { f => f.toString.contains("spark-protobuf") },
-    (Test / fullClasspath) += (LocalProject("protobuf") / assembly).value,
-
-    (Test / PB.protoSources) += (Test / sourceDirectory).value / "resources" / "protobuf",
-
-    (Test / PB.targets) := Seq(
-      PB.gens.java -> target.value / "generated-test-sources",
-    ),
-
-    // SPARK-43646: Create a custom task to replace all `com.google.protobuf.` with
-    // `org.sparkproject.spark_protobuf.protobuf.` in the generated Java PB files.
-    // This is to generate Java files that can be used to test spark-protobuf functions
-    // in `ProtoToParsedPlanTestSuite`.
-    rewriteJavaFile := {
-      val protobufDir = target.value / "generated-test-sources"/"org"/"apache"/"spark"/"sql"/"protobuf"/"protos"
-      protobufDir.listFiles().foreach { f =>
-        if (f.getName.endsWith(".java")) {
-          val contents = Files.readAllLines(f.toPath, UTF_8)
-          val replaced = contents.asScala.map { line =>
-            line.replaceAll("com.google.protobuf.", "org.sparkproject.spark_protobuf.protobuf.")
-          }
-          Files.write(f.toPath, replaced.asJava, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)
-        }
-      }
-    },
-    // SPARK-43646: `genPBAndRewriteJavaFile` is used to specify the execution order of `PB.generate`
-    // and `rewriteJavaFile`, and makes `Test / compile` dependent on `genPBAndRewriteJavaFile`
-    // being executed first.
-    genPBAndRewriteJavaFile := Def.sequential(Test / PB.generate, rewriteJavaFile).value,
-    (Test / compile) := (Test / compile).dependsOn(genPBAndRewriteJavaFile).value,
-
     (assembly / test) := { },
 
     (assembly / logLevel) := Level.Info,
@@ -883,16 +841,7 @@
       case m if m.toLowerCase(Locale.ROOT).endsWith(".proto") => MergeStrategy.discard
       case _ => MergeStrategy.first
     }
-  ) ++ {
-    val sparkProtocExecPath = sys.props.get("spark.protoc.executable.path")
-    if (sparkProtocExecPath.isDefined) {
-      Seq(
-        PB.protocExecutable := file(sparkProtocExecPath.get)
-      )
-    } else {
-      Seq.empty
-    }
-  }
+  )
 }
 
 object SparkConnectClient {
