
Commit a0d807d

dongjoon-hyun authored and HyukjinKwon committed
[SPARK-26856][PYSPARK][FOLLOWUP] Fix UT failure due to wrong patterns for Kinesis assembly
## What changes were proposed in this pull request?

After [SPARK-26856](#23797), the `Kinesis` Python UT fails with a `Found multiple JARs` exception because of a wrong pattern.

- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/104171/console

```
Exception: Found multiple JARs: .../spark-streaming-kinesis-asl-assembly-3.0.0-SNAPSHOT.jar, .../spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT.jar; please remove all but one
```

This happens because the pattern was changed incorrectly.

**Original**
```python
kinesis_asl_assembly_dir, "target/scala-*/%s-*.jar" % name_prefix))
kinesis_asl_assembly_dir, "target/%s_*.jar" % name_prefix))
```

**After SPARK-26856**
```python
project_full_path, "target/scala-*/%s*.jar" % jar_name_prefix))
project_full_path, "target/%s*.jar" % jar_name_prefix))
```

The actual Kinesis assembly jar files look like the following.

**SBT Build**
```
-rw-r--r-- 1 dongjoon staff 87459461 Apr 1 19:01 spark-streaming-kinesis-asl-assembly-3.0.0-SNAPSHOT.jar
-rw-r--r-- 1 dongjoon staff 309 Apr 1 18:58 spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-tests.jar
-rw-r--r-- 1 dongjoon staff 309 Apr 1 18:58 spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT.jar
```

**MAVEN Build**
```
-rw-r--r-- 1 dongjoon staff 8.6K Apr 1 18:55 spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-sources.jar
-rw-r--r-- 1 dongjoon staff 8.6K Apr 1 18:55 spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-test-sources.jar
-rw-r--r-- 1 dongjoon staff 8.7K Apr 1 18:55 spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-tests.jar
-rw-r--r-- 1 dongjoon staff 21M Apr 1 18:55 spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT.jar
```

In addition, after SPARK-26856 the utility function `search_jar` is also shared to find the `avro` jar files, whose names are identical for both `sbt` and `mvn` builds.

To sum up, the current single jar pattern parameter cannot handle both the `kinesis` and the `avro` jars. This PR splits the single pattern into two patterns: one for SBT and one for Maven.
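
A quick way to see the SBT-side regression is to run both SBT patterns through Python's `fnmatch` (the matcher `glob` applies to each path component) against the SBT file names listed above. This is an illustrative sketch, not code from the PR: it matches bare file names instead of full `target/scala-*/` paths, and the suffix tuple is assumed to mirror `ignored_jar_suffixes` in `search_jar`, whose definition falls outside the hunks shown in this commit.

```python
from fnmatch import fnmatch

# File names from the SBT build listing (assumed to live in target/scala-2.12/).
SBT_JARS = [
    "spark-streaming-kinesis-asl-assembly-3.0.0-SNAPSHOT.jar",
    "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-tests.jar",
    "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT.jar",
]
# Assumed copy of search_jar's ignored_jar_suffixes.
IGNORED_SUFFIXES = ("javadoc.jar", "sources.jar", "test-sources.jar", "tests.jar")
PREFIX = "spark-streaming-kinesis-asl-assembly"


def matches(pattern, names):
    """Return the names matching a glob pattern, minus ignored suffixes."""
    return [n for n in names
            if fnmatch(n, pattern) and not n.endswith(IGNORED_SUFFIXES)]


# Pattern after SPARK-26856: '*' also swallows the '_2.12' Scala-suffixed jar.
broken = matches("%s*.jar" % PREFIX, SBT_JARS)
# Original pattern: the '-' pins the match to the real assembly jar.
fixed = matches("%s-*.jar" % PREFIX, SBT_JARS)

print(broken)  # two jars, hence "Found multiple JARs"
print(fixed)   # only the assembly jar
```

The `-tests.jar` file is dropped by the suffix filter either way; what separates the two results is whether the character after the prefix is constrained.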
## How was this patch tested?

Manually. Please note that this only removes the `Found multiple JARs` exception; the Kinesis tests need more configuration to run locally.

```
$ build/sbt -Pkinesis-asl test:package streaming-kinesis-asl-assembly/assembly
$ export ENABLE_KINESIS_TESTS=1
$ python/run-tests.py --python-executables python2.7 --module pyspark-streaming
```

Closes #24268 from dongjoon-hyun/SPARK-26856.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 0b150f8


3 files changed: 8 additions, 5 deletions


python/pyspark/sql/avro/functions.py

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ def _test():
     import os
     import sys
     from pyspark.testing.utils import search_jar
-    avro_jar = search_jar("external/avro", "spark-avro")
+    avro_jar = search_jar("external/avro", "spark-avro", "spark-avro")
     if avro_jar is None:
         print(
             "Skipping all Avro Python tests as the optional Avro project was "

python/pyspark/testing/streamingutils.py

Lines changed: 2 additions & 1 deletion
@@ -34,7 +34,8 @@
         "was not set.")
 else:
     kinesis_asl_assembly_jar = search_jar("external/kinesis-asl-assembly",
-                                          "spark-streaming-kinesis-asl-assembly")
+                                          "spark-streaming-kinesis-asl-assembly-",
+                                          "spark-streaming-kinesis-asl-assembly_")
     if kinesis_asl_assembly_jar is None:
         kinesis_requirement_message = (
             "Skipping all Kinesis Python tests as the optional Kinesis project was "
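
On the Maven side every Kinesis assembly file carries the `_2.12` Scala suffix, so an SBT-style prefix ending in `-` would match nothing there; that is why one prefix cannot serve both builds. The sketch below applies the new Maven-side prefix `spark-streaming-kinesis-asl-assembly_` to the Maven listing from the description, again using `fnmatch` on bare file names with an assumed copy of `ignored_jar_suffixes`. It is an illustration, not PySpark code.

```python
from fnmatch import fnmatch

# File names from the Maven build listing (all directly in target/).
MAVEN_JARS = [
    "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-sources.jar",
    "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-test-sources.jar",
    "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-tests.jar",
    "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT.jar",
]
# Assumed copy of search_jar's ignored_jar_suffixes.
IGNORED_SUFFIXES = ("javadoc.jar", "sources.jar", "test-sources.jar", "tests.jar")

# search_jar builds the Maven glob as "<prefix>*.jar".
mvn_pattern = "%s*.jar" % "spark-streaming-kinesis-asl-assembly_"
candidates = [n for n in MAVEN_JARS if fnmatch(n, mvn_pattern)]
selected = [n for n in candidates if not n.endswith(IGNORED_SUFFIXES)]

# The SBT-side prefix (ending in '-') finds nothing in a Maven build.
sbt_style = [n for n in MAVEN_JARS
             if fnmatch(n, "%s*.jar" % "spark-streaming-kinesis-asl-assembly-")]

print(len(candidates), selected, sbt_style)
```

All four files match the glob; it is the suffix filter that leaves the single runnable assembly jar.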

python/pyspark/testing/utils.py

Lines changed: 5 additions & 3 deletions
@@ -103,7 +103,9 @@ def close(self):
         pass


-def search_jar(project_relative_path, jar_name_prefix):
+def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
+    # Note that 'sbt_jar_name_prefix' and 'mvn_jar_name_prefix' are used since the prefix can
+    # vary for SBT or Maven specifically. See also SPARK-26856
     project_full_path = os.path.join(
         os.environ["SPARK_HOME"], project_relative_path)

@@ -113,9 +115,9 @@ def search_jar(project_relative_path, jar_name_prefix):
     # Search jar in the project dir using the jar name_prefix for both sbt build and maven
     # build because the artifact jars are in different directories.
     sbt_build = glob.glob(os.path.join(
-        project_full_path, "target/scala-*/%s*.jar" % jar_name_prefix))
+        project_full_path, "target/scala-*/%s*.jar" % sbt_jar_name_prefix))
     maven_build = glob.glob(os.path.join(
-        project_full_path, "target/%s*.jar" % jar_name_prefix))
+        project_full_path, "target/%s*.jar" % mvn_jar_name_prefix))
     jar_paths = sbt_build + maven_build
     jars = [jar for jar in jar_paths if not jar.endswith(ignored_jar_suffixes)]
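
The patched `search_jar` can be exercised end to end against a throwaway directory tree. The sketch below is a standalone copy, not the PySpark function itself, and several parts are assumptions: it takes a hypothetical `spark_home` argument instead of reading `os.environ["SPARK_HOME"]`; the `ignored_jar_suffixes` value and the tail of the function (return `None` when nothing matches, raise when several do) lie outside the hunks above, so they are reconstructed from the exception message quoted in the description; and `spark-avro_2.12-3.0.0-SNAPSHOT.jar` is an assumed example file name.

```python
import glob
import os
import tempfile

# Assumed to mirror ignored_jar_suffixes in pyspark/testing/utils.py.
IGNORED_JAR_SUFFIXES = ("javadoc.jar", "sources.jar", "test-sources.jar", "tests.jar")


def search_jar(spark_home, project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
    """Standalone sketch of the patched search_jar; spark_home is a
    hypothetical parameter replacing os.environ["SPARK_HOME"]."""
    project_full_path = os.path.join(spark_home, project_relative_path)
    # SBT writes jars under target/scala-<version>/, Maven directly under target/.
    sbt_build = glob.glob(os.path.join(
        project_full_path, "target/scala-*/%s*.jar" % sbt_jar_name_prefix))
    maven_build = glob.glob(os.path.join(
        project_full_path, "target/%s*.jar" % mvn_jar_name_prefix))
    jars = [jar for jar in sbt_build + maven_build
            if not jar.endswith(IGNORED_JAR_SUFFIXES)]
    # Assumed tail: no match -> None, several matches -> error.
    if not jars:
        return None
    if len(jars) > 1:
        raise Exception("Found multiple JARs: %s; please remove all but one" % ", ".join(jars))
    return jars[0]


def touch(path):
    """Create an empty file, creating parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


with tempfile.TemporaryDirectory() as spark_home:
    # Kinesis SBT output, using the file names from the description.
    kinesis_target = os.path.join(
        spark_home, "external", "kinesis-asl-assembly", "target", "scala-2.12")
    for name in ("spark-streaming-kinesis-asl-assembly-3.0.0-SNAPSHOT.jar",
                 "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT-tests.jar",
                 "spark-streaming-kinesis-asl-assembly_2.12-3.0.0-SNAPSHOT.jar"):
        touch(os.path.join(kinesis_target, name))
    # Avro SBT output (assumed file name).
    touch(os.path.join(spark_home, "external", "avro", "target", "scala-2.12",
                       "spark-avro_2.12-3.0.0-SNAPSHOT.jar"))

    # The call sites as changed by this commit.
    kinesis_jar = search_jar(spark_home, "external/kinesis-asl-assembly",
                             "spark-streaming-kinesis-asl-assembly-",
                             "spark-streaming-kinesis-asl-assembly_")
    avro_jar = search_jar(spark_home, "external/avro", "spark-avro", "spark-avro")

    # A single shared prefix, as before this fix, reproduces the failure.
    try:
        search_jar(spark_home, "external/kinesis-asl-assembly",
                   "spark-streaming-kinesis-asl-assembly",
                   "spark-streaming-kinesis-asl-assembly")
        old_prefix_error = None
    except Exception as e:
        old_prefix_error = str(e)

print(os.path.basename(kinesis_jar))
print(os.path.basename(avro_jar))
print(old_prefix_error)
```

Note that in this sketch, if SBT and Maven outputs were both present under `target/`, even the fixed Kinesis call would find two candidates and raise; the lookup assumes only one build tool has been run.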
