[SPARK-49534][CORE][3.5] No longer prepend `sql/hive` and `sql/hive-thriftserver` when `spark-hive_xxx.jar` is not in the classpath
#48046
What changes were proposed in this pull request?
This PR adds two new check conditions to the `launcher.AbstractCommandBuilder#buildClassPath` method.

When `SPARK_PREPEND_CLASSES` is true, the class path of the `sql/hive` module is no longer prepended if `spark-hive_xxx.jar` is not in the classpath. The assumption here is that if `spark-hive_xxx.jar` is not in the classpath, then the `-Phive` profile was not used during packaging, and therefore Hive-related jars (such as `hive-exec-xx.jar`) are not in the classpath either. To avoid failures when loading the `DataSourceRegister` SPI implementations under `sql/hive`, `sql/hive` is no longer prepended.

Meanwhile, because `sql/hive-thriftserver` depends strongly on `sql/hive`, the prepend for `sql/hive-thriftserver` is also skipped if `spark-hive_xxx.jar` is not in the classpath. On the other hand, if `spark-hive-thriftserver_xxx.jar` is not in the classpath, then the `-Phive-thriftserver` profile was not used during packaging, and therefore jars such as `hive-cli` and `hive-beeline` are not in the classpath either. To avoid confusing startup failures of tools such as `spark-sql` in this scenario, `sql/hive-thriftserver` is no longer prepended to the classpath.

Why are the changes needed?
This fixes some bad cases during development. One of them is as follows:

The aforementioned error occurs because, after #40848, initializing the SPI `org.apache.spark.sql.hive.execution.HiveFileFormat` in the `sql/hive` module requires `org.apache.hadoop.hive.ql.plan.FileSinkDesc`, but in this scenario the relevant jars are not in the classpath. Therefore, this PR skips prepending the classpath of `sql/hive` in this specific scenario.

Another one is as follows:
The aforementioned failure occurred because, when compiling without the `-Phive` and `-Phive-thriftserver` profiles, the classpath lacked the dependencies related to `hive-cli`. Therefore, in this scenario, `sql/hive-thriftserver` should not be prepended to the classpath either.

Does this PR introduce any user-facing change?
No, this only affects developers.
How was this patch tested?
The first scenario no longer reports errors:
For the second scenario, `spark-sql` still fails to start, but the error message is simpler and clearer:
Was this patch authored or co-authored using generative AI tooling?
No
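The two classpath checks described under "What changes were proposed in this pull request?" can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code merged into `AbstractCommandBuilder`: the class `PrependClassPathSketch`, the helpers `isJarAvailable` and `classDirsToPrepend`, and the `<sparkHome>/<project>/target/scala-<version>/classes` layout are assumptions. It also assumes the packaged jars sit in a single directory, and treats any matching `spark-hive_*.jar` or `spark-hive-thriftserver_*.jar` file there as "in the classpath".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the prepend filtering described in this PR.
 * Names and directory layout are illustrative assumptions, not Spark's actual code.
 */
final class PrependClassPathSketch {

  /** Returns true if jarsDir contains at least one file matching "<jarPrefix>*.jar". */
  static boolean isJarAvailable(Path jarsDir, String jarPrefix) throws IOException {
    if (jarsDir == null || !Files.isDirectory(jarsDir)) {
      return false;
    }
    // Glob match, e.g. "spark-hive_*.jar"; "spark-hive-thriftserver_..." does not match it.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(jarsDir, jarPrefix + "*.jar")) {
      return stream.iterator().hasNext();
    }
  }

  /**
   * Returns the module class directories to prepend (assumed SPARK_PREPEND_CLASSES case),
   * skipping sql/hive and sql/hive-thriftserver when the packaged jars are absent.
   */
  static List<String> classDirsToPrepend(
      String sparkHome, String scalaVersion, Path jarsDir, List<String> projects)
      throws IOException {
    boolean prependHive = isJarAvailable(jarsDir, "spark-hive_");
    // sql/hive-thriftserver depends on sql/hive, so it needs both packaged jars.
    boolean prependThriftServer =
        prependHive && isJarAvailable(jarsDir, "spark-hive-thriftserver_");

    List<String> dirs = new ArrayList<>();
    for (String project : projects) {
      if (project.equals("sql/hive") && !prependHive) {
        continue;
      }
      if (project.equals("sql/hive-thriftserver") && !prependThriftServer) {
        continue;
      }
      dirs.add(String.format("%s/%s/target/scala-%s/classes", sparkHome, project, scalaVersion));
    }
    return dirs;
  }
}
```

Gating `sql/hive-thriftserver` on both jars mirrors the reasoning above: without `-Phive`, the thrift server module cannot load anyway, so it is skipped even if its own jar were somehow present.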