[SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client #32887
What changes were proposed in this pull request?
Instantiate a new Hive client through `Hive.getWithoutRegisterFns(conf, false)` instead of `Hive.get(conf)` if the Hive version is >= '2.3.9' (the built-in version).

Why are the changes needed?
HIVE-10319 introduced a new API `get_all_functions` which is only supported in Hive 1.3.0/2.0.0 and up. As a result, when Spark 3.x talks to a HMS service of version 1.2 or lower, client creation fails because the older HMS does not recognize the `get_all_functions` call.

`get_all_functions` is invoked only when `doRegisterAllFns` is set to true. What this does is register all Hive permanent functions defined in the HMS in Hive's `FunctionRegistry` class, by iterating through the results of `get_all_functions`. To Spark this seems unnecessary, since Spark loads a Hive permanent (not built-in) UDF by directly calling the HMS API `get_function`. The `FunctionRegistry` is only used for loading a Hive built-in function that Spark does not support itself; at this time, that applies only to `histogram_numeric`.

HIVE-21563 introduced a new API `getWithoutRegisterFns` which skips the registration above and is available as of Hive 2.3.9. Therefore, Spark should adopt it to avoid this cost.

Does this PR introduce any user-facing change?
Yes. With this fix, Spark should now be able to talk to a HMS server running Hive 1.2.x or lower.
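The version gate described above can be sketched as follows. This is an illustrative Java sketch, not the actual Spark code: `compareVersions` and `supportsGetWithoutRegisterFns` are hypothetical helper names; the real check lives inside Spark's Hive client/shim layer.

```java
// Sketch: decide whether the connected Hive client library is new enough
// (>= 2.3.9, per HIVE-21563) to offer Hive.getWithoutRegisterFns.
public class HiveClientChooser {
    // Compare dotted version strings component by component, e.g. "2.3.10" > "2.3.9".
    static int compareVersions(String a, String b) {
        String[] xs = a.split("\\."), ys = b.split("\\.");
        int n = Math.max(xs.length, ys.length);
        for (int i = 0; i < n; i++) {
            int x = i < xs.length ? Integer.parseInt(xs[i]) : 0;
            int y = i < ys.length ? Integer.parseInt(ys[i]) : 0;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    // True when getWithoutRegisterFns (added in HIVE-21563) should be available.
    public static boolean supportsGetWithoutRegisterFns(String hiveVersion) {
        return compareVersions(hiveVersion, "2.3.9") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(supportsGetWithoutRegisterFns("2.3.9")); // true
        System.out.println(supportsGetWithoutRegisterFns("2.3.8")); // false
    }
}
```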
How was this patch tested?
Manually started a HMS server of Hive version 1.2.2. Without the PR, client creation failed with the error described above. With the PR, the error disappeared and common operations such as create table, create database, and list tables all succeeded.
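As a side note, code that must also compile against older Hive versions where `getWithoutRegisterFns` does not exist can probe for the method reflectively and fall back to `Hive.get`. The sketch below demonstrates that pattern with a stand-in `Client` class rather than `org.apache.hadoop.hive.ql.metadata.Hive`; it is an assumption about one possible approach, not the actual Spark implementation.

```java
import java.lang.reflect.Method;

// Stand-in for org.apache.hadoop.hive.ql.metadata.Hive (illustrative only).
class Client {
    final boolean registeredAllFns;
    private Client(boolean r) { registeredAllFns = r; }
    public static Client get(String conf) { return new Client(true); }
    public static Client getWithoutRegisterFns(String conf, boolean doRegisterAllFns) {
        return new Client(doRegisterAllFns);
    }
}

class ReflectiveFactory {
    // Prefer the newer factory method when it exists; otherwise fall back to
    // the older get(), which eagerly registers all permanent functions.
    public static Client newClient(String conf) {
        try {
            Method m = Client.class.getMethod("getWithoutRegisterFns",
                                              String.class, boolean.class);
            m.setAccessible(true);
            return (Client) m.invoke(null, conf, false); // skip function registration
        } catch (NoSuchMethodException e) {
            return Client.get(conf); // older Hive: method absent
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newClient("conf").registeredAllFns); // false
    }
}
```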