@sunchao sunchao commented Jun 11, 2021

What changes were proposed in this pull request?

Instantiate a new Hive client through Hive.getWithoutRegisterFns(conf, false) instead of Hive.get(conf), if Hive version is >= '2.3.9' (the built-in version).

Why are the changes needed?

HIVE-10319 introduced a new API, get_all_functions, which is only supported in Hive 1.3.0/2.0.0 and up. As a result, when Spark 3.x talks to an HMS service of version 1.2 or lower, the following error occurs:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3897)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
        ... 96 more
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3845)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3833)

get_all_functions is called only when doRegisterAllFns is set to true:

  private Hive(HiveConf c, boolean doRegisterAllFns) throws HiveException {
    conf = c;
    if (doRegisterAllFns) {
      registerAllFunctionsOnce();
    }
  }

This registers all Hive permanent functions defined in HMS in Hive's FunctionRegistry class by iterating through the results of get_all_functions. For Spark this seems unnecessary, since it loads Hive permanent (not built-in) UDFs by calling the HMS API directly, i.e., get_function. The FunctionRegistry is only used to load Hive built-in functions that Spark does not support. At this time, that applies only to histogram_numeric.
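To illustrate the difference between the two access patterns, here is a minimal sketch. It is not Hive or Spark code: `OldMetastore`, `eagerRegistrationSucceeds`, and `lookupOnDemand` are hypothetical stand-ins, and the stub only imitates a server that lacks get_all_functions but still answers get_function.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FunctionLookupSketch {

    // Hypothetical stand-in for an HMS 1.2 server: it knows get_function
    // but rejects get_all_functions, as in the stack trace above.
    static class OldMetastore {
        private final Map<String, String> functions = new HashMap<>();

        OldMetastore() {
            // Made-up permanent UDF, for illustration only.
            functions.put("default.my_udf", "com.example.MyUdf");
        }

        List<String> getAllFunctions() {
            throw new UnsupportedOperationException(
                "Invalid method name: 'get_all_functions'");
        }

        // Returns the class name of the function, or null if it is unknown.
        String getFunction(String db, String name) {
            return functions.get(db + "." + name);
        }
    }

    // Eager pattern: list every function up front. Fails on the old server.
    static boolean eagerRegistrationSucceeds(OldMetastore hms) {
        try {
            hms.getAllFunctions();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // On-demand pattern: resolve a single function only when it is referenced.
    static String lookupOnDemand(OldMetastore hms, String db, String name) {
        return hms.getFunction(db, name);
    }

    public static void main(String[] args) {
        OldMetastore hms = new OldMetastore();
        System.out.println("eager ok: " + eagerRegistrationSucceeds(hms));
        System.out.println("on demand: " + lookupOnDemand(hms, "default", "my_udf"));
    }
}
```

In this sketch, the on-demand path never touches the unsupported call, which is why skipping the up-front registration is enough to keep an older server usable.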

HIVE-21563 introduced a new API, getWithoutRegisterFns, which skips the above registration and is available in Hive 2.3.9. Spark should therefore adopt it to avoid both the cost and the incompatible get_all_functions call.
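The version gate described in this PR can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `FakeHive` is a hypothetical stand-in for Hive's client class with simplified single-argument factory methods, and `compareVersions` and `newClient` are helper names invented here.

```java
class HiveClientSketch {

    // Hypothetical stand-in for the Hive client; the real class takes a HiveConf.
    static class FakeHive {
        final boolean registeredAllFns;

        private FakeHive(boolean registeredAllFns) {
            this.registeredAllFns = registeredAllFns;
        }

        // Mimics Hive.get(conf): registers all functions on creation.
        static FakeHive get(String conf) {
            return new FakeHive(true);
        }

        // Mimics getWithoutRegisterFns: skips the registration step.
        static FakeHive getWithoutRegisterFns(String conf) {
            return new FakeHive(false);
        }
    }

    // Compares purely numeric dotted versions such as "2.3.9" component by component.
    // Suffixes like "-SNAPSHOT" are not handled in this sketch.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Picks the factory method based on the Hive client version in use.
    static FakeHive newClient(String clientVersion, String conf) {
        if (compareVersions(clientVersion, "2.3.9") >= 0) {
            return FakeHive.getWithoutRegisterFns(conf);
        }
        return FakeHive.get(conf);
    }

    public static void main(String[] args) {
        System.out.println("2.3.9 registers all fns: "
            + newClient("2.3.9", "conf").registeredAllFns);
        System.out.println("2.3.8 registers all fns: "
            + newClient("2.3.8", "conf").registeredAllFns);
    }
}
```

The comparison is done per numeric component rather than on the raw string, so a version such as 2.10.0 would still be treated as newer than 2.3.9.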

Does this PR introduce any user-facing change?

Yes. With this fix, Spark should now be able to talk to an HMS server running Hive 1.2.x or lower.

How was this patch tested?

Manually started an HMS server of Hive version 1.2.2. Without the PR, it failed with the above exception. With the PR, the error disappeared and I could successfully perform common operations such as create table, create database, list tables, etc.

@github-actions github-actions bot added the SQL label Jun 11, 2021
sunchao commented Jun 11, 2021

cc @dongjoon-hyun @viirya picking this up again after Hive 2.3.9 upgrade.

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. (Pending CIs)

SparkQA commented Jun 11, 2021

Test build #139710 has finished for PR 32887 at commit 79bbc2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum closed this in 9c7250f Jun 12, 2021
sunchao commented Jun 12, 2021

Thanks @wangyum @viirya and @dongjoon-hyun !

wangyum commented Jun 12, 2021

Merged to master.
