[Feature]Support Alibaba DLF metastore for hive external table #6403

Merged: 3 commits merged into StarRocks:main on May 30, 2022

@mxdzs0612 mxdzs0612 commented May 23, 2022

What type of PR is this:

  • bug
  • feature
  • enhancement
  • refactor
  • others

Which issues does this PR fix:

Fixes #

Problem Summary(Required) :

Data Lake Formation (DLF) is a key component of the cloud-native data lake framework on Alibaba Cloud, where it is widely used and plays a role similar to AWS Glue. For details, see https://www.alibabacloud.com/en/product/datalake-formation. This PR allows users to use DLF as the metastore for Hive external tables.

Usage:

CREATE EXTERNAL CATALOG dlf_hive_catalog
properties
(
    "type" = "hive",
    "hive.metastore.type" = "DLF"
);

Add a config file named hive-site.xml to {FE Home DIR}/conf with the following configs:

<?xml version="1.0"?>
<configuration>
    <!--Set to use dlf client-->
    <property>
        <name>hive.metastore.type</name>
        <value>dlf</value>
    </property>
    <!--DLF endpoint, see https://www.alibabacloud.com/help/en/doc-detail/197608.html-->
    <property>
        <name>dlf.catalog.endpoint</name>
        <value>dlf-vpc.cn-xxx.aliyuncs.com</value>
    </property>
    <!--DLF region, see https://www.alibabacloud.com/help/en/doc-detail/197608.html-->
    <property>
        <name>dlf.catalog.region</name>
        <value>cn-beijing</value>
    </property>
    <!--Proxy mode of DLF-->
    <property>
        <name>dlf.catalog.proxyMode</name>
        <value>DLF_ONLY</value>
    </property>
    <!--Access Key mode of DLF-->
    <property>
        <name>dlf.catalog.akMode</name>
        <value>MANUAL</value>
    </property>
    <!--User ID of the Alibaba Cloud account-->
    <property>
        <name>dlf.catalog.uid</name>
        <value>xxxxxx</value>
    </property>
    <!--Access Key ID of DLF, can be omitted if the cluster is created with the same Alibaba Cloud account as DLF-->
    <property>
        <name>dlf.catalog.accessKeyId</name>
        <value>xxxxxx</value>
    </property>
    <!--Access Key secret of DLF, can be omitted if the cluster is created with the same Alibaba Cloud account as DLF-->
    <property>
        <name>dlf.catalog.accessKeySecret</name>
        <value>xxxxxx</value>
    </property>
</configuration>
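
To make the shape of this file concrete, here is a minimal sketch of reading the `<property>` entries and checking `hive.metastore.type`. It uses only the JDK XML parser; the class `HiveSiteSketch` and its methods are hypothetical names for illustration and are not part of this PR, which relies on HiveConf to load the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not in the PR): reads Hadoop-style property entries with the JDK only. */
final class HiveSiteSketch {

    /** Collects every <property><name/><value/></property> pair in document order. */
    static Map<String, String> parse(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid hive-site.xml content", e);
        }
    }

    /** Returns the trimmed text of the first child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    /** True when hive.metastore.type is "dlf", compared case-insensitively. */
    static boolean usesDlf(Map<String, String> props) {
        return "dlf".equalsIgnoreCase(props.get("hive.metastore.type"));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><configuration>"
                + "<property><name>hive.metastore.type</name><value>dlf</value></property>"
                + "<property><name>dlf.catalog.region</name><value>cn-beijing</value></property>"
                + "</configuration>";
        Map<String, String> props =
                parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("uses DLF: " + usesDlf(props));
        System.out.println("region: " + props.get("dlf.catalog.region"));
    }
}
```

The case-insensitive comparison mirrors the `equalsIgnoreCase` check in the diff, so both `dlf` in hive-site.xml and `DLF` in the catalog properties would select the same client.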

@mxdzs0612 mxdzs0612 marked this pull request as ready for review May 23, 2022 07:40
@mxdzs0612 mxdzs0612 changed the title Support Alibaba DLF for hive external table. Support Alibaba DLF metastore for hive external table. May 23, 2022
@mxdzs0612 mxdzs0612 changed the title Support Alibaba DLF metastore for hive external table. Support Alibaba DLF metastore for hive external table May 23, 2022
@@ -465,10 +465,30 @@ under the License.

 <!-- https://mvnrepository.com/artifact/com.facebook.presto.hive/hive-apache -->
 <dependency>
-<groupId>com.facebook.presto.hive</groupId>
+<groupId>io.trino.hive</groupId>

Please run the regression test.

@@ -100,6 +101,7 @@ public class HiveMetaClient {

public HiveMetaClient(String uris) throws DdlException {
HiveConf conf = new HiveConf();
conf.addResource(new Path("file:///" + StarRocksFE.STARROCKS_HOME_DIR + "/conf/hive-site.xml"));

$FE/conf will be used as the classpath of FE. I don't think it is necessary to add this?
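
For context on this comment: a file in a directory that is on the classpath can be found through the class loader, without an explicit file path. Below is a minimal sketch of that lookup order, assuming a classpath-first policy with an explicit directory as fallback. `ConfLocatorSketch` and `locate` are hypothetical names and not code from this PR.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper (not in the PR): classpath lookup with an explicit-directory fallback. */
final class ConfLocatorSketch {

    /** Looks on the classpath first, then in fallbackDir; empty if the file is in neither. */
    static Optional<URL> locate(String name, Path fallbackDir) {
        URL onClasspath = ConfLocatorSketch.class.getClassLoader().getResource(name);
        if (onClasspath != null) {
            return Optional.of(onClasspath);
        }
        Path candidate = fallbackDir.resolve(name);
        if (Files.isRegularFile(candidate)) {
            try {
                return Optional.of(candidate.toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalStateException("cannot convert to URL: " + candidate, e);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "conf" is an assumed relative directory, used here only for demonstration.
        Optional<URL> found = locate("hive-site.xml", Paths.get("conf"));
        System.out.println(found.map(URL::toString).orElse("hive-site.xml not found"));
    }
}
```

If the classpath lookup already finds the file, the explicit `addResource` call in the diff would be redundant, which is the point the reviewer raises.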

@@ -119,8 +121,13 @@ public class AutoCloseClient implements AutoCloseable {
private final IMetaStoreClient hiveClient;

private AutoCloseClient(HiveConf conf) throws MetaException {
hiveClient = RetryingMetaStoreClient.getProxy(conf, dummyHookLoader,
HiveMetaStoreThriftClient.class.getName());
if ("dlf".equalsIgnoreCase(conf.get("hive.metastore"))) {

Where is the 'hive.metastore' configuration defined? Could the name be changed? I think 'hive.metastore' is too broad a name.
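
One way to address the naming concern is to keep the key in a single named constant and select the client from it. The sketch below assumes the key `hive.metastore.type` from the PR description; `MetaClientSketch` and `MetaClientFactorySketch` are hypothetical stand-ins and do not reproduce the real IMetaStoreClient wiring.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a metastore client; exposes only what the sketch needs. */
interface MetaClientSketch {
    String kind();
}

/** Hypothetical factory (not in the PR): picks a client from one well-named config key. */
final class MetaClientFactorySketch {
    // Assumption: the key from the PR description, rather than the broader "hive.metastore".
    static final String METASTORE_TYPE_KEY = "hive.metastore.type";

    static MetaClientSketch create(Map<String, String> conf) {
        String type = conf.get(METASTORE_TYPE_KEY);
        if ("dlf".equalsIgnoreCase(type)) {
            // In the PR this branch would build the DLF ProxyMetaStoreClient.
            return () -> "dlf";
        }
        // Default branch: the Hive Thrift metastore client.
        return () -> "thrift";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(METASTORE_TYPE_KEY, "DLF");
        System.out.println(create(conf).kind());
        System.out.println(create(new HashMap<>()).kind());
    }
}
```

Because the literal is compared with `equalsIgnoreCase` on the constant side, a missing key falls through to the Thrift default instead of throwing a NullPointerException.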

fe/pom.xml Outdated
<dependency>
<groupId>com.aliyun.datalake</groupId>
<artifactId>metastore-client-hive3</artifactId>
<version>0.2.14</version>

Please define ${metastore-client-hive3.version} in $STARROCKS/pom.xml.

@@ -723,6 +723,12 @@ public void createCatalog(Catalog catalog)
throw new TException("method not implemented");
}

@Override
public void alterCatalog(String s, Catalog catalog) throws NoSuchObjectException, InvalidObjectException,

Why add this interface?

import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ProxyMetaStoreClient implements IMetaStoreClient {

Is this only used by DLF?


public class ProxyMetaStoreClient implements IMetaStoreClient {
private static final Logger logger =
LoggerFactory.getLogger(com.aliyun.datalake.metastore.hive2.ProxyMetaStoreClient.class);

The pom uses metastore-client-hive3, but here we use hive2. Is that OK?

@@ -0,0 +1,2478 @@
// This file is licensed under the Elastic License 2.0. Copyright 2021-present, StarRocks Limited.
// This file is based on code available under the Apache license here:
// https://github.com/aliyun/datalake-catalog-metastore-client/blob/master/metastore-client-hive/metastore-client-hive3/src/main/java/com/aliyun/datalake/metastore/hive2/ProxyMetaStoreClient.java

@imay, is the license OK?

@mxdzs0612

run starrocks_fe_unittest

@wanpengfei-git

[FE PR Coverage check]

😞 fail : 3 / 6 (50.00%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 com/starrocks/external/hive/HiveMetaStoreThriftClient.java | 0 | 1 | 00.00% |
| 🔵 com/starrocks/external/hive/HiveMetaClient.java | 3 | 5 | 60.00% |

@stephen-shelby stephen-shelby left a comment

LGTM

@caneGuy caneGuy changed the title Support Alibaba DLF metastore for hive external table [Feature]Support Alibaba DLF metastore for hive external table May 30, 2022
@caneGuy caneGuy merged commit 03e6ed0 into StarRocks:main May 30, 2022
@mxdzs0612 mxdzs0612 deleted the dlf branch May 30, 2022 08:14
abc982627271 pushed a commit to abc982627271/starrocks that referenced this pull request Jun 22, 2022
…ocks#6403)

Data Lake Formation (DLF) is a key component of the cloud-native data lake framework on Alibaba Cloud, where it is widely used and plays a role similar to AWS Glue. For details, see https://www.alibabacloud.com/en/product/datalake-formation. This PR allows users to use DLF as the metastore for Hive external tables.

Usage:
Add a config file named hive-site.xml to {FE Home DIR}/conf with the following configs:

<?xml version="1.0"?>
<configuration>
    <!--Set to use dlf client-->
    <property>
        <name>hive.metastore.type</name>
        <value>dlf</value>
    </property>
    <!--DLF endpoint, see https://www.alibabacloud.com/help/en/doc-detail/197608.html-->
    <property>
        <name>dlf.catalog.endpoint</name>
        <value>dlf-vpc.cn-beijing.aliyuncs.com</value>
    </property>
    <!--DLF region, see https://www.alibabacloud.com/help/en/doc-detail/197608.html-->
    <property>
        <name>dlf.catalog.region</name>
        <value>cn-beijing</value>
    </property>
    <!--Proxy mode of DLF-->
    <property>
        <name>dlf.catalog.proxyMode</name>
        <value>DLF_ONLY</value>
    </property>
    <!--Access Key mode of DLF-->
    <property>
        <name>dlf.catalog.akMode</name>
        <value>EMR_AUTO</value>
    </property>
    <!--User ID of the Alibaba Cloud account-->
    <property>
        <name>dlf.catalog.uid</name>
        <value>xxxxxx</value>
    </property>
    <!--Access Key ID of DLF, can be omitted if the cluster is created with the same Alibaba Cloud account as DLF-->
    <property>
        <name>dlf.catalog.accessKeyId</name>
        <value>xxxxxx</value>
    </property>
    <!--Access Key secret of DLF, can be omitted if the cluster is created with the same Alibaba Cloud account as DLF-->
    <property>
        <name>dlf.catalog.accessKeySecret</name>
        <value>xxxxxx</value>
    </property>
</configuration>
jaogoy pushed a commit to jaogoy/starrocks that referenced this pull request Nov 15, 2023
Signed-off-by: 絵空事スピリット <wanglichen@starrocks.com>
jaogoy pushed a commit to jaogoy/starrocks that referenced this pull request Nov 15, 2023
Signed-off-by: 絵空事スピリット <wanglichen@starrocks.com>
(cherry picked from commit 40daf76)

Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>

5 participants