External tables support for SparkHiveDataSet #163

Open
@DebanjanBanerjeeQB

Description

SparkHiveDataSet does not currently support external Hive tables. External tables are common when an organisation's data lives outside Hive but the table still needs to be exposed through Hive. More info: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/using-hiveql/content/hive_create_an_external_table.html

Context

This would broaden the scope of Hive datasets. Right now, any externally managed Hive table has to be referenced via a custom dataset, and this happens quite often.

Possible Implementation

The implementation should be simple. The user specifies the keyword "EXTERNAL" in the DDL and a path for the table location. Both can be supplied via the catalog. Based on this input, the dataset should decide internally how to proceed and load/save data accordingly.
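As a rough illustration of the idea, here is a minimal sketch of how the DDL could switch between a managed and an external table based on catalog input. The helper name `build_create_table_ddl` and all of its parameters are hypothetical and not part of the existing SparkHiveDataSet API; the sketch only builds the statement and does not execute it.

```python
from typing import Dict, Optional


def build_create_table_ddl(
    database: str,
    table: str,
    columns: Dict[str, str],
    external_location: Optional[str] = None,
    file_format: str = "PARQUET",
) -> str:
    """Return a Hive CREATE TABLE statement; external when a location is given."""
    # Hypothetical helper: the name and parameters are illustrative only.
    column_spec = ", ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    # A managed table omits EXTERNAL and lets Hive choose the storage path.
    keyword = "EXTERNAL TABLE" if external_location else "TABLE"
    ddl = (
        f"CREATE {keyword} IF NOT EXISTS `{database}`.`{table}` "
        f"({column_spec}) STORED AS {file_format}"
    )
    if external_location:
        # Hive keeps the files at this path when an external table is dropped.
        ddl += f" LOCATION '{external_location}'"
    return ddl


ddl = build_create_table_ddl(
    database="sales",
    table="orders",
    columns={"order_id": "BIGINT", "amount": "DOUBLE"},
    external_location="/data/external/orders",
)
print(ddl)
```

In this sketch, leaving `external_location` unset yields a plain managed-table statement, so existing catalog entries would not need to change.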

Possible Alternatives

Accessing the Hive table via HQL. This again requires a custom HiveQueryDataSet that can access the metastore and run the query, which is a bit slow.
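For comparison, the alternative could look roughly like the sketch below. The class is hypothetical and deliberately does not subclass any Kedro base class; it assumes the object passed as `spark_session` behaves like a `pyspark.sql.SparkSession` created with Hive support, whose `sql` method runs the query.

```python
class HiveQueryDataSet:
    """Hypothetical read-only dataset that loads data by running HiveQL."""

    def __init__(self, sql: str, spark_session):
        # spark_session is assumed to behave like pyspark.sql.SparkSession.
        self._sql = sql
        self._spark = spark_session

    def load(self):
        # SparkSession.sql resolves table names against the Hive metastore
        # when the session was created with Hive support enabled.
        return self._spark.sql(self._sql)
```

With a real session this would be used as `HiveQueryDataSet("SELECT * FROM sales.orders", spark).load()`, where `sales.orders` is only an example table name.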

Metadata

    Labels

    help wanted: Contribution task, outside help would be appreciated!
