[SPARK-19152][SQL] DataFrameWriter.saveAsTable support hive append #16552

Conversation
Test build #71220 has finished for PR 16552 at commit

Test build #71239 has finished for PR 16552 at commit

retest this please

Test build #71443 has finished for PR 16552 at commit

Test build #71444 has finished for PR 16552 at commit

Test build #71599 has finished for PR 16552 at commit

The overall idea is to use

Test build #71639 has started for PR 16552 at commit

Test build #71654 has finished for PR 16552 at commit

Test build #71652 has finished for PR 16552 at commit

Test build #71653 has finished for PR 16552 at commit

Test build #71659 has finished for PR 16552 at commit
```scala
// Check if the specified data source match the data source of the existing table.
val existingProvider = DataSource.lookupDataSource(existingTable.provider.get)
```
We have `HiveFileFormat`, and we can make it implement `DataSourceRegister`; then `DataSource.lookupDataSource("hive")` can work.
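For illustration, a minimal sketch of that suggestion (the class body here is hypothetical; the real `HiveFileFormat` implements `FileFormat` and has more members): a class that implements `DataSourceRegister` and is listed in `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` can be resolved by `DataSource.lookupDataSource` via its short name.

```scala
import org.apache.spark.sql.sources.DataSourceRegister

// Hypothetical sketch: the key change is shortName(), which the
// ServiceLoader-based lookup in DataSource.lookupDataSource matches
// against, so lookupDataSource("hive") resolves to this class.
class HiveFileFormat extends DataSourceRegister {
  override def shortName(): String = "hive"
}
```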
```diff
@@ -69,7 +68,7 @@ case class CreateHiveTableAsSelectCommand(
       withFormat
     }
 
-    sparkSession.sessionState.catalog.createTable(withSchema, ignoreIfExists = false)
+    sparkSession.sessionState.catalog.createTable(withSchema, ignoreIfExists = true)
```
looks like we don't need to build `withSchema` anymore; the schema will be set in `AnalyzeCreateTable`.
retest this please

Test build #71886 has finished for PR 16552 at commit
You need to fetch upstream and merge it into your local branch. Some changes were merged to upstream/master; although they did not introduce conflicts, they caused compilation errors in your PR.
Test build #71888 has finished for PR 16552 at commit
```scala
         |USING hive
       """.stripMargin)
    val tempView = spark.sessionState.catalog.getTempView(tableName)
    assert(tempView.isDefined, "create a temp view using hive should success")
```
hmmm, it's not expected. Let's add a check in `CreateTempViewUsing` and throw an exception for the hive provider, e.g. `if (DDLUtils.isHiveTable(t)) throw ...`
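For illustration, the requested behavior could be exercised like this in a ScalaTest-style suite (a hedged sketch: the view name, schema, and message fragment are assumptions, and `spark` is an in-scope test session):

```scala
import org.apache.spark.sql.AnalysisException
import org.scalatest.Assertions.intercept

// With the suggested check in place, creating a temp view with the hive
// provider should fail analysis instead of silently registering the view.
val e = intercept[AnalysisException] {
  spark.sql("CREATE TEMPORARY VIEW v (id int) USING hive")
}
assert(e.getMessage.contains("Hive data source"))
```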
"not supported yet. Please use the insertInto() API as an alternative.") | ||
} | ||
|
||
// Check if the specified data source match the data source of the existing table. |
why remove this line?
```diff
@@ -65,6 +65,10 @@ case class CreateTempViewUsing(
   }
 
   def run(sparkSession: SparkSession): Seq[Row] = {
+    if (provider.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Currently Hive data source can not be created as a view")
```
Hive data source can only be used with tables, you cannot use it with CREATE TEMP VIEW USING
```diff
@@ -1461,6 +1461,25 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     })
   }
 
+  test("run sql directly on files - hive") {
+    withTable("t") {
```
you don't need to create a table:

```scala
withTempPath { path =>
  spark.range(100).toDF.write.parquet(path.getAbsolutePath)
  ...
  sql(s"select id from hive.`${path.getAbsolutePath}`")
}
```
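One plausible completion of that suggestion, given the provider check added in this PR (hedged: the intercepted message fragment is an assumption based on the error text discussed in this thread; `withTempPath` and `intercept` come from Spark's and ScalaTest's test utilities):

```scala
withTempPath { path =>
  spark.range(100).toDF.write.parquet(path.getAbsolutePath)
  // Reading files directly through the hive provider is rejected at analysis.
  val e = intercept[AnalysisException] {
    sql(s"select id from hive.`${path.getAbsolutePath}`")
  }
  assert(e.getMessage.contains("Hive data source"))
}
```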
Test build #71918 has started for PR 16552 at commit

retest this please

Test build #71920 has started for PR 16552 at commit

Test build #71910 has finished for PR 16552 at commit

retest this please
```diff
@@ -65,6 +65,11 @@ case class CreateTempViewUsing(
   }
 
   def run(sparkSession: SparkSession): Seq[Row] = {
+    if (provider.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Hive data source can not be used with tables," +
```
`can only be used`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and please add a space after the `,`
retest this please

Test build #71923 has finished for PR 16552 at commit

Test build #71925 has finished for PR 16552 at commit

thanks, merging to master!
## What changes were proposed in this pull request?
After apache#16552, `CreateHiveTableAsSelectCommand` becomes very similar to `CreateDataSourceTableAsSelectCommand`, and we can further simplify it by only creating the table in the table-not-exist branch. This PR also adds hive provider checking in the DataStream reader/writer, which was missed in apache#16552.

## How was this patch tested?
N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16693 from cloud-fan/minor.
## What changes were proposed in this pull request?
After [SPARK-19107](https://issues.apache.org/jira/browse/SPARK-19107), we can now treat hive as a data source and create hive tables with DataFrameWriter and Catalog. However, the support is not complete; there are still some cases we do not support.

This PR implements: `DataFrameWriter.saveAsTable` working with the hive format in append mode.

## How was this patch tested?
unit test added
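For reference, a minimal sketch of the usage this PR enables (the table name and data are illustrative, not from the PR's tests):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("hive-append-sketch")
  .enableHiveSupport() // Hive support is required for the hive format
  .getOrCreate()

// The first write creates a Hive-format table; the second appends to it,
// which is the case this PR adds support for.
val df = spark.range(10).toDF("id")
df.write.format("hive").saveAsTable("t")
df.write.format("hive").mode(SaveMode.Append).saveAsTable("t")
```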