
[Subtask] support fileset DDL operations for spark-connector #2461

Open · caican00 opened this issue Mar 8, 2024 · 8 comments
Labels: subtask (Subtasks of umbrella issue)

Comments

caican00 (Collaborator) commented Mar 8, 2024

Describe the subtask

Support fileset DDL operations, such as create, drop, etc.
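As a purely illustrative sketch, the DDL could look something like the following from Spark SQL. The `FILESET` keyword, option names, and paths below are hypothetical; no syntax has been agreed on yet, and this would require the connector support that this issue proposes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fileset-ddl-sketch").getOrCreate()

# Hypothetical syntax: assumes the spark-connector would add fileset DDL support.
spark.sql("""
    CREATE FILESET IF NOT EXISTS gravitino_catalog.my_schema.my_fileset
    COMMENT 'training data for model X'
    LOCATION 'hdfs://namenode:9000/data/my_fileset'
""")

spark.sql("DROP FILESET IF EXISTS gravitino_catalog.my_schema.my_fileset")
```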

Parent issue

#1227

caican00 added the subtask (Subtasks of umbrella issue) label Mar 8, 2024
caican00 (Collaborator, Author) commented Mar 8, 2024

Hi @FANNG1, what do you think of this?

FANNG1 (Contributor) commented Mar 8, 2024

From the user's perspective, Spark SQL normally operates on tables. How should Spark operate on filesets? cc @jerryshao

caican00 (Collaborator, Author) commented Mar 8, 2024

> From the user's perspective, Spark SQL normally operates on tables. How should Spark operate on filesets? cc @jerryshao

Refer to Databricks' volumes, which provide DDL operations for volumes. cc @FANNG1 @jerryshao
https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-volume.html
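For reference, the documented Databricks syntax looks roughly like this (shown via PySpark's `spark.sql`; this only runs on Databricks with Unity Catalog, and all names below are placeholders):

```python
# Managed volume: storage location is handled by Unity Catalog.
spark.sql("CREATE VOLUME IF NOT EXISTS main.default.my_volume COMMENT 'raw files'")

# External volume: points at an existing cloud storage path.
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS main.default.my_ext_volume
    LOCATION 's3://my-bucket/path/to/dir'
    COMMENT 'externally managed files'
""")
```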

jerryshao (Contributor) commented

I think Spark/Spark SQL can support operating on fileset data via SQL/RDD/DataFrame by using #1700; we don't have to do anything more.

The link above is about manipulating the volume (fileset) itself using SQL, which requires an SQL extension. Currently, we don't have a plan to do that.
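To make the first point concrete: operating on fileset data (as opposed to the fileset itself) could look like the sketch below. This assumes a Hadoop-compatible virtual filesystem is on the classpath and that fileset paths follow a `gvfs://fileset/...` layout; the scheme and path structure here are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fileset-read-sketch").getOrCreate()

# DataFrame access: read files stored under a fileset path (path layout assumed).
df = spark.read.parquet("gvfs://fileset/my_catalog/my_schema/my_fileset/dt=2024-03-08")
df.show()

# RDD access over the same virtual path.
lines = spark.sparkContext.textFile("gvfs://fileset/my_catalog/my_schema/my_fileset/logs")
print(lines.count())
```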

caican00 (Collaborator, Author) commented Mar 8, 2024

> I think Spark/Spark SQL can support operating on fileset data via SQL/RDD/DataFrame by using #1700; we don't have to do anything more.
>
> The link above is about manipulating the volume (fileset) itself using SQL, which requires an SQL extension. Currently, we don't have a plan to do that.

cc @coolderli

coolderli (Collaborator) commented

> I think Spark/Spark SQL can support operating on fileset data via SQL/RDD/DataFrame by using #1700; we don't have to do anything more.
>
> The link above is about manipulating the volume (fileset) itself using SQL, which requires an SQL extension. Currently, we don't have a plan to do that.

@jerryshao Do we have a plan to support fileset operations such as listing files, dropping files, and so on? If we want to implement TTL, we may need an interface to operate on the fileset. There may be some ambiguity about the positioning of the fileset: it is managed by Gravitino, and we already support creating tables through Gravitino, so why not support creating filesets? Some users may prefer SQL over the UI.

Actually, I think it is admittedly not consistent with Gravitino's positioning. But we could supply tools or actions to help users manage filesets. It may not be our current highest priority, but we could implement it later.

jerryshao (Contributor) commented

I didn't say we won't do it; what I said is that we don't have a plan to do it currently.

For ML users/data scientists, our Python client can be used to manage filesets; it is much more straightforward than SQL (which needs a separate query engine like Spark besides the ML engine).

For data engineers, the Java client can be used in their programs (such as Spark programs) to achieve this.

Providing an SQL interface is just an alternative to Java/Python; I don't see it as a must-have for now. So IMO, achieving this in SQL is not a super high priority. If you have a concrete scenario that requires SQL support, we can have an offline discussion about it.
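For illustration, managing a fileset from a client could look like the sketch below. The imports, class, and method names here are assumptions for the sake of the example, not the confirmed client API.

```python
# Illustrative sketch only: the imports and method names below are hypothetical,
# not the confirmed Gravitino Python client surface.
from gravitino import GravitinoClient, NameIdentifier

client = GravitinoClient(uri="http://localhost:8090", metalake_name="my_metalake")
catalog = client.load_catalog("fileset_catalog")

# Create a fileset that points at an existing storage location.
catalog.as_fileset_catalog().create_fileset(
    NameIdentifier.of("my_schema", "my_fileset"),
    comment="training data",
    storage_location="hdfs://namenode:9000/data/my_fileset",
    properties={},
)

# Drop it when no longer needed.
catalog.as_fileset_catalog().drop_fileset(NameIdentifier.of("my_schema", "my_fileset"))
```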

coolderli (Collaborator) commented

> I didn't say we won't do it; what I said is that we don't have a plan to do it currently.
>
> For ML users/data scientists, our Python client can be used to manage filesets; it is much more straightforward than SQL (which needs a separate query engine like Spark besides the ML engine).
>
> For data engineers, the Java client can be used in their programs (such as Spark programs) to achieve this.
>
> Providing an SQL interface is just an alternative to Java/Python; I don't see it as a must-have for now. So IMO, achieving this in SQL is not a super high priority. If you have a concrete scenario that requires SQL support, we can have an offline discussion about it.

I much appreciate your response; no offense intended. I completely agree with your point that this is not the highest priority right now.
