feat: Support table format: Iceberg, Delta, and Hudi #5650
base: master
string date_partition_column_format = 5;

// Table Format (e.g. iceberg, delta, etc)
string table_format = 6;
TODO: create a TableFormat proto and consolidate it with the FileFormat proto.
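As a rough illustration of the table-format vs. file-format split discussed here, below is a minimal Python sketch. It assumes hypothetical `TableFormat`, `FileFormat`, and `SourceFormat` types and a hypothetical `parse_source_format` helper; none of these are Feast APIs or part of this PR. It keeps the table format (Iceberg, Delta, Hudi) separate from the underlying file format and treats an empty `table_format` string (the proto3 default for an unset string field) as "no table format", so existing definitions keep parsing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TableFormat(Enum):
    """Hypothetical enum of the table formats named in this PR (not a Feast API)."""
    ICEBERG = "iceberg"
    DELTA = "delta"
    HUDI = "hudi"


class FileFormat(Enum):
    """Hypothetical enum of the underlying file formats listed in the docstring."""
    PARQUET = "parquet"
    AVRO = "avro"
    CSV = "csv"
    JSON = "json"


@dataclass(frozen=True)
class SourceFormat:
    """Hypothetical pairing: an optional table format layered over a file format."""
    file_format: FileFormat
    table_format: Optional[TableFormat] = None


def parse_source_format(file_format: str, table_format: str = "") -> SourceFormat:
    """Convert free-form string fields into typed values.

    An empty table_format maps to None, so definitions written before the
    field existed still parse. Enum lookup by value raises ValueError for
    unsupported names.
    """
    table_name = table_format.strip().lower()
    table = TableFormat(table_name) if table_name else None
    return SourceFormat(
        file_format=FileFormat(file_format.strip().lower()),
        table_format=table,
    )
```

For example, `parse_source_format("parquet", "iceberg")` yields a Parquet-backed Iceberg source, while `parse_source_format("parquet")` leaves `table_format` as `None`. Keeping the two enums separate mirrors the idea that a table format is a metadata layer on top of data files rather than another file format.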
+1 on including all three formats. Still, I think we could design the data-source side better, so that data source definitions don't tie sources to specific offline stores. For example, I think we could get the best of both worlds by instead adding all these formats as separate, independent data sources (
query: The query to be executed in Spark.
path: The path to file data.
- file_format: The format of the file data.
+ file_format: The underlying file format (parquet, avro, csv, json).
Why not consolidate now?
+1
@franciscojavierarceo @tokoko Consolidating with FileFormat and adding new data sources could break backward compatibility, so I want to do it step by step.
That makes sense.
@HaoXuAI Why would new data sources break backward compatibility, though?
There will be some proto changes. I'm not 100% sure whether there will be user-facing API changes, but I think that might be the case.
@franciscojavierarceo @ntkathole Would you mind taking a look?
@HaoXuAI I don't see us actually using or testing the Spark Table, Iceberg, or Hudi formats outside of our definitions. Can you add that?
Can you also add documentation stating that these formats are now supported?
Otherwise, LGTM.
I'll add the TableFormat proto in the next PR, and after that I'll add the docs. I think the tests will need to change as well.
What this PR does / why we need it:
examples:
Which issue(s) this PR fixes:
Misc