-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TableLocationSupplier for more flexible table placement #2855
Conversation
@@ -154,6 +155,25 @@ public String toString() { | |||
|
|||
protected abstract String defaultWarehouseLocation(TableIdentifier tableIdentifier); | |||
|
|||
protected TableLocationSupplier tableLocationSupplier(TableIdentifier tableIdentifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you choose to make this a supplier rather than passing the incoming options to a method like defaultWarehouseLocation
? Is that because the table UUID is created in the TableMetadata
constructor?
Are there cases where you think the location would be based on the schema or sort order? I can't think of valid cases and I wonder if this could actually be a bad idea. For example, maybe this could look for a column pii struct<email string, ...>
and choose to place the table in a "sensitive" bucket, but I don't think that would be a good pattern for people to follow because it is error-prone: using PII
for the column name could break it.
I'm also skeptical that location would be based on the partition spec, especially because the spec may change.
It seems like this is required because the table UUID is generated by |
My understanding is that you are trying to make the default table location a function of all the other table states, and make this function a potentially pluggable interface. The biggest use case I see is for tables renamed to have the same name as an old table, they will still have a different path and not clobber the data if UUID is a part of the table path. Please let me know if there is any additional important use case that I missed. So to achieve this feature, I think another way is to have a different method in parallel to table builder's In the PR that Ryan referenced, I have changed to implement the To make this backwards compatible with existing implementations, we can remove the |
Currently table location placement is somewhat restrictive because the base location is defined by the catalog, but doesn't have all relevant information to construct paths for certain scenarios. The only information available to the catalog is
TableIdentifier
.Adding a TableLocationSupplier interface allows for more information to be used in defining the location of the table and allow for more flexible layouts and take advantage of other fields that are only populated later in the table creation lifecycle (like uuid, partition spec, sort order, or table properties).
Due to the way the table/metadata creation is performed, this is a little difficult to integrate both cleanly and in a backwards compatible way.
Posting this as a possible approach to allow for more flexible table layouts.