-
Notifications
You must be signed in to change notification settings - Fork 285
Refactors to support generic tables #1147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactors to support generic tables #1147
Conversation
@@ -24,7 +24,7 @@ | |||
import org.apache.iceberg.catalog.TableIdentifier; | |||
import org.apache.iceberg.rest.RESTUtil; | |||
|
|||
public class TableLikeEntity extends PolarisEntity { | |||
public class IcebergTableLikeEntity extends PolarisEntity { | |||
// For applicable types, this key on the "internalProperties" map will return the location | |||
// of the internalProperties JSON file. | |||
public static final String METADATA_LOCATION_KEY = "metadata-location"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason that we don't just make GENERIC a subType of TABLE_LIKE -- TABLE_LIKE assumes a metadata location (among other things), and we probably don't want to bake that assumption into generic tables. Beyond that, this leads us to refactors in resolution that should set the stage for volumes which would have a similar problem as generic tables.
@@ -16,7 +16,7 @@ | |||
* specific language governing permissions and limitations | |||
* under the License. | |||
*/ | |||
package org.apache.polaris.service.catalog; | |||
package org.apache.polaris.service.catalog.iceberg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this prefix parser doesn’t have to be exclusive to Iceberg; we could also utilize it for generic tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Currently, the proposal is to defer to PolarisIcebergCatalog for logical entities like namespace, and to the management APIs for entities like principal and catalog. createNamespace, for example, stays in the iceberg package even though it's used by generic tables. So moving this here feels consistent with that change.
However, in the future if we duplicate these APIs across and into PolarisGenericCatalog then I think it would make sense to move the prefix parser as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The methods in this class seems not really needed for generic tables, I think it is fine if we want to still keep it as Iceberg specific for now. However, I think let's get an independent pr for all renaming and relocation of iceberg classes separately for reivew.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can hold off on moving anything else for now, but I wanted to introduce the new PolarisGenericCatalog
type here alongside the move of the Iceberg types to make it more obvious why we are moving the Iceberg types. The actual implementation of PolarisGenericCatalog
will follow.
I also want to introduce the persistence changes here, so it'll be clear which types deal with which entities and why we have a separate entity for generic tables. With the PR now, you can see that PolarisGenericCatalog
is able to persist and resolve generic table entities while PolarisIcebergCatalog
doesn't deal with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That PR start getting bigger, I think we should really think about how to break down this pr to smaller ones to make the review easier. We have docs and POCs that we can refer to help explain the reason of the code, i think things could be easy to explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another look of the PR, i think we might be fine since most of the change seems caused by rename/relocation, no real functionality change, let's see how other people think. We can further break this down if people want a smaller PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the first comment. The intent behind this prefix parser was to apply it to all catalog-like APIs.
Given that "generic" tables are going to be accessed from Spark as well as Iceberg Tables, I believe the prefix handling should be the same for both APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good y'all -- I moved this file and stripped the "Iceberg" from the name. It's still a little confusing since the prefix is indeed part of Iceberg REST API, but it's not for "Iceberg Catalog"s with the current structure of the PR. Let me know if you think any more changes are needed here.
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
public class PolarisGenericCatalog { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not pursue a common ABC (e.g. PolarisBaseCatalog
) between this and PolarisIcebergCatalog
because PolarisIcebergCatalog
already extends a parent class which should not be a parent of PolarisBaseCatalog
and java doesn't allow multiple inheritance.
On the other hand, I didn't create a shared interface PolarisBaseCatalog
because the only thing that's presently shared is the constructor (and the private member variables set in the constructor) and java doesn't support either private members in interfaces or default constructors in interfaces. So, it wouldn't actually reduce any repeated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that is the decision after the discussion also, we will delay the PolarisBaseCatalog refactoring till we know more about how federation interacts with the Catalog.
If that class is for GenericTable, let's just call this PolarisGenericTableCatalog to be more specific for now, if later we have other catalog API want to reuse the class, we can think do the refactor later.
@@ -41,22 +41,22 @@ public enum PolarisPrivilege { | |||
TABLE_CREATE(6, PolarisEntityType.NAMESPACE), | |||
VIEW_CREATE(7, PolarisEntityType.NAMESPACE), | |||
NAMESPACE_DROP(8, PolarisEntityType.NAMESPACE), | |||
TABLE_DROP(9, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | |||
VIEW_DROP(10, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | |||
TABLE_DROP(9, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we mentioned in the doc is that we will reuse the current table privilege, so that people don't need to grant both TABLE_DROP and GENERIC_TABLE_DROP, which could be quite confusing to users.
With this setup, i don't think it is easy to extend the definition, it seems more nature to me that we can define the privilege with set of subtype, but one main type. Maybe what we can do is to have the base TableLikeEntity with just identifier, and then we can extend it to have IcebergTableLikeEntity and GenericTableEntity, in the future if we want to have a DeltaTable or general VIEW entity, we can all extend on top of the TableLikeEntity.
Then here we can extend the PolarisPrivilege to take a list of SubType to reuse the Table Privileges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The authn happens at a higher level, such as here -- so I think with a different entity, we should still be able to enforce the existing permissions model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure how we plan to do this at authentication level, if we are going to do some special check to allow the operation with TABLE_DROP, I felt it is kind of hack. The privilege here seems defines that it is for the Iceberg table only, if we want to extend the definition, and have a different entity type there, we might have to take a list of <entity type, entity sub type>. compare with let the definition take an entity type, and list of subtype, the latter seems making more sense to me.
i don't think I am suggesting to completely reuse the TABLE LIKE entity here, i am just suggesting to refactor the TABLE_LIKE entity to a base entity class, and extend it to ICEBERG_TABLE_LIKE entity and GENERIC_TABLE entity, so we can still address the concern about having methods or keywords that only make sense for ICEBERG also appear as part of GENERIC_TABLE entity. Also, since all those have to be unique under the same namespace, i think it make sense to have them to share the same entity type, but use different subtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can make things work either way, but might want to hear little bit more about how the entity type and sub entity type is designed to be used in Polaris.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, it looks quite ugly to clearly define that these privileges are for ICEBERG_TABLE_LIKE
+X but also for other types. What's the use case of these properties in this enum then?
The term "generic table" implies table semantics, but is it really a table or rather a view or a UDF?
The current privileges are, according to this type, tied to Iceberg tables + views (whether that makes sense or not).
IMHO we either go down the full route and have privileges by Iceberg-table, Iceberg-view + generic-"thing" or remove the type specific and just have privileges that are not tied to a specific entity type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snazy this sounds like more of a comment on the design rather than this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, this PR changes all this - so it's actually both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snazy I don't think this PR currently does setup for GENERIC_TABLE privilege yet, and the change in this file is just a renaming to me.
And you are right, the DROP_TABLE and DROP_VIEW privilege is only setup for Iceberg table and View now. The generic tables are tables, not view, and during the design, we mentioned that we will reuse the *_TABLE privilege since they are all tables, and having DROP_TABLE and DROP_GENERIC_TABLE privilege seems awkward to users. If we need more fine grained control in the future, it make more sense to introduce privileges like DORP_ICEBERG_TABLE, DROP_GENERIC_TABLE.
There is definitely changes needed to set up the privilege correctly, but i don't think that is the purpose of the current PR. The current PR are mainly doing refactor/rename, and with some basic class setup to demo the purpose of rename/refactor
polaris-core/src/main/java/org/apache/polaris/core/entity/PolarisEntityType.java
Show resolved
Hide resolved
@@ -373,7 +373,8 @@ public ResolverStatus resolveAll() { | |||
|
|||
// validate input | |||
diagnostics.check( | |||
entityType != PolarisEntityType.NAMESPACE && entityType != PolarisEntityType.TABLE_LIKE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why this function does not handle TABLE_LIKE, the comment is not clear about why, I assume the GENEIC_TABLE should work similar as TABLE_LIKE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just a hard-coded assumption that we are always resolving a TABLE_LIKE (view or table) at the end of a chain of NAMESPACEs
@@ -16,7 +16,7 @@ | |||
* specific language governing permissions and limitations | |||
* under the License. | |||
*/ | |||
package org.apache.polaris.service.catalog; | |||
package org.apache.polaris.service.catalog.iceberg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The methods in this class seems not really needed for generic tables, I think it is fine if we want to still keep it as Iceberg specific for now. However, I think let's get an independent pr for all renaming and relocation of iceberg classes separately for reivew.
TASK(8, ROOT, false, false), | ||
FILE(9, TABLE_LIKE, false, false); | ||
FILE(9, ICEBERG_TABLE_LIKE, false, false), | ||
GENERIC_TABLE(10, NAMESPACE, false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dennishuo / @dimas-b do either of you have a POV on adding a new entity type here rather than a subtype of TABLE_LIKE?
One one hand, generic tables conceptually are "table-like". But TableLikeEntity currently includes a "location" and many places in the code that handle it seem to assume it's an Iceberg entity. I want to try to avoid any confusion or accidental mixing of the types, so I decided to create a standalone entity and to rename the TableLikeEntity for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my POV it's ok to treat them as siblings (also hinted at that in another comment).
In larger context, having only type + sub-type seems rather awkward for a type system.... but I imagine this is not refactorable until Polaris 2.0 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In larger context, having only type + sub-type seems rather awkward for a type system.
Agreed with @dimas-b for this one. The only option now is to add another layer of subtype in entity properties.
I like the new entity type that it can easily use the subtype for Delta
, Parquet
, etc. From the privilege perspective, do we need another layer on top of both ICEBERG_TABLE_LIKE
and GENERIC_TABLE
to group them somehow? I'm also open to other options in terms of grouping privileges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we all think have an entity type fits with the Polaris design better, I am also good on this.
@sfc-gh-ygu in terms of your question about privilege, today's privilege is defined with entity type + subtype, not sure if we can add another layer on top of the entity type, but we can definitely extend how privilege is defined to take not just one entity type + subtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, this pr looks good to me, i might just want some more clarification about the design purpose of entity type and subEntity type in Polaris. cc @dennishuo Could you help filling some background on this for me?
*/ | ||
public class GenericTableEntity extends PolarisEntity { | ||
|
||
public static final String FORMAT_KEY = "format"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need another key for table properties? i believe the InternalProperties and properties fields in Polaris Entity is not the same as table properties? We can also add this in a separate pr, maybe just left a TODO here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additionalProperties
that are included with requests should indeed get piped into the same properties field that's in PolarisEntityCore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
|
||
@JsonIgnore | ||
public String getFormat() { | ||
return getInternalPropertiesAsMap().get(GenericTableEntity.FORMAT_KEY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is the usage distinguish between InternalProperties and Properties, i assume the InternalProperties is used to store fields that we will not return in any response to user like metadata location, but properties are used to store fields that will be directly surface back to user, like base location. if that is the case, i assume format should be stored in property map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the other way around AFAIK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.polaris.service.catalog.generic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also put this module specific to generictables to organize the code in a more fine grained structure?
@@ -16,7 +16,7 @@ | |||
* specific language governing permissions and limitations | |||
* under the License. | |||
*/ | |||
package org.apache.polaris.service.catalog; | |||
package org.apache.polaris.service.catalog.iceberg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another look of the PR, i think we might be fine since most of the change seems caused by rename/relocation, no real functionality change, let's see how other people think. We can further break this down if people want a smaller PR.
@@ -41,22 +41,22 @@ public enum PolarisPrivilege { | |||
TABLE_CREATE(6, PolarisEntityType.NAMESPACE), | |||
VIEW_CREATE(7, PolarisEntityType.NAMESPACE), | |||
NAMESPACE_DROP(8, PolarisEntityType.NAMESPACE), | |||
TABLE_DROP(9, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | |||
VIEW_DROP(10, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | |||
TABLE_DROP(9, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can make things work either way, but might want to hear little bit more about how the entity type and sub entity type is designed to be used in Polaris.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR LGTM except for the part where Iceberg and non-Iceberg catalog bifurcate (specific comment below).
* that they may not have a schema or base location. Similarly to {@link IcebergTableLikeEntity} | ||
* however, these tables have an identifier and a parent namespace. | ||
*/ | ||
public class GenericTableEntity extends PolarisEntity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we settle on the term "Generic Table" in the design doc (I genuinely do not recall)?
Conceptually, I'd expect Iceberg tables to be a sub-type of "generic table", but it is not so...
How about UnstructuredTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did have a consensus there (and another PR has merged that bears this name) but personally I am not opposed to a change if we find a better alternative. "Generic" was my suggestion however, so I will say a few things in defense of it:
- I actually think Iceberg tables can be a subtype of generic table. Currently generic tables are always untyped but in the future I believe we may want to do some kind of validation (e.g. "delta" format tables always have a location, "kafka" tables should always have a port, etc.).
- As for UnstructuredTable, I think that this -- like another name we considered, PropertyTable -- is a little too strong handed in implying that these tables are always unstructured (even the data? or just the metadata?) whereas that may not always be the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we did reach to a consensus about "generic", because we do want to potentially allow this set of APIs to manage various types of table, includes iceberg.
When I look at the ICEBERG_TABLE_LIKE_ENTITY, and GENERIC_TABLE_ENTITY, it is literally two different ways to represent a table, and based on the representation, different operations can be performed. Like Eric has mentioned, the Generic representation only takes a format, and the format could potentially be iceberg in the future. Or even step back, we can also allow Iceberg as a format, the only thing is we can not provide full Iceberg support (commit coordination) with this representation.
For name like unstructured, i have the same feeling like Eric, which is kind of too general, almost indicates just a set of files to me.
Therefore i do think generic fits better for the case we intended to support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(changing the topic of this thread slightly)
The argument about "representations" makes sense to me, but only at the REST API layer. These entity classes, however, relate to data stored in Polaris persistence. Unless we revamp the Persistence API, I do not see how it is possible to store a ICEBERG_TABLE_LIKE_ENTITY but load a GENERIC_TABLE_ENTITY from persistence 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you thinking about the case that if in the future we make the GENERIC_TABLE_ENTITY rich enough to be able to represent full Iceberg information, we can deprecate the use of ICEBERG_TABLE_LIKE_ENTITY at application layer, and then we will need to load the old ICEBERG_TABLE_LIKE_ENTITY as GENERIC_TABLE_ENTITY?
For an entity stored in one format, if we want to load it as another format, I think you are right, the conversion has to happen at some where, it could also potentially happen after the load like at the application layer.
However, I am not sure if that is the right thing to do. To me, one table should just have one representation, either ICEBERG_TABLE_LIKE or GENERIC_TABLE. Even if in the future we make generic table rich enough to deal with full iceberg support, i think we should make sure we can deal with both representations correctly until we can safely deprecate the legacy one. That is just my current opinion, I believe there is definitely other ways to handle that, and we can discuss this when it comes to that point.
Sorry if my previous comment raised any confusion, all I was trying to say is our intention for GENERIC_TABLE is potentially make it a general representation for various format of tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see how it is possible to store a ICEBERG_TABLE_LIKE_ENTITY but load a GENERIC_TABLE_ENTITY from persistence
I am thinking about a future where you make a call to load an iceberg table and the server loads a generic table (via conversion) and constructs an Iceberg loadTableResponse.
Or, a case where you make a call to load a generic table which happens to be an Iceberg table; it may be persisted as an Iceberg table.
Basically, I want to avoid coupling the persistence with the user-facing aspects of these tables. Still, I think we need two persistence entities for now.
TASK(8, ROOT, false, false), | ||
FILE(9, TABLE_LIKE, false, false); | ||
FILE(9, ICEBERG_TABLE_LIKE, false, false), | ||
GENERIC_TABLE(10, NAMESPACE, false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my POV it's ok to treat them as siblings (also hinted at that in another comment).
In larger context, having only type + sub-type seems rather awkward for a type system.... but I imagine this is not refactorable until Polaris 2.0 :)
@@ -16,7 +16,7 @@ | |||
* specific language governing permissions and limitations | |||
* under the License. | |||
*/ | |||
package org.apache.polaris.service.catalog; | |||
package org.apache.polaris.service.catalog.iceberg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the first comment. The intent behind this prefix parser was to apply it to all catalog-like APIs.
Given that "generic" tables are going to be accessed from Spark as well as Iceberg Tables, I believe the prefix handling should be the same for both APIs.
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
public class PolarisGenericTableCatalog { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that having this class follows the existing code patterns. However, from a more general POV, by treating "generic" tables and Iceberg tables separately we're losing the ability to handle multi-entity changes in Polaris atomically.
In effect Polaris moves toward being a union of catalogs rather than a single catalog. I believe the latter approach (single catalog) is more architecturally sound (even though it is harder to implement initially).
Existing Polaris Persistence APIs allow for a single catalog model, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working based on this diagram from Yun:

In this design, the shared resolution manifest and shared metastore manager can in theory allow for multi-entity transactions. In my interpretation, the classes PolarisIcebergCatalog
and PolarisGenericTable
catalogs take ownership of various [operations on] entities, but that doesn't mean entities can't span the classes. In fact, I'm interested in maybe moving some methods like createNamespace
to a shared parent type since they're really applicable in both places.
In any event, this PR is not really meant to implement PolarisGenericTableCatalog
but just to introduce the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The xxxCatalog classes are really implementation of the operations, not really an indication of whether Polaris is is multi-catalog of single-catalog. I don't think it is a good idea to put all implementations in one file, this is really just try to make the code structure more clear about the responsibilities. If the name is misleading, we can potentially change it to a different name.
Also as Eric has mentioned, we can potentially extract a common parent/base class for common functionality, @dennishuo has brought some concern that is could potentially cause some conflict with how federation could be implemented. Therefore, we decided to just keep the implementation separately at this moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PolarisGenericTableCatalog was removed from this PR, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The xxxCatalog classes are really implementation of the operations, not really an indication of whether Polaris is is multi-catalog of single-catalog
Good point. I suppose it depends on how we implement the new "generic table" API. Are you suggesting that it would be possible to manipulate Iceberg tables via the "generic" API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it totally depends on the definition of the operation. For example, if the listGenericTable APIs is defined to list all tables (includes iceberg), then it will also need to access the iceberg tables to get all tables.
However, that is just an example, it is not what is designed to do for the current MVP. Extend the responsibilities of the APIs will definitely need some careful design, especially about how it interacts with the current iceberg endpoints, otherwise it could be really confusing to users. Recently I have encountered a problem when integrating with glue catalog, where we expect the iceberg endpoint only return iceberg tables, but it actually returns tables with other formats, which is causing a lot of problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I previously sent out a doc on the mailing list about the high level code architecture proposal for fitting the generic table support. I would appreciate it if you could help take a look and see if it make sense.
There are some nice refactorings/"nomenclature fixes" in this change. |
...rc/test/java/org/apache/polaris/service/quarkus/catalog/PolarisCatalogNoEntityCacheTest.java
Outdated
Show resolved
Hide resolved
...vice/src/test/java/org/apache/polaris/service/quarkus/catalog/PolarisIcebergCatalogTest.java
Outdated
Show resolved
Hide resolved
* that they may not have a schema or base location. Similarly to {@link IcebergTableLikeEntity} | ||
* however, these tables have an identifier and a parent namespace. | ||
*/ | ||
public class GenericTableEntity extends PolarisEntity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(changing the topic of this thread slightly)
The argument about "representations" makes sense to me, but only at the REST API layer. These entity classes, however, relate to data stored in Polaris persistence. Unless we revamp the Persistence API, I do not see how it is possible to store a ICEBERG_TABLE_LIKE_ENTITY but load a GENERIC_TABLE_ENTITY from persistence 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this PR unveils the limitations of the "type-code" + "sub-type-code" approach and the handling of privileges raises related concerns (mixing Iceberg-table privileges with generic-table ones). I fear that it becomes quite difficult to understand and reason about.
However, the PR title and description say "... to support ...", but this change also introduces a new entity type. My ask is to remove the actual introduction of the new entity and just focus on the preparation parts.
...ervice/src/test/java/org/apache/polaris/service/quarkus/catalog/GenericTableCatalogTest.java
Outdated
Show resolved
Hide resolved
|
||
@QuarkusTest | ||
@TestProfile(GenericTableCatalogTest.Profile.class) | ||
public class GenericTableCatalogTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have test cases when accessing Iceberg tables + views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in GenericTableCatalog
and GenericTableCatalogTest
should be understood more as an example right now. This code is never hit by a Polaris service running in practice. Please refer to the PR description:
The focus here is on refactoring and implementing GenericTableCatalog is not the real goal of this PR ...
@@ -140,7 +141,8 @@ | |||
import software.amazon.awssdk.services.sts.model.AssumeRoleResponse; | |||
import software.amazon.awssdk.services.sts.model.Credentials; | |||
|
|||
public abstract class BasePolarisCatalogTest extends CatalogTests<BasePolarisCatalog> { | |||
@TestProfile(IcebergCatalogTest.Profile.class) | |||
public abstract class IcebergCatalogTest extends CatalogTests<IcebergCatalog> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have test cases to verify the behavior when accessing generic tables in list-tables/views, load-table/view, create-table/view, drop table/view/namespace, update table/view/namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
@@ -41,22 +41,22 @@ public enum PolarisPrivilege { | |||
TABLE_CREATE(6, PolarisEntityType.NAMESPACE), | |||
VIEW_CREATE(7, PolarisEntityType.NAMESPACE), | |||
NAMESPACE_DROP(8, PolarisEntityType.NAMESPACE), | |||
TABLE_DROP(9, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | |||
VIEW_DROP(10, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | |||
TABLE_DROP(9, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, it looks quite ugly to clearly define that these privileges are for ICEBERG_TABLE_LIKE
+X but also for other types. What's the use case of these properties in this enum then?
The term "generic table" implies table semantics, but is it really a table or rather a view or a UDF?
The current privileges are, according to this type, tied to Iceberg tables + views (whether that makes sense or not).
IMHO we either go down the full route and have privileges by Iceberg-table, Iceberg-view + generic-"thing" or remove the type specific and just have privileges that are not tied to a specific entity type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @eric-maynard. I'm OK to rename TABLE_LIKE
to ICEBERG_TABLE_LIKE
as it is built for that. We may need a parent concept for all table-like entities, but we can create that later if needed.
...mon/src/main/java/org/apache/polaris/service/catalog/iceberg/DefaultCatalogPrefixParser.java
Outdated
Show resolved
Hide resolved
* rename polarisbasecatalog * move catalog files * more renames; introduce class * add basic create/load methods * some refactoring * test stable * small rename * rename per review * bump id * move a file * autolint * rebase * autolint * changes per review * autolint * rename * autolint * fix merge conflict
* Generify MetaStoreManagerFactory.getOrCreateSessionSupplier (apache#1173) No functional change. * Adjust the type parameter to the Persistence supplier to cover all possible implementation types. * Remove unnecessary fields from IcebergCatalogAdapter * Adjust types at call sites of `getOrCreateSessionSupplier`. * [Enhancement] Refactor Cleanup Task Handler (apache#516) * refactor cleanup task handler * format * make base class abstract * make cleanup task record * simplify logger * update test after merge * update task handler register after merge * fix test error after merge * refine call context and ut after merge * fix ci error (apache#1174) * main: Update dependency org.junit:junit-bom to v5.12.1 (apache#1177) * Publish 0.9.0 documentation (apache#1175) * main: Update dependency gradle to v8.13 (apache#1063) * main: Update dependency gradle to v8.13 * Adopt build-scripts to Gradle changes See https://docs.gradle.org/8.13/userguide/upgrading_version_8.html#changes_to_jvmtestsuite --------- Co-authored-by: Robert Stupp <snazy@snazy.de> * Remove jetbrains-annotations (apache#1176) The main intention of this change is to avoid confusion between Jetbrains' `@NotNull` and Jakarta's `@Nonnull` (the latter is standard in the Polaris codebase). As a side effect `@Contract` is no longer available. However, its value is realised only in tools that support it and Polaris builds do not rely on it for producing artifacts. For a human being the value of `@Contract` appears to be negligible compared to javadoc. Therefore, in the interest of keeping annotation dependencies concise, `@Contract` lines are removed. Jetbrains' `@VisibleForTesting` is converted to the same annotation from Guava (which is also standard in the Polaris codebase). * main: Update dependency com.google.cloud:google-cloud-storage-bom to v2.50.0 (apache#1178) * Isolate Persistence objects in different threads. (apache#1166) * Simplify PolarisGrantManager (apache#1171) * Simplify PolarisGrantManager Previously strongly typed methods redirected to typeless lookup methods, while implementations had only the typeless variant. This change reverts the redirects from strongly typed methods to existing typeless methods in implementations. As a result it is possible to simplify the interface by removing typeless lookup methods. Existing call sites all have strongly typed parameters available and use the typed lookup methods now. For context: This refactoring seems valuable by itself, but it is also needed for the upcoming NoSQL implementations for reasons similar to apache#1112 * main: Update dependency com.github.spotbugs:spotbugs-annotations to v4.9.3 (apache#1180) * main: Update mockito monorepo to v5.16.1 (apache#1182) * main: Update dependency software.amazon.awssdk:bom to v2.31.1 (apache#1188) * Remove extra run in dockerfile (apache#1185) * Policy Store: Add PolicyEntity and PolicyTypes (apache#1133) * Fix spark download and add check/cleanup (apache#1184) * Update EclipseLink doc to 4.0 (apache#1198) * Sync psql persistence (apache#1187) * sync persistence config when using Postgres * sync persistence config when using Postgres * sync persistence config when using Postgres * Revert change for 0.9.0 * Renovate: Group all Quarkus dependencies (apache#1206) Quarkus-platform releases happen some time after the "actual" Quarkus release, which causes "broken" CI for Renovate PRs against for example the Quarkus Gradle plugin. This change groups all Quarkus dependencies together to consistently bump all Quarkus-platform dependencies at once. * The ASF Infra deployed a new parser, that requires a fix on our .asf.yaml (apache#1209) * Let 'spotless' run on all java source directories (apache#1205) Only runs on 'main', 'test', 'testFixtures', but not on others like 'intTest'. This change fixes this. * commit (apache#1201) * Move Polaris client into root dir (apache#1172) * Move python client into root dir * Fix paths for regtests * Fix path for notebooks and docs * Change the path within container for consistency * Add License Header * Fix the copy folder cmd to restore the original regtests layout * Rearrange dir layout inside docker * Rename classes in transactional persistence package (apache#1197) * Refactors to support generic tables (apache#1147) * rename polarisbasecatalog * move catalog files * more renames; introduce class * add basic create/load methods * some refactoring * test stable * small rename * rename per review * bump id * move a file * autolint * rebase * autolint * changes per review * autolint * rename * autolint * fix merge conflict * main: Update dependency ch.qos.logback:logback-classic to v1.5.18 (apache#1196) * main: Update Quarkus Platform and Group (apache#1212) * main: Update dependency boto3 to v1.37.16 (apache#1093) * Use 'en-us' in all `Dockerfile`s and Gradle `Test` tasks (apache#1214) Fixes apache#885 Supersedes apache#886 * main: Update dependency com.google.guava:guava to v33.4.5-jre (apache#1218) * IcebergCatalogAdapter: close underlying catalog consistently (apache#1224) With the revert of #b84f4624db8d0bd5b8920b0df719bcc15666008f by #ccf25df7b055e9d232b88a3f6fe8b4e0a2ab035a, we lost an extra benefit that was included in that change: a fix for the fact that IcebergCatalogHandlerWrapper does not always close its underlying `Catalog`, thus relying on `CallContext` to play the role of the "sweep vehicle" and close everything that was left unclosed at the end of the request processing. This PR re-applies that fix again. * Add BOM (Bill of Materials) (apache#1216) Fixes apache#788 * Fix CI build after polaris.io (apache#1232) * Add Apache Polaris Community Meeting from March 20, 2025 (apache#1234) * Simplify `polaris` client script (apache#1220) Address [SC2164](https://github.com/koalaman/shellcheck/wiki/SC2164), [SC2098](https://github.com/koalaman/shellcheck/wiki/SC2098), [SC2086](https://github.com/koalaman/shellcheck/wiki/SC2086) Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> * Move polaris-admin-tool tests to separate package (apache#1227) ... and make the PG test-resource non-global. * Let `PurgeCommand` inspect the results (apache#1226) * Add type and converters for `MemorySize` (apache#1230) The type allows human friendly memory size specifications like `32k` or `64M`, including support for smallrye-config and Jackson. * Retire `polaris-reg-test` script (apache#1219) * Nit: add misc-types to bom (apache#1241) * JWTBroker: fix refresh token logic (apache#1242) * Fix overzealous check in the Polaris CLI (apache#1237) * fix * adjust * revert * main: Update dependency software.amazon.awssdk:bom to v2.31.6 (apache#1245) * main: Update dependency boto3 to v1.37.18 (apache#1244) * Add zip+tar to publishable artifacts and add a `run.sh` script (apache#1082) Adds the tar+zip distribution archives as publishable artifacts to Maven publication. Also updates polaris-quarkus-admin to build as a "fast-jar" instead of an "uber-jar". * Build: Add `pom.xml`, `pom.properties` and LICENSE+NOTICE to release jars (apache#1036) Adds convenient, but not strictly necessary information to each generated "main" jar. This includes `pom.properties` and `pom.xml` files where Maven places those, in `META-INF/maven/group-id/artifact-id/`. Also adds the `NOTICE` and `LICENSE` files in `META-INF`, which makes it easier for license scanners. * Doc: catalog bootstrap steps for helm deployment (apache#1243) * main: Update actions/setup-python digest to 8d9ed9a (apache#1249) * main: Update dependency com.google.guava:guava to v33.4.6-jre (apache#1251) * main: Update actions/stale digest to ba23c1c (apache#1250) * Core: Add data compaction policy content parser and validator (apache#1238) * main: Update gradle/actions digest to 06832c7 (apache#1255) * Admin tool: fix Dockerfile.jvm (apache#1256) * Implement GenericTableCatalog (apache#1231) * add missing apis * more tests, fixes * clean up drop * autolint * changes per review * revert iceberg messages to comply with oss tests * another revert * more iceberg catalog changes * autolint * wip * refactor to subtype * autolint * rebase * add another assert * autolint * add another best effort check * autolint * reduce metastore trips * autolint * API Spec: Add ConnectionConfigInfo to ExternalCatalog (apache#1026) * API Spec: Add ConnectionConfigInfo to ExternalCatalog Remove the currently unused remoteUrl field from the top-level ExternalCatalog into the ConnectionConfigInfo as "uri" instead for better consistency; remote catalogs in the future may be defined by arbitrary URIs that are not, for example, http(s) URLs. This is just the spec definition for now, so it's not yet wired into the internal entity layer or persistence objects. Allow extensibility of different connection types in the future even if we start with only an ICEBERG_REST type. Similarly, provide extensibility for different authn mechanisms to use with the connection. * Implement service interfaces for policies & generic tables (apache#1263) * ready * autolint * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1 (apache#1268) * main: Update dependency io.smallrye.common:smallrye-common-annotation to v2.11.0 (apache#1267) * main: Update dependency io.smallrye.config:smallrye-config-core to v3.12.4 (apache#1266) * Add openApiGenerate task as dependency for processResources (apache#1259) * Core: Add policy content and validator for more maintenance policies (apache#1261) * Better error message when sibling resolution fails (apache#1253) * better error * autolint * better message * Add CLI dependency update option (apache#1222) * PySpark Reg Test Updates (apache#1262) * PySpark Reg Test Updates * Nits --------- Co-authored-by: Travis Bowen <travis.bowen@snowflake.com> * Update .asf.yaml adding dismiss_stale_reviews to true and require_last_push_approval to false (apache#1265) * main: Update dependency software.amazon.awssdk:bom to v2.31.11 (apache#1279) * main: Update dependency boto3 to v1.37.23 (apache#1278) * main: Update dependency com.azure:azure-sdk-bom to v1.2.33 (apache#1275) * Update team (apache#1282) * Vend Azure credentials compatible with Iceberg 1.7 (apache#1252) * update * autolint * fix * autolint * clean up * autolint * test * autolint * paranoid check * typofix * Add a note about nit/minor comments (apache#1280) * Upgrade Iceberg to 1.8.1 (apache#1126) * Policy Store: PolicyMappingRecord with Persistence Impl (apache#1104) * Spark: Setup repository code structure and build (apache#1190) * Added freshness aware table loading using metadata file location for ETag (apache#1037) * Pulled in iceberg 1.8.0 spec changes for freshness aware table loading and added feature to Polaris * Changed etag support to use entityId:version tuple * fixed getresponse call * Changed etagged response to record and gave default implementation to ETaggableEntity * Made iceberg rest spec docs clearer * Added HTTP Compliant ETag and IfNoneMatch representations and separated persistence from etag logic * Changed ETag to be a record and improved semantics of IfNoneMatch * Fixed semantics of if none match * Removed ETag representation, consolidated in IfNoneMatch * fixed if none match parsing * Added table entity retrieval method to table operations * removed accidental commit of pycache folders * Fixed formatting * Changed to use metadata location hash * Ran formatting * use sha256 * Moved out ETag functions to utility class and removed ETaggedLoadTableResponse * Addressed comments * Fixed IcebergTableLikeEntity package rename * main: Update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.31.0 (apache#1288) * Update LICENSE and NOTICE in the distributions (admin and server) (apache#1258) * Gradle/Quarkus: make imageBuild task depend on jandex (apache#1290) * Core: Clarify the atomicity of BasePersistence methods (apache#1274) * Implement GenericTableCatalogAdapter (apache#1264) * rebase * more fixes * autolint * working on tests * stable test * autolint * polish * changes per review * some changes per review * grants * autolint * changes per review * changes per review * typofix * Improve code-containment and efficiency of etag-aware loading (apache#1296) * Improve code-containment and efficiency of etag-aware loading -Make the hash generation resilient against null metadataLocation -Use getResolvedPath instead of getPassthroughResolvedPath to avoid redundant persistence round-trip -Only try to calculate the etag for comparison against ifNoneMatch if the ifNoneMatch is actually provided * Add strict null-checking at callsites to generateETag, disallow passing null to generator * Add TODO to refactor shared logic for etag generation * Core: Add Endpoints and resource paths for Generic Table (apache#1286) * main: Update dependency com.nimbusds:nimbus-jose-jwt to v10.1 (apache#1299) * [JDBC] Part1 : ADD SQL script for Polaris setup (apache#1276) * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1743605859 (apache#1300) * done (apache#1297) * Add Polaris Community Meeting for April 3, 2025 (apache#1304) * Use config-file to define errorprone rule (apache#1233) Also enabled a couple more simple rules, and adding suppressions/fixes for/to the code. The two rules `EqualsGetClass` and `UnusedMethod`, which I think are useful, are not enabled yet, because that would mean actual code changes, which I do not want to do in this PR. The rule `PatternMatchingInstanceof`, introduced in apache#393, is disabled in this PR. It does not work before errorrpone 2.37.0 (via apache#1213) - requires additional changes to enable the rule (see apache#1215). * Add Yun as a contributor (apache#1310) * Refactor CatalogHandler to comply with ErrorProne rules (apache#1312) Fix the CI error after apache#1233 * Implement PolicyCatalog Stage 1: CRUD + ListPolicies (apache#1294) * main: Update dependency io.opentelemetry:opentelemetry-bom to v1.49.0 (apache#1316) * main: Update docker.io/jaegertracing/all-in-one Docker tag to v1.68.0 (apache#1317) * main: Update dependency boto3 to v1.37.28 (apache#1328) * main: Update dependency software.amazon.awssdk:bom to v2.31.16 (apache#1329) * Make `BasePolaritsMetaStoreManagerTest` and `(Base)ResolverTest` reusable (apache#1308) Moves the test cases into the `Base*` classes and make sure the classes can be reused by other persistence implementations. * main: Update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.32.0 (apache#1293) * main: Update mockito monorepo to v5.17.0 (apache#1311) * PySpark Update AWS Region (apache#1302) Co-authored-by: Travis Bowen <travis.bowen@snowflake.com> * main: Update dependency com.nimbusds:nimbus-jose-jwt to v10.2 (apache#1334) * main: Update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.0.3 (apache#1335) * Maven publication: Produce correct `<scm><tag>` in `pom.xml` (apache#1330) `project.scm.tag` in a Maven pom is intended to refer to the SCM (Git) tag. We currently publish `main`, which is incorrect. This change omits the SCM tag for snapshot builds, but emits the Git tag for releases. * Remove `@StaticInitSafe` annotation (apache#1331) There was an issue around mapped configurations having the `@StaticInitSafe` annotation that led to _two_ instances (a "static" one and a "somewhet application-scoped" one) - this was fixed in Quarkus 3.21. One bug in smallrye-config is fixed for Quarkus > 3.21.0, another issue however remains. Since `@StaticInitSafe` annotated configs seem to cause some weird issues, it seems legit to remote that annotation altogether. This approach was [taken in Nessie](projectnessie/nessie#10606) as well. Investigations (via practical experiments) have proven that there's no measurable impact (runtime + heap) when doing this - and that's also been confirmed by Quarkus + Smallrye-config maintainers. Hence this change remotes that annotation from the code base. * Build/Release: Add a "generate digest" task and use for source tarball and Quarkus distributables (apache#1271) * Ensure that digest and signature are generated for both Polaris-Server and admin tar/zip distribution * Move "generate digest" functionality to a Gradle task * main: Update dependency com.google.errorprone:error_prone_core to v2.37.0 (apache#1213) * main: Update Quarkus Platform and Group to v3.21.1 (apache#1291) * main: Update dependency io.netty:netty-codec-http2 to v4.2.0.Final (apache#1301) * Remove unnecessary `clean` and `--no-build-cache` from Gradle invocations (apache#1338) `quarkusAppPartsBuild --rerun` is the right way to force a Docker image build. * Generalize bootstrapping in servers (apache#1313) * Remove `instanceof` checks from `QuarkusProducers`. * Remove the now unused `onStartup` method from `InMemoryPolarisMetaStoreManagerFactory`. * Instead, call the good old `bootstrapRealms` method from `QuarkusProducers`. * Add new config property to control which MetaStore types are bootstrapped automatically (defaults to `in-memory` as before). * There is no bootstrap behaviour change in this PR, only refactorings to simplify code. * Add info log message to indicate when a realm is bootstrapped in runtime using preset credentials. Future enhancements may include pulling preset credentials from a secret manager like Vault for bootstrapping (s discussed in comments on apache#1228). * main: Update actions/stale digest to 816d9db (apache#1341) * main: Update dependency com.adobe.testing:s3mock-testcontainers to v4 (apache#1342) * main: Update dependency org.eclipse.persistence:eclipselink to v4.0.6 (apache#1343) * main: Update dependency io.quarkus to v3.21.2 (apache#1344) * main: Update dependency com.google.guava:guava to v33.4.7-jre (apache#1340) Co-authored-by: Robert Stupp <snazy@snazy.de> * Spark: Add Namespaces and View support for SparkCatalog (apache#1332) * Demote technical log messages to DEBUG in PolarisCallContextCatalogFactory (apache#1346) These messages appear to be logging low-level technical details about what is going on in the factory and are not likely to be of interest to most users on a daily basis. * Core/Service: Implement PolicyCatalog Stage 2: detach/attach/getApplicablePolicies (apache#1314) * Spec: Add 'inherited' and 'namespace' Fields to GetApplicablePolicies API Response (apache#1277) * Properly track bootstrappedRealms in InMemoryPolarisMetaStoreManagerFactory (apache#1352) Fixes apache#1351 * Implement GenericTableCatalogAdapter; admin-related fixes (apache#1298) * initial commit: * debugging * some polish * autolint * spec change * bugfix * bugfix * various fixes * another missing admin location * autolint * false by default * fixes per review * autolint * more fixes * DRY * revert small change for a better error * integration test * extra test * autolint * stable * wip * rework subtypes a bit * stable again * autolint * apply new lint rule * errorprone again * adjustments per review * update golden files * add another test * clean up logic in PolarisAdminService * autolint * more fixes per review * format * Update versions in distribution LICENSE and NOTICE (apache#1350) * Spark: Add CreateTable and LoadTable implementation for SparkCatalog (apache#1303) * Add a weigher to the EntityCache based on approximate entity size (apache#490) * initial commit * autolint * resolve conflicts * autolint * pull main * Add multiplier * account for name, too * adjust multiplier * add config * autolint * remove old cast * more tests, fixes per review * add precise weight test * autolint * populate credentials field for loadTableResponse (apache#1225) * populate credentials field for loadTableResponse * spotless * spotless * remove unused hashset * fix merge * fix empty credential case * spotlessApply --------- Co-authored-by: David Lu <dalu@hubspot.com> * main: Update dependency io.smallrye.common:smallrye-common-annotation to v2.12.0 (apache#1355) * Build: Avoid adding duplicated projects for Intelij IDE usage (apache#1333) * main: Update dependency org.junit:junit-bom to v5.12.2 (apache#1354) * main: Update dependency org.apache.commons:commons-text to v1.13.1 (apache#1358) * main: Update dependency boto3 to v1.37.33 (apache#1360) * main: Update dependency software.amazon.awssdk:bom to v2.31.21 (apache#1361) * main: Update dependency io.micrometer:micrometer-bom to v1.14.6 (apache#1362) * main: Update dependency com.google.guava:guava to v33.4.8-jre (apache#1366) * Update LICENSE/NOTICE with latest versions (apache#1364) * Use "clean" LICENSE and NOTICE in published jar artifacts (apache#1292) * main: Update dependency io.projectreactor.netty:reactor-netty-http to v1.2.5 (apache#1372) * Add `Varint` type for variable-length integer encoding (apache#1229) * main: Update docker.io/prom/prometheus Docker tag to v3.3.0 (apache#1375) * Set version to 0.10.0-beta in prepaaration for the next release (apache#1370) * Update the link to OpenAPI in the documentation (apache#1379) * Integration test for Spark Client (apache#1349) * add integration test * add change * add comments * rebase main * update class comments * add base integration * clean up comments * main: Update dependency net.ltgt.gradle:gradle-errorprone-plugin to v4.2.0 (apache#1392) * Add generic table documentations (apache#1374) * add generic table documentation (incomplete) * fix table and spacing * remove documentation in client api since there is no implementation yet * remove spacing * minor fix - proof read * review fix, wording * add generic table documentation (incomplete) * fix table and spacing * remove documentation in client api since there is no implementation yet * remove spacing * minor fix - proof read * review fix, wording * proof read - punctuation fix * change table privilege reference * Unblock test `listNamespacesWithEmptyNamespace` (apache#1289) * Unblock test `listNamespacesWithEmptyNamespace` * Use `containsExactly` to simplify the test * Fix empty namespace behavior * Address comments * Block dropping empty namespace * Improve error messages * Revamp the Quick Start page (apache#1367) * First Draft with AWS * try again * try again * try again * try again * try again * try now * should work * AWS First Draft Complete * ensure file changed * Azure First Draft Complete * Azure First Draft, pt. 2 * Azure Completed * GCP First Draft * GCP Verified * File structure fixed * Remove Trino-specific tutorial * Restructured Quick Start * Addresses minor comments from @eric-maynard * Added reference to Deploying Polaris in Production * Fix MD Link Checker --------- Co-authored-by: Adnan Hemani <adnan.hemani@snowflake.com> * Update README with links to new Quickstart experience (apache#1393) * Update the StorageConfiguration to invoke singleton client objects, a… (apache#1386) * Update the StorageConfiguration to invoke singleton client objects, and add a test * Fix formatting * using guava suppliers * Add aws region * Cleanup and mock test * Spark: Add rest table operations (drop, list, purge and rename etc) for Spark Client (apache#1368) * Initial MVP implementation of Catalog Federation to remote Iceberg REST Catalogs (apache#1305) * Initial prototype of catalog federation just passing special properties into internal properties. Make Resolver federation-aware to properly handle "best-effort" resolution of passthrough facade entities. Targets will automatically reflect the longest-path that we happen to have stored locally and resolve grants against that path (including the degenerate case where the longest-path is just the catalog itself). This provides Catalog-level RBAC for passthrough federation. Sketch out persistence-layer flow for how connection secrets might be pushed down into a secrets-management layer. * Defined internal representation classes for connection config * Construct and initialize federated iceberg catalog based on connection config * Apply the same spec renames to the internal ConnectionConfiguration representations. * Manually pick @XJDKC fixes for integration tests and omittign secrets in response objects * Fix internal connection structs with updated naming from spec PR * Push CreateCatalogRequest down to PolarisAdminService::createCatalog just like UpdateCatalogRequest in updateCatalog. This is needed if we're going to make PolarisAdminService handle secrets management without ever putting the secrets into a CatalogEntity. * Add new interface UserSecretsManager along with a default implementation The default UnsafeInMemorySecretsManager just uses an inmemory ConcurrentHashMap to store secrets, but structurally illustrates the full flow of intended implementations. For mutual protection against a compromise of a secret store or the core persistence store, the default implementation demonstrates storing only an encrypted secret in the secret store, and a one-time-pad key in the returned referencePayload; other implementations using standard crypto protocols may choose to instead only utilize the remote secret store as the encryption keystore while storing the ciphertext in the referencePayload (e.g. using a KMS engine with Vault vs using a KV engine). Additionally, it demonstrates the use of an integrity check by storing a basic hashCode in the referencePayload as well. * Wire in UserSecretsManager to createCatalog and federated Iceberg API handlers Update the internal DPOs corresponding to the various ConnectionConfigInfo API objects to no longer contain any possible fields for inline secrets, instead holding the JSON-serializable UserSecretReference corresponding to external/offloaded secrets. CreateCatalog for federated catalogs containing secrets will now first extract UserSecretReferences from the CreateCatalogRequest, and the CatalogEntity will populate the DPOs corresponding to ConnectionConfigInfos in a secondary pass by pulling out the relevant extracted UserSecretReferences. For federated catalog requests, when reconstituting the actual sensitive secret configs, the UserSecretsManager will be used to obtain the secrets by using the stored UserSecretReferences. Remove vestigial internal properties from earlier prototypes. * Since we already use commons-codec DigestUtils.sha256Hex, use that for the hash in UnsafeInMemorySecretsManager just for consistency and to illustrate a typical scenario using a cryptographic hash. * Rename the persistence-objects corresponding to API model objects with a new naming convention that just takes the API model object name and appends "Dpo" as a suffix; * Use UserSecretsManagerFactory to Produce the UserSecretsManager (apache#1) * Move PolarisAuthenticationParameters to a top-level property according to the latest spec * Create a Factory for UserSecretsManager * Fix a typo in UnsafeInMemorySecretsManagerFactory * Gate all federation logic behind a new FeatureConfiguration - ENABLE_CATALOG_FEDERATION * Also rename some variables and method names to be consistent with prior rename to ConnectionConfigInfoDpo * Change ConnectionType and AuthenticationType to be stored as int codes in persistence objects. Address PR feedback for various nits and javadoc comments. * Add javadoc comment to IcebergCatalogPropertiesProvider * Add some constraints on the expected format of the URN in UserSecretReference and placeholders for next steps where we'd provide a ResolvingUserSecretsManager for example if the runtime ever needs to delegate to two different implementations of UserSecretsManager for different entities. Reduce the `forEntity` argument to just PolarisEntityCore to make it more clear that the implementation is supposed to extract the necessary identifier info from forEntity for backend cleanup and tracking purposes. --------- Co-authored-by: Rulin Xing <rulin.xing+oss@snowflake.com> Co-authored-by: Rulin Xing <xjdkcsq3@gmail.com> * Add Adnan and Neelesh to collaborators list (apache#1396) * Replace authentication filters with Quarkus Security (apache#1373) * Implement PolicyCatalogHandler and Add Policy Privileges Stage 1: CRUD + ListPolicies (apache#1357) * Add PolicyCatalogHandler and tests * Fix style * Address review comments * Address review comments 2 * fix nit * Remove CallContext.getAuthenticatedPrincipal() (apache#1400) * main: Update dependency info.picocli:picocli-codegen to v4.7.7 (apache#1408) * main: Update dependency com.google.errorprone:error_prone_core to v2.38.0 (apache#1404) * Add Polaris Community Meeting 2025-04-17 (apache#1409) * main: Update dependency boto3 to v1.37.37 (apache#1412) * EclipseLink: add PrimaryKey to policy mapping records JPA model (apache#1403) * Re-instate dependencies between Docker Compose services (apache#1407) * Do not rotate bootstrapped root credentials (apache#1414) * Add Getting Started Button to the Apache Polaris Webshite Homepage (apache#1406) * Core: change to return ApplicablePolicies (apache#1415) * Rename the Snapshot Retention policy (apache#1284) * Rename the Snapshot Retention policy * Resolve comments * Resolve comments --------- Co-authored-by: Yufei Gu <yufei.apache.org> * main: Update dependency com.adobe.testing:s3mock-testcontainers to v4.1.0 (apache#1419) * rename snapshotRetention to snashotExpiry (apache#1420) * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1744796716 (apache#1394) * main: Update dependency software.amazon.awssdk:bom to v2.31.26 (apache#1413) * main: Update dependency com.adobe.testing:s3mock-testcontainers to v4.1.1 (apache#1425) * Fix releaseEmailTemplate task (apache#1384) * Update distributions LICENSE and NOTICE with AWS SDK 2.31.26 update (apache#1423) * Support snapshots=refs (apache#1405) * initial commit * autolint * small revert * rebase * autolint * simpler * autolint * tests * autolint * stable * fix leak * ready for review * improved test * autolint * logic flip again * Update service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> * Update integration-tests/src/main/java/org/apache/polaris/service/it/env/CatalogApi.java Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> * adjustments for committed suggestions * autolint --------- Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> * Remove activatedPrincipalRoles property from AuthenticatedPolarisPrincipal (apache#1410) This seems to be a leftover from when ActiveRolesProvider was introduced. The setter was still used, but the getter wasn't, which hints at the fact that this property can be safely removed. As a bonus, AuthenticatedPolarisPrincipal now becomes immutable, which is imho a very good thing. * Implement PolicyCatalogHandler and Add Policy Privileges Stage 2: AttachPolicy + DetachPolicy (apache#1416) * add auth test for attach/detach * apply formatter * refactor authorizePolicyAttachmentOperation * address comment * better naming * Ship eclipselink and PostgreSQL JDBC driver by default in Polaris distribution (apache#1411) * Fix Connection Config DPOs (apache#1422) * Fix connection config dpos * Run spotlessApply * Doc: Fix the issue that html tags are not working in Hugo (apache#1382) * Implement PolicyCatalogHandler Stage 3: GetApplicablePolicies (apache#1421) * [JDBC] Part2: Add Relational JDBC module (apache#1287) * Bump version to 0.11.0-beta-incubating-SNAPSHOT (apache#1429) * Make entity lookups by id honor the specified entity type (apache#1401) * Make entity lookups by id honor the specified entity type All implementations of `TransactionalPersistence.lookupEntityInCurrentTxn()` are currently ignoring the `typeCode` parameter completely and could potentially return an entity of the wrong type. This can become very concerning during authentication, since a principal lookup could return some entity that is not a principal, and that would be considered a successful authentication. * review * Remove "test" Authenticator (apache#1399) * Propagate SQLException as "caused by" (apache#1430) * Remove logging for DbOps (apache#1433) * Spark: Add regtests for Spark client to test built jars (apache#1402) * main: Update dependency com.google.cloud:google-cloud-storage-bom to v2.51.0 (apache#1436) * main: Update dependency org.testcontainers:testcontainers-bom to v1.21.0 (apache#1437) * main: Update actions/setup-python digest to a26af69 (apache#1440) * Spark-IT: use correct configurations (apache#1444) ... do not let Spark leak into Quarkus * PolarisRestCatalogIntegrationTest: Always purge generic tables (apache#1443) * Add missing Postgresql dependency (apache#1447) * Add Request Timeouts (apache#1431) * add timeout * add iceberg exception mapping * dont use quarkus bom, disable timeout * nits * Fix sparks sql regtests with up to date config (apache#1454) * Refactor BasePolarisTableOperations & BasePolarisViewOperations (apache#1426) * initial copy paste * Reorder * view copy paste * fixes, polish * stable * yank * CODE_COPIED_TO_POLARIS comments * autolint * update license * typofix * update comments * autolint * Use .sha512 extension instead of -sha512 (apache#1449) * main: Update dependency org.eclipse.microprofile.fault-tolerance:microprofile-fault-tolerance-api to v4.1.2 (apache#1451) * Doc: Update Local Root Principal Credentials in Quickstart (apache#1452) * Update the Getting Started Workflow with each Cloud Provider's Blob Storage (apache#1435) * AWS First Draft * Debug * revert typo * Add JQ to docker runtime * Debug, pt2 * debug * debug * Allow Instance Profile Roles * change random suffix * change instance profile to regular IAM roles * AWS Final Draft * Azure First Draft * debug * Azure First Draft * debug * typo * GCP First Try * GCP Complete * GCP Final * add all jars to Spark * refactor * Implement PolicyCatalogAdapter (apache#1438) * Generic Table/Policy Store: Move feature config check to Adapter and some small refactoring (apache#1465) * update refs (apache#1464) * [JDBC] Part3: Plumb JDBC module to Quarkus (apache#1371) * Allow BasePolarisTableOperations to skip refreshing metadata after a commit (apache#1456) * initial commit * fix another test * changes per comments * visibility * changes per review * autolint * oops * main: Update dependency com.fasterxml.jackson:jackson-bom to v2.19.0 (apache#1455) * Doc: Added set custom credentials instruction in README (apache#1461) * Doc: Add policy documentation (apache#1460) * main: Update dependency software.amazon.awssdk:bom to v2.31.30 (apache#1475) * main: Update dependency gradle to v8.14 (apache#1459) * main: Update dependency gradle to v8.14 * fix PR --------- Co-authored-by: Robert Stupp <snazy@snazy.de> * Remove unused class TokenInfoExchangeResponse (apache#1479) This is an oversight from apache#1399. * Upgrade Polaris to Iceberg 1.9.0 (apache#1309) * Doc: Update on access-control policy docs (apache#1472) * main: Update Quarkus Platform and Group (apache#1381) * Added link to the Spark-Jupyter Notebook Getting Started from the main Getting Started Page (apache#1453) * Added link to the Spark-Jupyter Notebook Getting Started from the main Quickstart page * Typo Co-authored-by: Eric Maynard <emaynard@apache.org> * Suggestions as per @eric-maynard's review * Fix Typo --------- Co-authored-by: Eric Maynard <emaynard@apache.org> * [JDBC] Support Policy (apache#1468) * Refactor EntityCache into an interface (apache#1193) * Refactor EntityCache to an interface * fix * spotless * Remove unused PolarisCredentialVendor.validateAccessToLocations() (apache#1480) * Remove unused PolarisCredentialVendor.validateAccessToLocations() * review: remove ValidateAccessResult and comments * Policy Store: Check whether Policy is in use before dropping and support `detach-all` flag (apache#1467) * fix error (apache#1492) * Ensure writeToPolicyMappingRecord update existing record if primary key equals in EclipseLink Persistence Impl (apache#1469) * update PolicyMappingRecord if not exists * update test * add TODO * Eliminate getCurrentContext() call in PolarisAuthorizerImpl (apache#1494) * Add getting-started for Polaris Spark Client with Delta tables (apache#1488) * Fix: Pull Postgres image automatically (apache#1495) * Fix Outdated Information and add Information regarding `docker compose down` to Quickstart (apache#1497) * Fix Outdated Information and Add Information regarding docker compose down to Quickstart * Revision 2 * Remove shutdown from README * typo * Upgrade Iceberg REST Spec to match Iceberg 1.8 (apache#1283) * prep for review * reset * more changes * fixes * github action change * another build change * try api revert * re-all * custom type mappings, rebuild * autolint * polish * yank custom types * update * autolint * wip * Revert build changes * example * autolint * Fix FileIOExceptionsTest to conform to new Iceberg 1.8 API (apache#1501) It looks like after apache#1283, this test no longer compiles as the Iceberg API has changed. I'm not sure how this wasn't caught by CI on that PR itself. * JDBC: Optimize writeEntity calls (apache#1496) * Remove transaction from atomic writes * remove if-else * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1745840590 (apache#1499) * Support for external identity providers (apache#1397) * JDBC: create objects without reflection (apache#1434) * Include quarkus-container-image and README in the binary distributions (apache#1493) * Site: Fix Management and Catalog Spec links (apache#1507) * Lazy iteration over JDBC ResultSet (apache#1487) * refactor * autolint * polish * autolint * changes per review * autolint * unwrapping caller * changes per review * Update distributions LICENSE and NOTICE with artifacts and versions sync (apache#1509) * Avoid using deprecated `NestedField.of()` (apache#1514) * Fix compile warning: unknown enum constant Id.NAME (apache#1513) * Doc: Add getting started with JDBC source (apache#1470) * Site: Add Polaris Spark client webpage under unreleased (apache#1503) * fix merge error * retrigger test * Fix test failure (apache#1541) * mitigate .snyk issue * revert file in this pr * add .snyk file * retrigger * move snyk file * retrigger * resolve conflict * retrigger * Revert "resolve conflict" This reverts commit 5d6427150cab67aad7a4eca37142e87316f514fc. * repick the change --------- Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@dremio.com> Co-authored-by: danielhumanmod <danieltu.life@gmail.com> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: JB Onofré <jbonofre@apache.org> Co-authored-by: Robert Stupp <snazy@snazy.de> Co-authored-by: MonkeyCanCode <yongzheng0809@gmail.com> Co-authored-by: Honah (Jonas) J. <honahx@apache.org> Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com> Co-authored-by: Liam Bao <90495036+liamzwbao@users.noreply.github.com> Co-authored-by: Yufei Gu <yufei@apache.org> Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> Co-authored-by: Dennis Huo <7410123+dennishuo@users.noreply.github.com> Co-authored-by: Travis Bowen <122238243+travis-bowen@users.noreply.github.com> Co-authored-by: Travis Bowen <travis.bowen@snowflake.com> Co-authored-by: gh-yzou <167037035+gh-yzou@users.noreply.github.com> Co-authored-by: Mansehaj Singh <msehajs@gmail.com> Co-authored-by: Prashant Singh <35593236+singhpk234@users.noreply.github.com> Co-authored-by: Juichang Lu <wolflex888@gmail.com> Co-authored-by: David Lu <dalu@hubspot.com> Co-authored-by: gfakbar20 <gfakbar20@gmail.com> Co-authored-by: Adnan Hemani <adnan.h@berkeley.edu> Co-authored-by: Adnan Hemani <adnan.hemani@snowflake.com> Co-authored-by: Neelesh Salian <nssalian@users.noreply.github.com> Co-authored-by: Rulin Xing <rulin.xing+oss@snowflake.com> Co-authored-by: Rulin Xing <xjdkcsq3@gmail.com> Co-authored-by: fabio-rizzo-01 <fabio.rizzocascio@jpmorgan.com> Co-authored-by: Pierre Laporte <pierre@pingtimeout.fr> Co-authored-by: Richard Liu <35082658+RichardLiu2001@users.noreply.github.com> Co-authored-by: Michael Collado <40346148+collado-mike@users.noreply.github.com> Co-authored-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com> Co-authored-by: Eric Maynard <emaynard@apache.org> Co-authored-by: Andrew Guterman <andrew.guterman1@gmail.com>
This PR refactors types like
BasePolarisCatalog
andTABLE_LIKE
in order to separate out Iceberg from non-Iceberg concerns in the catalog. Per the design circulated on the dev list, this introduces a new typeGenericTableCatalog
that is the analog ofIcebergCatalog
for non-Iceberg tables. The focus here is on refactoring and implementingGenericTableCatalog
is not the real goal of this PR, so only a couple of toy methods are provided here.