Skip to content

[Policy Store | Management Spec] Add policy privileges to spec and update admin service impl #1529

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 20, 2025

Conversation

HonahX
Copy link
Contributor

@HonahX HonahX commented May 6, 2025

This PR adds new policy related privileges to polaris-management-api.yml and update PolarisAdminService to allow granting new privileges

dev list discussion: https://lists.apache.org/thread/tytrj8lts0zx7xl38o9pmjwwt9kyzy61

eric-maynard
eric-maynard previously approved these changes May 14, 2025
Copy link
Contributor

@eric-maynard eric-maynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! At some point, we should update the relevant docs

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board May 14, 2025
Comment on lines 1431 to 1436
minLength: 1
maxLength: 256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting we have this limits here, not only new entity like Policy, but also for table/view, etc. This becomes a an extra Polaris limit. Should we lift it somehow? I could see a corner case that users migrate a table with a long name will fail. Not a blocker for this PR though. cc @eric-maynard @dennishuo @collado-mike for more context.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I follow the existing pattern to include these limits for the new schema. May be we can do this in a follow-up to remove all the limits if they are not needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, we can deal with it in a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HonahX shall we also contain a pattern definition here? for example

pattern: '^(?!\s*[s|S][y|Y][s|S][t|T][e|E][m|M]\$).*$'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I checked the yaml and seems the pattern is defined for catalogRole, principal, catalogName. For namespace and table there is pattern

    NamespaceGrant:
      allOf:
        - $ref: '#/components/schemas/GrantResource'
        - type: object
          properties:
            namespace:
              type: array
              items:
                type: string
            privilege:
              $ref: '#/components/schemas/NamespacePrivilege'
          required:
            - namespace
            - privilege

    TableGrant:
      allOf:
        - $ref: '#/components/schemas/GrantResource'
        - type: object
          properties:
            namespace:
              type: array
              items:
                type: string
            tableName:
              type: string
              minLength: 1
              maxLength: 256
            privilege:
              $ref: '#/components/schemas/TablePrivilege'
          required:
            - namespace
            - tableName
            - privilege

I think this pattern is mainly to prevent granting privileges to system defined roles/principals. If we have similar use case for namespace, table, policy, we can add that in the future. WDYT?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not quite sure if i got this part, I double checked the policyName definition, which is the following

      type: string
      pattern: '^[A-Za-z0-9\-_]+$'

there is no min and max limitation, will we run into the problem that when we create the policy, the name might exceed the limitation, but when we grant the the policy, user see the error. Should all those limitation be defined on the policyName, and here we just refer to the policy name definition, similar as privilege?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's weird that we have these length limitations on tableNames, namespaceName, etc. Let me remove that for policy in this PR, since even if we need the limitation, we need that to appear in policy-api.yaml too. Let's deal with other existing length limitations in a follow-up PR/dev discussion.

I do not think we necessarily need pattern restriction on policy name here as management APIs does not create new policies, any invalid names would just result in an not found error.

flyrain
flyrain previously approved these changes May 14, 2025
Comment on lines +1359 to +1370
- POLICY_CREATE
- POLICY_WRITE
- POLICY_READ
- POLICY_DROP
- POLICY_LIST
- POLICY_FULL_METADATA
- CATALOG_ATTACH_POLICY
- CATALOG_DETACH_POLICY
Copy link
Contributor

@singhpk234 singhpk234 May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[not-blocker] is it possible to add some comments on what each of them mean ?

Copy link
Contributor

@eric-maynard eric-maynard May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 but we should probably do this separately, since the existing ones need comments too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on doing this in a follow-up. Also do you think if a pointer to the site "Access Control" section would be an option to consider, so that we will have a single source of truth for the privileges' meaning.

PolicyPrivilege:
type: string
enum:
- CATALOG_MANAGE_ACCESS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question, why do we need CATALOG_MANAGE_ACCESS for PolicyPrivilege?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this shouldn't be here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this essentially just function is a super-privilege? I see it only TablePrivileges too, for example, even though it's not in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I added it here to be consistent with TablePrivileges, ViewPrivileges, and NamespacePrivileges. My understanding of the usage here would be allowing privilege management on a specific policy entity.

- POLICY_DROP
- POLICY_WRITE
- POLICY_LIST
- POLICY_FULL_METADATA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am always confused by the FULL_METADATA, but policy doesn't have a metadata concept, that simply refers to full access capability, right? would POLICY_FULL_ACCESS make more sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the same as for namespaces? We call that NAMESPACE_FULL_METADATA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is not an easy task. For me both _FULL_METADATA and _FULL_ACCESS makes sense, but since in polaris we use _FULL_METADATA frequently: CATALOG_FULL_METADATA, NAMESPACE_FULL_METADATA, ... I think POLICY_FULL_METADATA can be more consistent with Polaris' naming convention. WDYT?

Comment on lines 1431 to 1436
minLength: 1
maxLength: 256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HonahX shall we also contain a pattern definition here? for example

pattern: '^(?!\s*[s|S][y|Y][s|S][t|T][e|E][m|M]\$).*$'

new ResolverPath(List.of(catalogRoleName), PolarisEntityType.CATALOG_ROLE),
catalogRoleName);
ResolverStatus status = resolutionManifest.resolveAll();
if (status.getStatus() == ResolverStatus.StatusEnum.ENTITY_COULD_NOT_BE_RESOLVED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is taht implementation mostly the same as authorizeGrantOnNamespaceOperationOrThrow, how about let's merge thoe two functions, let the function take a EntityType, and an entity name. The only thing you need to take care here is probably the error message, and i think we can do a simple enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly like authorizeGrantOnTableLikeOperationOrThrow, but it deals with a PolicyIdentifier instead of a TableIdentifier.

Copy link
Contributor

@gh-yzou gh-yzou May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, i am actually referring to the namespaces, but all those function seems very similar. I think we can either internally handle this part according to the type, or just let the caller pass this argument List entityNames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I agree there are lots of chances here to reduce code duplication. Since the refactor may impact many methods, how about we doing that in a follow-up? I've opened a draft to attempt a merging for catalog/namespace/table-like: #1621. We can add refactor policy ones after this PR gets in.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg!

throw new NoSuchPolicyException(String.format("Policy not exists: %s", identifier));
}

List<PolarisEntity> catalogPath = resolvedPathWrapper.getRawParentPath();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as above, i think we can have a common function that is used for both NameSpace and Policy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Similar as above, how about we do all the refactoring as a whole in #1621 once this is in.

@HonahX HonahX dismissed stale reviews from flyrain and eric-maynard via 8a10d05 May 19, 2025 19:49
@HonahX HonahX force-pushed the honahx_add_policy_privileges_and_grant branch from 0cd21a2 to 8a10d05 Compare May 19, 2025 19:49
@HonahX HonahX merged commit 5c3f0c2 into apache:main May 20, 2025
6 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board May 20, 2025
snazy added a commit to snazy/polaris that referenced this pull request May 22, 2025
* main: Update docker.io/prom/prometheus Docker tag to v3.4.0 (apache#1602)

* Site: Update production configuration page (apache#1606)

* main: Update dependency com.google.cloud:google-cloud-storage-bom to v2.52.3 (apache#1623)

* main: Update dependency boto3 to v1.38.19 (apache#1622)

* Remove Bouncy Castle dependency usage from PemUtils (apache#1318)

- Added PEM format parsing in PemUtils
- Added unit test for PemUtils for empty file and multiple PEM objects
- Removed Bouncy Castle Provider dependency from service common module
- Removed Bouncy Castle Provider dependency from quarkus service module

* Site: Add a page for policy management (apache#1600)

* [Policy Store | Management Spec] Add policy privileges to spec and update admin service impl (apache#1529)

This PR adds new policy related privileges to polaris-management-api.yml and update PolarisAdminService to allow granting new privileges

* Spec: Add SigV4 Auth Support for Catalog Federation (apache#1506)

* Spec changes for SigV4 Auth Support for Catalog Federation

* Extract service identity info as a nested object

* nit: fix admin tool log level and comments (apache#1626)

The previous WARNING log levels seems to work, but WARN
aligns better with standard Quarkus log levels.

Fixes apache#1612

* Doc: switch to use iceberg-aws-bundle jar (apache#1609)

* main: Update dependency org.mockito:mockito-core to v5.18.0 (apache#1630)

* main: Update dependency boto3 to v1.38.20 (apache#1631)

* Require explicit user-consent to enable HadoopFileIO (apache#1532)

Using `HadoopFileIO` in Polaris can enable "hidden features" that users are likely not aware of. This change requires users to manually update the configuration to be able to use `HadoopFileIO` in way that highlights the consequences of enabling it.

This PR updates Polaris in multiple ways:
* The default of `SUPPORTED_CATALOG_STORAGE_TYPES` is changed to not include the `FILE` storage type.
* Respect the `ALLOW_SPECIFYING_FILE_IO_IMPL` configuration on namespaces, tables and views to prevent setting an `io-impl` value for anything but one of the configured, supported storage-types.
* Unify validation code in a new class `IcebergPropertiesValidation`.
* Using `FILE` or `HadoopFileIO` now _also_ requires the explicit configuration `ALLOW_INSECURE_STORAGE_TYPES_ACCEPTING_SECURITY_RISKS=true`.
* Added production readiness checks that trigger when `ALLOW_INSECURE_STORAGE_TYPES_ACCEPTING_SECURITY_RISKS` is `true` or `SUPPORTED_CATALOG_STORAGE_TYPES` contains `FILE` (defaults and per-realm).
* The two new readiness checks are considered _severe_. Severe readiness-errors prevent the server from starting up - unless the user explicitly configured `polaris.readiness.ignore-security-issues=true`.

Log messages and configuration options explicitly use "clear" phrases highlighting the consequences.

With these changes it is intentionally extremely difficult to start Polaris with HadoopFileIO. People who work around all these safety nets must have realized that what they are doing.

A lot of the test code relies on `FILE`/`HadoopFileIO`, those tests got all the configurations to let those tests continue to work as they are, bypassing the added security safeguards.

---------

Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@dremio.com>

---------

Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Yufei Gu <yufei@apache.org>
Co-authored-by: David Handermann <exceptionfactory@apache.org>
Co-authored-by: Honah (Jonas) J. <honahx@apache.org>
Co-authored-by: Rulin Xing <xjdkcsq3@gmail.com>
Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@dremio.com>
Co-authored-by: MonkeyCanCode <yongzheng0809@gmail.com>
adnanhemani pushed a commit to adnanhemani/polaris that referenced this pull request May 22, 2025
…date admin service impl (apache#1529)

This PR adds new policy related privileges to polaris-management-api.yml and update PolarisAdminService to allow granting new privileges
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants