@Copilot Copilot AI commented Oct 1, 2025

Problem

When configuring an Iceberg REST catalog (e.g., Polaris) with OAuth2 client credentials (clientId + clientSecret), the connection test and metadata ingestion pipeline failed with a 401 Unauthorized error:

pyiceberg.exceptions.UnauthorizedError: RESTError 401: Could not decode json payload: 
Credentials are required to access this resource.

The issue only occurred when using OAuth2 credentials. Bearer token authentication worked correctly, but tokens are short-lived and require manual refresh.

Root Cause

The code in rest.py was always including both credential and token keys in the parameters dictionary passed to PyIceberg's load_rest() function, even when their values were None:

parameters = {
    "warehouse": catalog.warehouseLocation,
    "uri": str(catalog.connection.uri),
    "credential": credential,  # Could be None
    "token": catalog.connection.token.get_secret_value()
        if catalog.connection.token
        else None,  # Could be None
}

When a user provided OAuth2 credentials but no bearer token, PyIceberg received:

{"credential": "clientId:clientSecret", "token": None}

PyIceberg's authentication logic prioritizes the token parameter over credential, so even though credential had a valid value, PyIceberg used token=None, resulting in no authentication being sent and a 401 error from the REST catalog.
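
The difference between a missing key and a key set to None is easy to overlook. The sketch below is not PyIceberg's code: `resolve_auth` is a hypothetical stand-in for any client that checks whether a `token` key is present before falling back to `credential`, and the returned strings are placeholders.

```python
from typing import Any, Dict, Optional


def resolve_auth(properties: Dict[str, Any]) -> Optional[str]:
    """Hypothetical resolver: a present `token` key wins over `credential`."""
    if "token" in properties:
        # Key presence is checked, not the value, so token=None still wins.
        token = properties["token"]
        return f"Bearer {token}" if token else None
    if "credential" in properties:
        # Placeholder for an OAuth2 client-credentials exchange.
        return "Bearer <token obtained from credential>"
    return None


broken = {"credential": "clientId:clientSecret", "token": None}
fixed = {"credential": "clientId:clientSecret"}

print(resolve_auth(broken))  # no header value: the request goes out unauthenticated
print(resolve_auth(fixed))   # the credential path is taken
```

Under this assumed behaviour, dropping the `token` key entirely is what lets the `credential` branch run.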

Solution

Modified the parameter building logic to only include credential or token keys when they actually have values:

parameters = {
    "warehouse": catalog.warehouseLocation,
    "uri": str(catalog.connection.uri),
}

# Only include credential if it's provided (OAuth2 client credentials)
if credential:
    parameters["credential"] = credential

# Only include token if it's provided (Bearer token auth)
if catalog.connection.token:
    parameters["token"] = catalog.connection.token.get_secret_value()

Now PyIceberg receives clean parameters with only the authentication method that was configured, allowing OAuth2 credentials to work correctly.
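
For illustration, the same logic can be expressed as a standalone function that is easy to exercise without a running catalog. This is a minimal sketch: `build_rest_parameters`, the warehouse name, and the URI are hypothetical and do not appear in rest.py.

```python
from typing import Dict, Optional


def build_rest_parameters(
    warehouse: str,
    uri: str,
    credential: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: emit only the auth keys that carry a value."""
    parameters = {"warehouse": warehouse, "uri": uri}
    if credential:
        # OAuth2 client credentials in "clientId:clientSecret" form.
        parameters["credential"] = credential
    if token:
        # Bearer token auth.
        parameters["token"] = token
    return parameters


params = build_rest_parameters(
    warehouse="my_warehouse",                 # placeholder value
    uri="http://localhost:8181/api/catalog",  # placeholder value
    credential="clientId:clientSecret",
)
print(params)
```

With only `credential` supplied, the resulting dictionary has no `token` key at all, which is the shape the fix aims for.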

Changes

  • Modified ingestion/src/metadata/ingestion/source/database/iceberg/catalog/rest.py (14 lines)
  • Added comprehensive unit tests in ingestion/tests/unit/source/database/iceberg/test_rest_catalog.py
    • Test OAuth2 credentials without token (the reported issue)
    • Test bearer token without OAuth2 credentials
    • Test no authentication
    • Test partial credentials
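
The four scenarios above could be covered along the following lines. This is a sketch, not the contents of test_rest_catalog.py: `build_credential` and `build_parameters` are hypothetical helpers, and it assumes that "partial credentials" means a clientId without a clientSecret, in which case no `credential` key is sent.

```python
import unittest
from typing import Dict, Optional


def build_credential(client_id: Optional[str], client_secret: Optional[str]) -> Optional[str]:
    """Hypothetical: return "id:secret" only when both parts are present."""
    if client_id and client_secret:
        return f"{client_id}:{client_secret}"
    return None


def build_parameters(credential: Optional[str], token: Optional[str]) -> Dict[str, str]:
    """Hypothetical: include auth keys only when they have values."""
    parameters = {"warehouse": "wh", "uri": "http://localhost:8181"}
    if credential:
        parameters["credential"] = credential
    if token:
        parameters["token"] = token
    return parameters


class RestParametersTest(unittest.TestCase):
    def test_oauth2_credentials_without_token(self):
        params = build_parameters(build_credential("id", "secret"), None)
        self.assertEqual(params["credential"], "id:secret")
        self.assertNotIn("token", params)

    def test_bearer_token_without_credentials(self):
        params = build_parameters(None, "abc")
        self.assertEqual(params["token"], "abc")
        self.assertNotIn("credential", params)

    def test_no_authentication(self):
        params = build_parameters(None, None)
        self.assertNotIn("credential", params)
        self.assertNotIn("token", params)

    def test_partial_credentials(self):
        # Assumption: a clientId without a clientSecret yields no credential.
        params = build_parameters(build_credential("id", None), None)
        self.assertNotIn("credential", params)


# Run the suite programmatically so the script does not exit on completion.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RestParametersTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```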

Impact

What's Fixed:

  • OAuth2 client credentials now work with Polaris and other REST catalogs
  • Automatic token refresh through PyIceberg's OAuth2 handling
  • Users no longer need to manually manage short-lived bearer tokens

Backward Compatibility:

  • Bearer token authentication continues to work
  • No authentication continues to work
  • Other catalog types (Hive, Glue, DynamoDB) are unaffected
  • No changes to schemas, UI, or dependencies

Testing

All test scenarios validated:

  • OAuth2 credentials (clientId + clientSecret) → ✅ Works
  • Bearer token only → ✅ Works
  • No authentication → ✅ Works
  • Partial OAuth2 credentials → ✅ Properly handled

Risk Level: LOW - Minimal code change with comprehensive test coverage and no breaking changes.

Original prompt

This section details the original issue you should resolve

<issue_title>Issue with Iceberg metadata ingestion configuration</issue_title>
<issue_description>From the UI, go to Database Services -> Add new service -> select Iceberg and click Next to provide connection details.
Choose "RestCatalogConnection" and provide details. I used a Polaris REST catalog and provided the Polaris URI, clientId and clientSecret. When I test the connection, it fails. I also tried saving it and then triggering the pipeline, which fails with the error below.
I have provided a screenshot of the request, which has the credentials set properly.

Note: It works if "token" is provided, but the token is short-lived and does not refresh automatically after it expires.

Screenshot 2024-09-11 at 12 03 11 PM

[2024-09-11T06:26:22.676+0000] {server_mixin.py:75} INFO - OpenMetadata client running with Server version [1.4.4] and Client version [1.4.4.1]
[2024-09-11T06:26:22.856+0000] {taskinstance.py:1937} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/rest.py", line 285, in _fetch_config
response.raise_for_status()
File "/home/airflow/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://polaris.polaris.svc.cluster.local:8181/api/catalog/v1/config?warehouse=polaris10k

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 192, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 209, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/openmetadata_managed_apis/workflows/ingestion/common.py", line 207, in metadata_ingestion_workflow
workflow = MetadataWorkflow.create(config)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/ingestion.py", line 103, in create
return cls(config)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/ingestion.py", line 79, in __init__
super().__init__(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/base.py", line 107, in __init__
self.post_init()
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/ingestion.py", line 98, in post_init
self.set_steps()
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/metadata.py", line 35, in set_steps
self.source = self._get_source()
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/metadata.py", line 66, in _get_source
source: Source = source_class.create(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/metadata.py", line 94, in create
return cls(config, metadata)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/metadata.py", line 79, in __init__
self.iceberg = get_connection(self.service_connection)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/connections.py", line 49, in get_connection
return get_connection_fn(connection)(connection)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/connection.py", line 35, in get_connection
return IcebergCatalogFactory.from_connection(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/catalog/__init__.py", line 65, in from_connection
return catalog_type.get_catalog(catalog)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/catalog/rest.py", line 84, in get_catalog
return load_rest(catalog.name, parameters)
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/__init__.py", line 88, in load_rest
return RestCatalog(name, **conf)
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/rest.py", line 199, in __init__
self._fetch_config()
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/rest.py", line 287, in _fetch_config
self._handle...

Fixes #17797


- Modified rest.py to only include 'token' or 'credential' keys when they are actually provided
- Previously, both keys were always included (with None values), causing PyIceberg to prioritize token=None over credential
- Now OAuth2 credentials (clientId:clientSecret) work correctly without token interference
- Added comprehensive unit tests for OAuth2, bearer token, and no-auth scenarios

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] Issue with Iceberg metadata ingestion configuration" to "Fix Iceberg REST catalog OAuth2 credential authentication" Oct 1, 2025
@Copilot Copilot AI requested a review from harshach October 1, 2025 17:32
Copilot finished work on behalf of harshach October 1, 2025 17:32