Fix Iceberg REST catalog OAuth2 credential authentication #23674
Problem
When configuring an Iceberg REST catalog (e.g., Polaris) with OAuth2 client credentials (clientId + clientSecret), both the connection test and the metadata ingestion pipeline failed with a 401 Unauthorized error.
The issue only occurred when using OAuth2 credentials. Bearer token authentication worked correctly, but tokens are short-lived and require manual refresh.
Root Cause
The code in `rest.py` always included both the `credential` and `token` keys in the parameters dictionary passed to PyIceberg's `load_rest()` function, even when their values were `None`. When a user provided OAuth2 credentials but no bearer token, PyIceberg received both keys, with `token` set to `None`.
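The pattern described above can be illustrated with a minimal sketch. This is not the actual `rest.py` source: the helper name `build_parameters_buggy`, the example URI, and the `client-id:client-secret` credential string are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Optional


def build_parameters_buggy(
    uri: str,
    credential: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: always emits both auth keys, even when unset."""
    return {
        "uri": uri,
        "credential": credential,  # present even if None
        "token": token,            # present even if None
    }


# OAuth2 credentials configured, no bearer token supplied.
params = build_parameters_buggy(
    "http://localhost:8181/api/catalog",   # assumed example URI
    credential="client-id:client-secret",  # assumed example value
)
print(params)
```

The resulting dictionary still carries a `token` key whose value is `None`, which is the condition the root-cause analysis points to.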
PyIceberg's authentication logic prioritizes the `token` parameter over `credential`, so even though `credential` had a valid value, PyIceberg used `token=None`, resulting in no authentication being sent and a 401 error from the REST catalog.
Solution
Modified the parameter-building logic to include the `credential` or `token` key only when it actually has a value.
PyIceberg now receives clean parameters containing only the authentication method that was configured, allowing OAuth2 credentials to work correctly.
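A minimal sketch of this conditional approach follows. It is an illustration under stated assumptions, not the code merged in this PR: the helper name `build_parameters`, the example URI, and the credential string are hypothetical.

```python
from typing import Any, Dict, Optional


def build_parameters(
    uri: str,
    credential: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: add an auth key only when it has a value."""
    parameters: Dict[str, Any] = {"uri": uri}
    if credential:
        parameters["credential"] = credential
    if token:
        parameters["token"] = token
    return parameters


# OAuth2 credentials only: no "token" key is emitted at all.
oauth_params = build_parameters(
    "http://localhost:8181/api/catalog",   # assumed example URI
    credential="client-id:client-secret",  # assumed example value
)

# Bearer token only: no "credential" key is emitted.
token_params = build_parameters(
    "http://localhost:8181/api/catalog",
    token="example-bearer-token",          # assumed example value
)

print(oauth_params)
print(token_params)
```

Omitting the key entirely, rather than passing `None`, means the downstream library never has to decide how to treat an explicitly empty authentication value.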
Changes
ingestion/src/metadata/ingestion/source/database/iceberg/catalog/rest.py (14 lines)
ingestion/tests/unit/source/database/iceberg/test_rest_catalog.py
Impact
✅ What's Fixed:
✅ Backward Compatibility:
Testing
All test scenarios validated:
Risk Level: LOW - Minimal code change with comprehensive test coverage and no breaking changes.
Original prompt
This section details the original issue you should resolve.
<issue_title>Issue with Iceberg metadata ingestion configuration</issue_title>
<issue_description>From the UI, go to Database Services -> Add new service -> select Iceberg and click Next to provide connection details.
Choose "RestCatalogConnection" and provide the details. I used a Polaris REST catalog and provided the Polaris URI, clientId and clientSecret. When I test the connection, it fails. I also tried saving it and then triggering the pipeline, which also fails with the error below.
I have provided a screenshot of the request, which has the credentials set properly.
Note: It works if "token" is provided, but the token is short-lived and is not refreshed automatically after it expires.
[2024-09-11T06:26:22.676+0000] {server_mixin.py:75} INFO - OpenMetadata client running with Server version [1.4.4] and Client version [1.4.4.1]
[2024-09-11T06:26:22.856+0000] {taskinstance.py:1937} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/rest.py", line 285, in _fetch_config
response.raise_for_status()
File "/home/airflow/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://polaris.polaris.svc.cluster.local:8181/api/catalog/v1/config?warehouse=polaris10k
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 192, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 209, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/openmetadata_managed_apis/workflows/ingestion/common.py", line 207, in metadata_ingestion_workflow
workflow = MetadataWorkflow.create(config)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/ingestion.py", line 103, in create
return cls(config)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/ingestion.py", line 79, in init
super().init(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/base.py", line 107, in init
self.post_init()
return cls(config)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/ingestion.py", line 79, in init
super().init(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/base.py", line 107, in init
self.post_init()
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/ingestion.py", line 98, in post_init
self.set_steps()
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/metadata.py", line 35, in set_steps
self.source = self._get_source()
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/workflow/metadata.py", line 66, in _get_source
source: Source = source_class.create(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/metadata.py", line 94, in create
return cls(config, metadata)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/metadata.py", line 79, in init
self.iceberg = get_connection(self.service_connection)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/connections.py", line 49, in get_connection
return get_connection_fn(connection)(connection)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/connection.py", line 35, in get_connection
return IcebergCatalogFactory.from_connection(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/catalog/init.py", line 65, in from_connection
return catalog_type.get_catalog(catalog)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/catalog/rest.py", line 84, in get_catalog
return load_rest(catalog.name, parameters)
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/init.py", line 88, in load_rest
return RestCatalog(name, **conf)
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/rest.py", line 199, in init
self._fetch_config()
File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/rest.py", line 287, in _fetch_config
self._handle...