[common] Refactor CatalogContext to separate Hadoop dependencies #6653
Purpose
This PR refactors `CatalogContext` to separate Hadoop dependencies, enabling Paimon to work in Hadoop-free environments.
Closes #6654
Background and Motivation
Trino Plugin Development Requirement
This change is essential for developing the Trino-Paimon connector. Trino explicitly does not allow connectors to have mandatory Hadoop dependencies (see the Trino policy discussion in trinodb/trino#15921). The previous `paimon-trino` implementation was affected by this issue, causing deployment problems in Trino environments where Hadoop is not available or desired.
Problem Statement
Currently, `CatalogContext` has a hard dependency on Hadoop's `Configuration`, causing a `NoClassDefFoundError` in environments where Hadoop is not needed (a minimal sketch of this failure mode follows the list below).
Use Cases Affected
1. 🎯 Trino-Paimon Connector (Primary Use Case)
2. Windows Development Environment
When using Flink CDC with a Paimon sink writing to MinIO S3 on Windows, the application fails with a Hadoop-related error even though no Hadoop cluster is involved.
3. Lightweight Deployments
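To make the failure mode concrete, here is a minimal sketch (illustrative code, not Paimon source) of why a compile-time reference to Hadoop's `Configuration` breaks on a Hadoop-free classpath:

```java
import org.apache.hadoop.conf.Configuration;

// Illustrative class, not Paimon source: a context type with a compile-time
// Hadoop dependency.
public class ContextWithHardDependency {
    // The constructor descriptor references Configuration, so resolving the
    // constructor (e.g. via `new ContextWithHardDependency(null)`) forces the
    // JVM to load org.apache.hadoop.conf.Configuration. Without hadoop-common
    // on the classpath this throws
    //   java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    // even if the caller never uses any Hadoop functionality.
    private final Configuration hadoopConf;

    public ContextWithHardDependency(Configuration hadoopConf) {
        this.hadoopConf = hadoopConf;
    }
}
```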
Solution
Refactor `CatalogContext` using a class hierarchy to separate concerns:
New Architecture
- `CatalogContext` - Base class without Hadoop dependency
- `HadoopAware` - Interface for Hadoop functionality
- `CatalogHadoopContext` - Hadoop implementation; extends `CatalogContext` and implements `HadoopAware`
Architecture Changes
Before (Hard Dependency): `CatalogContext` itself carried a Hadoop `Configuration`, so Hadoop had to be on the classpath wherever a context was created.
After (Optional Dependency): the base `CatalogContext` is Hadoop-free; Hadoop state lives only in `CatalogHadoopContext`, which callers discover through the `HadoopAware` interface, as sketched below.
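A minimal sketch of the resulting hierarchy, assuming illustrative constructors and an accessor named `hadoopConf()` (the three type names come from this PR; `Options` is Paimon's existing options class):

```java
import org.apache.paimon.options.Options;

// CatalogContext.java -- base class; no org.apache.hadoop.* types appear in
// it, so it loads on a Hadoop-free classpath.
public class CatalogContext {
    private final Options options;

    protected CatalogContext(Options options) {
        this.options = options;
    }

    public Options options() {
        return options;
    }
}

// HadoopAware.java -- the only interface whose signatures mention Hadoop.
public interface HadoopAware {
    org.apache.hadoop.conf.Configuration hadoopConf();
}

// CatalogHadoopContext.java -- Hadoop-aware extension of the base class.
public class CatalogHadoopContext extends CatalogContext implements HadoopAware {
    private final org.apache.hadoop.conf.Configuration hadoopConf;

    public CatalogHadoopContext(Options options, org.apache.hadoop.conf.Configuration conf) {
        super(options);
        this.hadoopConf = conf;
    }

    @Override
    public org.apache.hadoop.conf.Configuration hadoopConf() {
        return hadoopConf;
    }
}
```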
Factory Pattern
Factory methods automatically detect whether a Hadoop `Configuration` is needed and return the appropriate type.
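As an illustration, overload-based dispatch of this kind would look roughly as follows (the `create` overload shapes mirror Paimon's existing `CatalogContext.create` factories, but the bodies here are a sketch, not the PR diff):

```java
// Illustrative factory dispatch inside CatalogContext; bodies are a sketch,
// not the PR diff.
public static CatalogContext create(Options options) {
    // No Hadoop Configuration supplied: return the Hadoop-free base context.
    return new CatalogContext(options);
}

public static CatalogContext create(
        Options options, org.apache.hadoop.conf.Configuration hadoopConf) {
    // A Configuration was supplied: return the Hadoop-aware subclass; callers
    // holding only the CatalogContext type can test for HadoopAware later.
    return new CatalogHadoopContext(options, hadoopConf);
}
```

Because only the second overload mentions a Hadoop type in its signature, Hadoop-free callers never force the JVM to resolve `org.apache.hadoop.conf.Configuration`.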
Updated Components
FileIO implementations: Use `CatalogContext` instead of requiring Hadoop
- `LocalFileIO`: Works without Hadoop ✅
- `HadoopFileIO`: Checks for the `HadoopAware` interface dynamically (see the sketch below)
- `ResolvingFileIO`: Supports both modes
- `SecurityContext`: Gracefully handles the absence of Hadoop
Catalog factories: Updated to support both contexts
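For example, a Hadoop-dependent component can discover the configuration at runtime with an `instanceof` check instead of a compile-time requirement; a sketch, reusing the assumed `hadoopConf()` accessor from above:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch of dynamic HadoopAware detection inside a Hadoop-dependent component
// such as HadoopFileIO; the call-site details are assumptions, not the PR diff.
public class HadoopAwareComponent {
    private Configuration conf;

    public void configure(CatalogContext context) {
        if (context instanceof HadoopAware) {
            // Hadoop-aware context: reuse the Configuration it already carries.
            this.conf = ((HadoopAware) context).hadoopConf();
        } else {
            // Plain context: fall back to a default Configuration (illustrative
            // fallback; real code might fail fast or derive settings from options).
            this.conf = new Configuration();
        }
    }
}
```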
Changes Summary
Core Changes (paimon-common):
- `CatalogContext.java` - Refactored into a base class without Hadoop
- `CatalogHadoopContext.java` (169 lines) - Hadoop-aware extension
- `HadoopAware.java` (45 lines) - Interface for Hadoop functionality
FileIO Updates:
- `FileIOUtils.java`: Handle both `CatalogContext` and `HadoopAware`
- `ResolvingFileIO.java`: Support Hadoop-free initialization
- `HadoopFileIO.java`: Check for `HadoopAware` dynamically
- `LocalFileIO.java`: Use `CatalogContext` only
Integration Updates (paimon-core, paimon-hive, paimon-spark): call sites updated to the new context hierarchy
Total: 26 files changed, 571 insertions(+), 80 deletions(-)
Benefits
- Paimon core can now run on a Hadoop-free classpath (Trino connector, Windows development, lightweight deployments)
- Hadoop-based deployments are unaffected
- Backward compatible: existing `CatalogContext` callers need no changes
Testing
Affected Modules
- paimon-common (core changes)
- paimon-core, paimon-hive, paimon-spark (integration updates)
Compatibility
This is a backward-compatible change:
- Existing code using `CatalogContext` continues to work
Related Issues
- #6654
- trinodb/trino#15921