-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#5220] improvment(hadoop-catalog): Optimize the name properties keys for Hadoop catalog. #5244
Conversation
catalogs/catalog-common/src/main/java/org/apache/gravitino/storage/S3Properties.java
Show resolved
Hide resolved
...adoop/src/main/java/org/apache/gravitino/catalog/hadoop/config/HadoopS3FileSystemConfig.java
Outdated
Show resolved
Hide resolved
catalogs/catalog-common/src/main/java/org/apache/gravitino/storage/GCSProperties.java
Outdated
Show resolved
Hide resolved
catalogs/catalog-common/src/main/java/org/apache/gravitino/storage/S3Properties.java
Show resolved
Hide resolved
...atalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java
Outdated
Show resolved
Hide resolved
...doop/src/main/java/org/apache/gravitino/catalog/hadoop/config/HadoopGCSFileSystemConfig.java
Outdated
Show resolved
Hide resolved
...doop/src/main/java/org/apache/gravitino/catalog/hadoop/config/HadoopGCSFileSystemConfig.java
Outdated
Show resolved
Hide resolved
...doop/src/main/java/org/apache/gravitino/catalog/hadoop/config/HadoopGCSFileSystemConfig.java
Outdated
Show resolved
Hide resolved
@@ -95,10 +96,8 @@ protected String defaultBaseLocation() { | |||
|
|||
protected void createCatalog() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move these ITs to each bundle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can try it but I will not do it in this PR, It will make this PR large and hard to test the changes introduced by this PR, so I created #5325 to follow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should think if user implement a custom FS with FS provider, can he integrate into Gravitino server Hadoop catalog, Java and Python gvfs without code change in Gravitino? Take this as a principle, if not, please make it work.
bundles/aliyun-bundle/src/main/java/org/apache/gravitino/oss/fs/OSSFileSystemProvider.java
Outdated
Show resolved
Hide resolved
|
||
private static final String OSS_FILESYSTEM_IMPL = "fs.oss.impl"; | ||
|
||
public static final Map<String, String> GRAVITINO_KEY_TO_OSS_HADOOP_KEY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to make this public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, why do we turn it back to public
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value will be used in GravitinoVirtualFileSystemOSSIT test and others are similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because you don't put IT and code together, so you have to make it public, can you please add a visible for test tag on it?
bundles/gcp-bundle/src/main/java/org/apache/gravitino/gcs/fs/GCSFileSystemProvider.java
Outdated
Show resolved
Hide resolved
@@ -32,6 +32,8 @@ | |||
*/ | |||
public interface FileSystemProvider { | |||
|
|||
String GRAVITINO_BYPASS = "gravitino.bypass."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add some comments to this variable, to let users know how to leverage this variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -107,5 +112,46 @@ public class GravitinoVirtualFileSystemConfiguration { | |||
public static final long FS_GRAVITINO_FILESET_CACHE_EVICTION_MILLS_AFTER_ACCESS_DEFAULT = | |||
1000L * 60 * 60; | |||
|
|||
private static final Map<String, String> GVFS_S3_KEY_TO_HADOOP_KEY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add a new FS provider, do we need to update the GVFS code to make it work? If so, I think it is not a good design, we should also hide the client side configuration into each provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the long run, we can provide a method like requiredPropertyKey
in the FileSystemProvider
and load this provider by service provider. By doing so, we can register this property to the client dynamically. Currently, we cannot avoid this hard coding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still suggest to move this to each provider, we don't have to expose them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that this will make module hadoop3-filesystem
depend on bundle jars in the runtime explicitly will hinder me from doing so. If this is not a big problem, I would like to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there is a dependency? If user sets the configurations via Hadoop Configuration
in gvfs, when gvfs create a provider, it can pass the configs from gvfs' Configuration
to provider's config map. Each provider can use this config map to check the configurations and initialize the FS.
In most cases, the current implementation can satisfy users' needs. They can use the For example, if we use the following key/value pair as the properties when creating a Hadoop catalog,
Then, we will pass these configurations on to the file system provider. It's quite flexible now and it depends on the users that custom a new file system provider what a property key stands for. key
The following three keys are identical and users can use any key that contains endpoint information to the file system provider and then assign it to the key
I'm not very sure whether the kinds of property conversion are acceptable to you. If we need to specify each required property explicitly, like the current S3 implementation, users need to use |
config.forEach( | ||
(k, v) -> { | ||
if (predefinedKeys.containsKey(k)) { | ||
String key = predefinedKeys.get(k); | ||
highestPriorityKeys.add(key); | ||
result.put(key, v); | ||
} | ||
|
||
if (!k.startsWith(GRAVITINO_BYPASS)) { | ||
// If the key does not start with 'gravitino.bypass' and is not in the highest priority | ||
// keys, set the value to the configuration. | ||
if (!highestPriorityKeys.contains(k)) { | ||
result.put(k, v); | ||
} | ||
} else { | ||
// If the key starts with 'gravitino.bypass', remove the prefix and set the value to the | ||
// configuration if the key does not exist in the configuration. | ||
String key = k.replace(GRAVITINO_BYPASS, ""); | ||
if (!result.containsKey(key)) { | ||
result.put(key, v); | ||
} | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify this code? It's a bit hard to understand, I think it is OK to iterate for several rounds, just make it easy to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a UT for this?
@@ -32,6 +32,8 @@ | |||
*/ | |||
public interface FileSystemProvider { | |||
|
|||
String GRAVITINO_BYPASS = "gravitino.bypass."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -107,4 +112,64 @@ protected void createCatalog() { | |||
protected String generateLocation(String filesetName) { | |||
return String.format("%s/%s", defaultBaseLocation, filesetName); | |||
} | |||
|
|||
@Test | |||
public void testCreateSchemaAndFilesetWithSpecialLocation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to fix the bug introduced by #5296, we need to consider other Hadoop catalog like S3, OSS and GCS.
@@ -107,5 +112,46 @@ public class GravitinoVirtualFileSystemConfiguration { | |||
public static final long FS_GRAVITINO_FILESET_CACHE_EVICTION_MILLS_AFTER_ACCESS_DEFAULT = | |||
1000L * 60 * 60; | |||
|
|||
private static final Map<String, String> GVFS_S3_KEY_TO_HADOOP_KEY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the long run, we can provide a method like requiredPropertyKey
in the FileSystemProvider
and load this provider by service provider. By doing so, we can register this property to the client dynamically. Currently, we cannot avoid this hard coding.
FileSystemProvider provider = fileSystemProvidersMap.get(scheme); | ||
if (provider == null) { | ||
throw new GravitinoRuntimeException( | ||
"Unsupported file system scheme: %s for %s.", | ||
scheme, GravitinoVirtualFileSystemConfiguration.GVFS_SCHEME); | ||
} | ||
|
||
Map<String, String> maps = getConfigMap(getConf()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the gvfs configuration for now, is it aligned with server side conf, or still use "."? I cannot find the code anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the same as those in Gravitino server side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you verified manually using gvfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes , GravitinoVirtualFileSystemS3IT
can be running in the CI and cover the changes.
…rovider and add more comments.
Overall LGTM, please fix the ci issue. |
… for Hadoop catalog. (#5244) ### What changes were proposed in this pull request? Replace the properties keys with `gravitino.bypass` prefix with a more elegant one. ### Why are the changes needed? For better user experience. Fix: #5220 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Existing UTs and ITs.
…s keys for Hadoop catalog. (apache#5244) ### What changes were proposed in this pull request? Replace the properties keys with `gravitino.bypass` prefix with a more elegant one. ### Why are the changes needed? For better user experience. Fix: apache#5220 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Existing UTs and ITs.
What changes were proposed in this pull request?
Replace the properties keys with
gravitino.bypass
prefix with a more elegant one.Why are the changes needed?
For better user experience.
Fix: #5220
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
Existing UTs and ITs.