improve(filesystem-hadoop): support path without scheme for gvfs api #2779
254debc to e66a1d2
@jerryshao @xloya Please help review this MR when you have time. Thanks.
@@ -51,6 +53,9 @@ public class GravitinoVirtualFileSystem extends FileSystem {
  private Cache<NameIdentifier, Pair<Fileset, FileSystem>> filesetCache;
  private ScheduledThreadPoolExecutor scheduler;

  private static final Pattern IDENTIFIER_PATTERN =
Could you add a description of this pattern that explains which strings it matches?
done
Overall LGTM, left a minor suggestion.
Is this a limitation of TensorFlow? If so, how does TensorFlow distinguish between different filesystems?
String virtualPath = virtualUri.toString();
if (StringUtils.isBlank(virtualPath)) {
  throw new InvalidPathException(
      virtualPath, "Uri which need be extracted cannot be null or empty.");
Can we use Preconditions for simplicity?
done
It seems you haven't created a related issue. Could you please create one and associate it with this PR?
55efc01 to 99a8c66
Created issue #2860.
Yes. I added a description of the issue. TensorFlow calls parseHadoopPath after creating the HadoopFileSystem (GravitinoVirtualFileSystem). After that, it invokes the API with a path that has no scheme prefix.
@jerryshao @xloya I have fixed it. Please take a look. Thanks.
@jerryshao The GitHub CI has passed. Please help review this when you have time. Thanks.
@@ -51,6 +52,13 @@ public class GravitinoVirtualFileSystem extends FileSystem {
  private Cache<NameIdentifier, Pair<Fileset, FileSystem>> filesetCache;
  private ScheduledThreadPoolExecutor scheduler;

  // The pattern is used to match gvfs path. The scheme prefix (gvfs://) is optional.
The scheme prefix should be "gvfs://fileset", not "gvfs://", right? If so, can you please clarify it?
@jerryshao You are right. I have fixed it. Please take a look. Thanks.
Just one minor issue. @coolderli, can you please clarify it?
LGTM.
What changes were proposed in this pull request?
TensorFlow invokes the filesystem API with paths that have no scheme. This MR adds support for paths without the gvfs scheme.
https://github.com/tensorflow/io/blob/master/tensorflow_io/core/filesystems/hdfs/hadoop_filesystem.cc#L618
https://github.com/tensorflow/io/blob/master/tensorflow_io/core/filesystems/hdfs/hadoop_filesystem.cc#L116
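For illustration only, here is a minimal sketch of how a path could be matched whether or not it carries the gvfs://fileset prefix. The regular expression, the class name GvfsPathSketch, and the method extractIdentifier are assumptions made for this example, not the code merged in this PR. The sketch also assumes a /catalog/schema/fileset/... layout and uses only the JDK, so it throws IllegalArgumentException rather than the Hadoop InvalidPathException seen in the diff.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GravitinoVirtualFileSystem.
final class GvfsPathSketch {

  // Assumed pattern: an optional "gvfs://fileset" prefix, followed by
  // /catalog/schema/fileset and any number of sub-path segments.
  private static final Pattern IDENTIFIER_PATTERN =
      Pattern.compile("^(?:gvfs://fileset)?/([^/]+)/([^/]+)/([^/]+)(?:/[^/]+)*/?$");

  // Returns {catalog, schema, fileset} extracted from the virtual path.
  static String[] extractIdentifier(String virtualPath) {
    if (virtualPath == null || virtualPath.trim().isEmpty()) {
      throw new IllegalArgumentException("Path cannot be null or empty.");
    }
    Matcher matcher = IDENTIFIER_PATTERN.matcher(virtualPath);
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Not a valid gvfs path: " + virtualPath);
    }
    return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
  }

  public static void main(String[] args) {
    // Both forms resolve to the same identifier parts.
    System.out.println(
        String.join(".", extractIdentifier("gvfs://fileset/catalog/schema/fileset1/dir/a.txt")));
    System.out.println(
        String.join(".", extractIdentifier("/catalog/schema/fileset1/dir/a.txt")));
  }
}
```

With a pattern of this shape, a caller such as TensorFlow that strips the scheme before invoking the API would still resolve to the same fileset as a caller that passes the full URI.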
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?