[Improvement] support the path without scheme prefix when invoke gvfs api #2860

Closed
coolderli opened this issue Apr 10, 2024 · 0 comments

@coolderli
Collaborator

What would you like to be improved?

Some engines, such as TensorFlow, create the GravitinoVirtualFileSystem and then pass it paths that lack the scheme prefix.
The current implementation requires every path to carry the scheme prefix, so it does not work with TensorFlow. This issue tracks fixing that.

After creating the HadoopFileSystem (backed by GravitinoVirtualFileSystem), TensorFlow calls ParseHadoopPath on each path. From then on it invokes the file system API with a path that has no scheme prefix:

void ParseHadoopPath(const std::string& fname, std::string* scheme,
                     std::string* namenode, std::string* path) {
  size_t scheme_end = fname.find("://") + 2;
  // We don't want `://` in scheme.
  *scheme = fname.substr(0, scheme_end - 2);
  size_t nn_end = fname.find("/", scheme_end + 1);
  if (nn_end == std::string::npos) {
    *namenode = fname.substr(scheme_end + 1);
    *path = "";
    return;
  }
  *namenode = fname.substr(scheme_end + 1, nn_end - scheme_end - 1);
  // We keep `/` in path.
  *path = fname.substr(nn_end);  // the scheme prefix and namenode are dropped here
}

// At the call site:
ParseHadoopPath(path, &scheme, &namenode, &hdfs_path);

// hdfs_path no longer carries the gvfs:// prefix.
auto handle = libhdfs->hdfsOpenFile(fs, hdfs_path.c_str(), O_WRONLY, 0, 0, 0);
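
To make the effect concrete, the parsing function quoted above can be traced on a sample path. The sketch below reproduces that function with standard includes added. The sample path `gvfs://fileset/catalog/schema/my_fileset/data.txt` is an illustrative assumption about the gvfs path layout, not a value taken from this issue.

```cpp
#include <cstddef>
#include <string>

// Same logic as the ParseHadoopPath excerpt quoted in this issue.
void ParseHadoopPath(const std::string& fname, std::string* scheme,
                     std::string* namenode, std::string* path) {
  std::size_t scheme_end = fname.find("://") + 2;
  // The scheme excludes `://`.
  *scheme = fname.substr(0, scheme_end - 2);
  std::size_t nn_end = fname.find("/", scheme_end + 1);
  if (nn_end == std::string::npos) {
    // No path component after the namenode.
    *namenode = fname.substr(scheme_end + 1);
    *path = "";
    return;
  }
  *namenode = fname.substr(scheme_end + 1, nn_end - scheme_end - 1);
  // The leading `/` is kept; the scheme and namenode are not.
  *path = fname.substr(nn_end);
}
```

For the assumed sample path, this yields scheme `gvfs`, namenode `fileset`, and path `/catalog/schema/my_fileset/data.txt`. Only that last value reaches `hdfsOpenFile`, so the file system receives a path with no `gvfs://` prefix.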

How should we improve?

Make GravitinoVirtualFileSystem accept paths both with and without the scheme prefix.
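
The intended behavior can be sketched as a small normalization step. This is only an illustration of the logic, written in C++ to match the snippet above. The actual Gravitino code is Java and may differ. The function name `NormalizeGvfsPath` and the prefix constant are hypothetical, and the `gvfs://fileset` prefix is an assumption about the gvfs path layout.

```cpp
#include <stdexcept>
#include <string>

// Assumed fully qualified prefix for gvfs paths (hypothetical constant).
const std::string kGvfsPrefix = "gvfs://fileset";

// Hypothetical helper: returns the fully qualified form of `path`,
// accepting both "gvfs://fileset/..." and scheme-less "/..." input.
std::string NormalizeGvfsPath(const std::string& path) {
  // Already qualified: return unchanged.
  if (path.compare(0, kGvfsPrefix.size() + 1, kGvfsPrefix + "/") == 0) {
    return path;
  }
  // Some other scheme or authority is present: reject it in this sketch.
  if (path.find("://") != std::string::npos) {
    throw std::invalid_argument("unsupported scheme or authority: " + path);
  }
  // Scheme-less absolute path, as passed by TensorFlow: add the prefix.
  if (!path.empty() && path[0] == '/') {
    return kGvfsPrefix + path;
  }
  throw std::invalid_argument("expected an absolute path: " + path);
}
```

With this approach, callers that already pass qualified paths see no change, and scheme-less paths resolve to the same location.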

@coolderli coolderli added the improvement Improvements on everything label Apr 10, 2024
@jerryshao jerryshao added this to the Gravitino 0.5.0 milestone Apr 15, 2024
jerryshao pushed a commit that referenced this issue Apr 15, 2024
…2779)

### What changes were proposed in this pull request?

TensorFlow passes paths without a scheme when it invokes the file system API.
This PR adds support for paths without the gvfs scheme.

https://github.com/tensorflow/io/blob/master/tensorflow_io/core/filesystems/hdfs/hadoop_filesystem.cc#L618

https://github.com/tensorflow/io/blob/master/tensorflow_io/core/filesystems/hdfs/hadoop_filesystem.cc#L116

### Why are the changes needed?

- Support paths without a scheme in the Hadoop API
- #2860

### Does this PR introduce _any_ user-facing change?

- no

### How was this patch tested?

- UTs pass