-
Notifications
You must be signed in to change notification settings - Fork 301
Initial commit of hadoop file system (credit vnvo2409) #1197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
|
||
typedef struct HadoopFileSystem { | ||
absl::Mutex mu; | ||
std::unique_ptr<HadoopFileSystemImplementation> ptr ABSL_GUARDED_BY(mu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test failed because you forgot to add #include<memory>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vnvo2409 Thanks. The PR has been updated.
static void* plugin_memory_allocate(size_t size) { return calloc(1, size); } | ||
static void plugin_memory_free(void* ptr) { free(ptr); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two functions are unused. Likely because I copied from posix
. Let just merge this PR and I will make a follow up PR ( mostly cleanup and better memory management ).
#if defined(_MSC_VER) | ||
TF_SetStatus(status, TF_UNIMPLEMENTED, "LoadSharedLibrary not implemented"); | ||
return nullptr; | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason not to use TF_LoadSharedLibrary
as we did in experiemental plugin. And does it mean that we drop support for hdfs
on Windows ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vnvo2409 Inside tensorflow while the functions are defined, they somehow are not part of the pip package so we cannot include them directly. I tried to modify the tensorflow repo but had some trouble due to the big circular dependency. Give we only need those two functions and they are small, I think it makes sense to just include them here.
Also, I have added the Windows counterpart as well so now hdfs should be supported on Windows.
#if defined(_WIN32) | ||
if (home.back() != '\\') home.push_back('\\'); | ||
return home + "lib\\native\\" + lib; | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we drop support for hdfs
on Windows, I think this isn't necessary anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vnvo2409 See my other comments. The Windows part has been added.
Thank you for this PR @yongtang ! In addition, I am thinking of a loading dynamically mechanics for the plugins. Maybe we could change Status Env::GetFileSystemForFile(const std::string& fname,
FileSystem** result) {
StringPiece scheme, host, path;
io::ParseURI(fname, &scheme, &host, &path);
FileSystem* file_system = file_system_registry_->Lookup(std::string(scheme));
if (!file_system) {
if (scheme.empty()) {
scheme = "[local]";
}
return errors::Unimplemented("File system scheme '", scheme,
"' not implemented (file: '", fname, "')");
}
*result = file_system;
return Status::OK();
}
|
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Thanks @vnvo2409 for the review. The PR has been updated now to address the review feedback. |
LGTM. Thank you 😄 |
* Initial commit of hadoop file system Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update namespace to avoid collision/conflict Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add a wrapper to HadoopFileSystem so that it will only be lazily loaded. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Address review comments Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This PR is an initial commit of Hadoop modular file system that was moved from tensorflow to here. Credit goes to author @vnvo2409.
This PR consists of mostly mechanic changes, though there are some small change to create a wrapper so that the loading of libhdfs.so will only be done lazily (not loaded when file system is initialized but loaded at the first usage). This avoids the case of the following error in case user does not even want to use hdfs:
It is a little challenging to setup Hadoop for tests so this PR defers. Will create a follow up PR to set up the tests.
Signed-off-by: Yong Tang yong.tang.github@outlook.com