-
Notifications
You must be signed in to change notification settings - Fork 3.7k
GH-37626: [Java][Cpp] support using existing java file system #37690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I think what we'd want is to add a FileSystem parameter to FileSystemDatasetFactory::Make or FileSystemFactoryOptions, instead of a global. |
hmm, just the naming, it is a field in FileSystemFactoryOptions and then passed down not a global variable. |
Ah, sorry. In that case it should be named something less Java-specific and should probably be typed |
|
||
/// when java context have a file system reference to be used | ||
// then use this file system reference in the context | ||
std::shared_ptr<void> file_system_java = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be Java-specific at all.
This should be shared_ptr<FileSystem>
.
@@ -217,7 +217,7 @@ Result<std::shared_ptr<DatasetFactory>> FileSystemDatasetFactory::Make( | |||
// ARROW-12481. | |||
std::string internal_path; | |||
ARROW_ASSIGN_OR_RAISE(std::shared_ptr<fs::FileSystem> filesystem, | |||
fs::FileSystemFromUri(uri, &internal_path)) | |||
fs::FileSystemFromUriAndFs(uri, &internal_path, options.file_system_java)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you would do is, if a filesystem is provided, call FileSystem::PathFromUri to set internal_path
. If it succeeds, then proceed, else raise an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds correct, will do later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change (allowing a filesystem to be specified when running discovery) seems good. I agree with @lidavidm that this shouldn't be java-specific.
|
Rationale for this change
when using the dataset api to repeatedly read chunks of file, FileSystem instances are created every time. causing performance degrade.
What changes are included in this PR?
to reuse existing filesystem passed from java context.
Are these changes tested?
tested on my local integration. as there seems no end to end dataset testing in the base
Are there any user-facing changes?
NO