Make different PinotFS concrete classes have the same behaviors#3671
Make different PinotFS concrete classes have the same behaviors#3671
Conversation
f45a08d to
76f8e20
Compare
Codecov Report
@@ Coverage Diff @@
## master #3671 +/- ##
============================================
+ Coverage 67.1% 67.18% +0.07%
- Complexity 4 44 +40
============================================
Files 1027 1027
Lines 50819 50879 +60
Branches 7091 7105 +14
============================================
+ Hits 34102 34181 +79
+ Misses 14392 14361 -31
- Partials 2325 2337 +12
Continue to review full report at Codecov.
|
pinot-filesystem/src/main/java/com/linkedin/pinot/filesystem/PinotFS.java
Outdated
Show resolved
Hide resolved
pinot-hadoop-filesystem/src/main/java/com/linkedin/pinot/filesystem/HadoopPinotFS.java
Outdated
Show resolved
Hide resolved
pinot-filesystem/src/main/java/com/linkedin/pinot/filesystem/LocalPinotFS.java
Outdated
Show resolved
Hide resolved
pinot-hadoop-filesystem/src/main/java/com/linkedin/pinot/filesystem/HadoopPinotFS.java
Outdated
Show resolved
Hide resolved
pinot-hadoop-filesystem/src/main/java/com/linkedin/pinot/filesystem/HadoopPinotFS.java
Outdated
Show resolved
Hide resolved
pinot-hadoop-filesystem/src/main/java/com/linkedin/pinot/filesystem/HadoopPinotFS.java
Outdated
Show resolved
Hide resolved
2bd4fce to
58d5a77
Compare
58d5a77 to
e1e8e7b
Compare
e1e8e7b to
f7c7d6b
Compare
f7c7d6b to
3c6279b
Compare
mcvsubbu
left a comment
There was a problem hiding this comment.
Thanks for taking this on, @jackjlli .
Can we first document the PinotFS class as we want the behavior to be?
What does copyToLocalFile do if src is a dir, dst exists (a a dir/file), etc.? I think to start with, we should define it so that we only handle the case where src is a file and dst is not there? Let us not introduce a force flag here until we iron out the interface correctly?
Same with copyFromLocalFile
| * If both src and dst are files, dst will be overwritten. | ||
| * If src is a file and dst is a directory, src file will get copied under dst directory. | ||
| * If both src and dst are directories, src folder will get copied under dst directory. | ||
| * If src is a directory and dst is a file, operation will fail. |
There was a problem hiding this comment.
Clarify behavior if src is dir, dst is file, and overwrite is true. Looks like LocalFS succeeds.
| /** | ||
| * Same as move except the srcUri is not retained. For example, if x/y/z is copied to a/b/c, x/y/z will be retained | ||
| * and x/y/z will also be present as a/b/c | ||
| * Same as move except the srcUri is retained. For example, if a file x/y/z is copied to a/b/c, x/y/z will be retained |
There was a problem hiding this comment.
Instead of "Same as move" can you mention each case explicitly and describe the behavior? thanks.
| public boolean isDirectory(URI uri) { | ||
| FileStatus fileStatus = new FileStatus(); | ||
| fileStatus.setPath(new Path(uri)); | ||
| public boolean isDirectory(URI uri) throws IOException { |
There was a problem hiding this comment.
can we just invoke _hadoopFS.isDirectory()
|
|
||
| /** | ||
| * Returns the length of the file at the provided location. Will throw exception if a directory | ||
| * Returns the length of the file at the provided location. Will throw exception if a directory. |
There was a problem hiding this comment.
Looks like this API was added to this class only for testing? Pinot code does not seem access it.
|
|
||
| /** | ||
| * Lists all the files at the location provided. Lists recursively. | ||
| * Lists all the files and directories at the location provided. Lists recursively. |
There was a problem hiding this comment.
Lists recursively if "recursive" is set to true.
Also, if the URI provided is not a directory, are we expected to throw exception no matter what the value of 'recursive' is? i think the localfs will not throw exception if recursive is set to false
| Assert.assertTrue(localPinotFS.exists(thirdTestFile.toURI())); | ||
|
|
||
| // Check that destination being directory within the source directory still works | ||
| File anotherFileUnderAbsoluteThreeDir = new File(newAbsoluteTempDirPath3, "anotherFile"); |
There was a problem hiding this comment.
thanks for adding these test cases
| return false; | ||
| } else { | ||
| delete(dstUri, true); | ||
| mkdir(dstUri); |
There was a problem hiding this comment.
not sure why mkdir here. shouldn't we just be doing a move and let the file system move either a file or a dir?
| FileUtil.copy(_hadoopFS, sourceFiles.next().getPath(), _hadoopFS, target, true, _hadoopConf); | ||
| if (!succeeded) { | ||
| return false; | ||
| if (!exists(srcUri)) { |
There was a problem hiding this comment.
Maybe use a base class and move these there? You can add a TODO
There was a problem hiding this comment.
One thing that occurs to me as I review something like this, is that for some (slow) FS, the latency of making each call may be so bad, that we are better off calling some native file system method and you get what you get. Going this route is relatively easy, and we just need to make sure that Pinot's requirements are met in all file system (i.e. if Pinot uses the case of copying from a dir to a file and that case works fine across everything, then that is alll we need).
Let us discuss a bit more before we put in too much work to unify file systems.
|
I'm closing this PR since it's been split to multiple PRs. |
Currently different PinotFS behaves differently.
E.g., as for
listFile(URI, true)LocalPinotFS lists all the files and directories, while HadoopPinotFS only list non-dir files.This PR:
FShave the same behaviors.