Skip to content

Make different PinotFS concrete classes have the same behaviors#3671

Closed
jackjlli wants to merge 3 commits intomasterfrom
pinotfs-copy-ensure-parent-dir-exist
Closed

Make different PinotFS concrete classes have the same behaviors#3671
jackjlli wants to merge 3 commits intomasterfrom
pinotfs-copy-ensure-parent-dir-exist

Conversation

@jackjlli
Copy link
Member

Currently different PinotFS behaves differently.
E.g., as for listFile(URI, true) LocalPinotFS lists all the files and directories, while HadoopPinotFS only list non-dir files.
This PR:

@jackjlli jackjlli requested a review from jenniferdai January 10, 2019 00:06
@jackjlli jackjlli force-pushed the pinotfs-copy-ensure-parent-dir-exist branch 4 times, most recently from f45a08d to 76f8e20 Compare January 10, 2019 01:18
@codecov-io
Copy link

codecov-io commented Jan 10, 2019

Codecov Report

Merging #3671 into master will increase coverage by 0.07%.
The diff coverage is 67.05%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3671      +/-   ##
============================================
+ Coverage      67.1%   67.18%   +0.07%     
- Complexity        4       44      +40     
============================================
  Files          1027     1027              
  Lines         50819    50879      +60     
  Branches       7091     7105      +14     
============================================
+ Hits          34102    34181      +79     
+ Misses        14392    14361      -31     
- Partials       2325     2337      +12
Impacted Files Coverage Δ Complexity Δ
...main/java/org/apache/pinot/filesystem/PinotFS.java 100% <ø> (ø) 0 <0> (ø) ⬇️
.../controller/helix/core/SegmentDeletionManager.java 79.67% <100%> (ø) 0 <0> (ø) ⬇️
...java/org/apache/pinot/filesystem/LocalPinotFS.java 73.41% <18.75%> (-14.28%) 0 <0> (ø)
...ava/org/apache/pinot/filesystem/HadoopPinotFS.java 64.47% <77.27%> (+64.47%) 40 <28> (+40) ⬆️
...a/manager/realtime/RealtimeSegmentDataManager.java 75% <0%> (-25%) 0% <0%> (ø)
...egation/function/customobject/MinMaxRangePair.java 75.86% <0%> (-24.14%) 0% <0%> (ø)
...lix/EmptyBrokerOnlineOfflineStateModelFactory.java 86.66% <0%> (-13.34%) 0% <0%> (ø)
.../apache/pinot/core/transport/DataTableHandler.java 88% <0%> (-12%) 0% <0%> (ø)
...elix/core/relocation/RealtimeSegmentRelocator.java 70.4% <0%> (-10.21%) 0% <0%> (ø)
.../apache/pinot/common/config/RealtimeTagConfig.java 93.33% <0%> (-6.67%) 0% <0%> (ø)
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd2dc04...3c6279b. Read the comment docs.

@jackjlli jackjlli force-pushed the pinotfs-copy-ensure-parent-dir-exist branch 3 times, most recently from 2bd4fce to 58d5a77 Compare January 15, 2019 20:56
@jackjlli jackjlli force-pushed the pinotfs-copy-ensure-parent-dir-exist branch from 58d5a77 to e1e8e7b Compare January 18, 2019 18:03
@jackjlli jackjlli force-pushed the pinotfs-copy-ensure-parent-dir-exist branch from e1e8e7b to f7c7d6b Compare February 1, 2019 23:44
@jackjlli jackjlli force-pushed the pinotfs-copy-ensure-parent-dir-exist branch from f7c7d6b to 3c6279b Compare February 8, 2019 18:51
Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on, @jackjlli .
Can we first document the PinotFS class as we want the behavior to be?
What does copyToLocalFile do if src is a dir, dst exists (a a dir/file), etc.? I think to start with, we should define it so that we only handle the case where src is a file and dst is not there? Let us not introduce a force flag here until we iron out the interface correctly?
Same with copyFromLocalFile

* If both src and dst are files, dst will be overwritten.
* If src is a file and dst is a directory, src file will get copied under dst directory.
* If both src and dst are directories, src folder will get copied under dst directory.
* If src is a directory and dst is a file, operation will fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify behavior if src is dir, dst is file, and overwrite is true. Looks like LocalFS succeeds.

/**
* Same as move except the srcUri is not retained. For example, if x/y/z is copied to a/b/c, x/y/z will be retained
* and x/y/z will also be present as a/b/c
* Same as move except the srcUri is retained. For example, if a file x/y/z is copied to a/b/c, x/y/z will be retained
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "Same as move" can you mention each case explicitly and describe the behavior? thanks.

public boolean isDirectory(URI uri) {
FileStatus fileStatus = new FileStatus();
fileStatus.setPath(new Path(uri));
public boolean isDirectory(URI uri) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just invoke _hadoopFS.isDirectory()


/**
* Returns the length of the file at the provided location. Will throw exception if a directory
* Returns the length of the file at the provided location. Will throw exception if a directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this API was added to this class only for testing? Pinot code does not seem access it.


/**
* Lists all the files at the location provided. Lists recursively.
* Lists all the files and directories at the location provided. Lists recursively.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lists recursively if "recursive" is set to true.

Also, if the URI provided is not a directory, are we expected to throw exception no matter what the value of 'recursive' is? i think the localfs will not throw exception if recursive is set to false

Assert.assertTrue(localPinotFS.exists(thirdTestFile.toURI()));

// Check that destination being directory within the source directory still works
File anotherFileUnderAbsoluteThreeDir = new File(newAbsoluteTempDirPath3, "anotherFile");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding these test cases

return false;
} else {
delete(dstUri, true);
mkdir(dstUri);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why mkdir here. shouldn't we just be doing a move and let the file system move either a file or a dir?

FileUtil.copy(_hadoopFS, sourceFiles.next().getPath(), _hadoopFS, target, true, _hadoopConf);
if (!succeeded) {
return false;
if (!exists(srcUri)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a base class and move these there? You can add a TODO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that occurs to me as I review something like this, is that for some (slow) FS, the latency of making each call may be so bad, that we are better off calling some native file system method and you get what you get. Going this route is relatively easy, and we just need to make sure that Pinot's requirements are met in all file system (i.e. if Pinot uses the case of copying from a dir to a file and that case works fine across everything, then that is alll we need).

Let us discuss a bit more before we put in too much work to unify file systems.

@jackjlli
Copy link
Member Author

jackjlli commented Apr 9, 2019

I'm closing this PR since it's been split to multiple PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants