-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-11452 make rename/3 public #743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-11452 make rename/3 public #743
Conversation
* @throws FileAlreadyExistsException dest path exists and is a file | ||
* @throws ParentNotDirectoryException if the parent path of dest is not | ||
* a directory | ||
* @throws IOException on failure | ||
*/ | ||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You remove the deprecation here from the abstract FS. How will it affect the target release it can get in?
FS is
@InterfaceAudience.Public
@InterfaceStability.Stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it changed to protected
to public
. The same question applies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's exactly the purpose: take a method which is protected and used only by FileContext, and make it public. It's been in for a while, been stable -and now needs tests and documentation of what it really does
deletion, preventing the stores' use as drop-in replacements for HDFS. | ||
|
||
### `boolean rename(Path src, Path d)` | ||
|
||
In terms of its specification, `rename()` is one of the most complex operations within a filesystem . | ||
|
||
In terms of its implementation, it is the one with the most ambiguity regarding when to return false | ||
versus raising an exception. | ||
versus raising an exception. This makes less useful to use than | ||
others: in production code it is usually wrapped by code to raise an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
The semantics of this method are simpler and more clearly defined. | ||
|
||
- The final path of the rename is always that of the `dest` argument; there | ||
is no attempt to generate a new path underneath a target directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
another, SHOULD be atomic. Many applications rely on this as a way to commit operations. | ||
|
||
* However, the base implementation is *not* atomic; some of the precondition checks | ||
are performed separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
Pushed up new version which does a lot more w.r.t testing, including updating I'm not comfortable with that design; I think it'd be better to pull the rename preconditions into some standalone class which takes a src & dest FS status and path strings, and do the checks there where they can validated in isolation. S3A would just precalculate those and then reuse them later on. Be easier for testing too. but...that doesn't help abfs so much, does it?
|
|
deletion, preventing the stores' use as drop-in replacements for HDFS. | ||
|
||
### `boolean rename(Path src, Path d)` | ||
|
||
In terms of its specification, `rename()` is one of the most complex operations within a filesystem . | ||
|
||
In terms of its implementation, it is the one with the most ambiguity regarding when to return false | ||
versus raising an exception. | ||
versus raising an exception. This makes less useful to use than | ||
others: in production code it is usually wrapped by code to raise an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
The semantics of this method are simpler and more clearly defined. | ||
|
||
- The final path of the rename is always that of the `dest` argument; there | ||
is no attempt to generate a new path underneath a target directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
another, SHOULD be atomic. Many applications rely on this as a way to commit operations. | ||
|
||
* However, the base implementation is *not* atomic; some of the precondition checks | ||
are performed separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
deletion, preventing the stores' use as drop-in replacements for HDFS. | ||
|
||
### `boolean rename(Path src, Path d)` | ||
|
||
In terms of its specification, `rename()` is one of the most complex operations within a filesystem . | ||
|
||
In terms of its implementation, it is the one with the most ambiguity regarding when to return false | ||
versus raising an exception. | ||
versus raising an exception. This makes less useful to use than | ||
others: in production code it is usually wrapped by code to raise an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
The semantics of this method are simpler and more clearly defined. | ||
|
||
- The final path of the rename is always that of the `dest` argument; there | ||
is no attempt to generate a new path underneath a target directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
another, SHOULD be atomic. Many applications rely on this as a way to commit operations. | ||
|
||
* However, the base implementation is *not* atomic; some of the precondition checks | ||
are performed separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
Assert.fail("Should throw IOException"); | ||
} catch (IOException ioe) { | ||
// expected | ||
intercept(IOException.class, () -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does permission check for getFileStatus
call.
But according to HdfsPermissionsGuide, getFileStatus/getFileInfo
call should not check the permission. I'm wondering do I miss any documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to revert all changes to this test suite other than those which go near rename, so whatever is tested for today is going to be exactly what is continued to be tested for. I Don't know what the nuances of HDFS permissions are.
I'd assume that the getFileStatus call only needs metadata/list permissions, rather than read access. That HDFS doc states that client needs execute permission on all parent directories, so that it can get from / to the file. Therefore the "has permissions to call getFileXXX" is implicit if you can get to the dir, right?
Assert.fail("Should throw IOException"); | ||
} catch (IOException ioe) { | ||
// expected | ||
intercept(IOException.class, () -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I think there should not be permission check for listStatus according to HdfsPermissionsGuide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that's interesting. I wonder what unix does there. Because HDFS should really be doing the same. If I'm in dir /a and don't have exec perms for /a/b, I should be able call stat /a/b, just not ls(/a/b). (pauses to check. OK, on the command line, removing x perms from a dir means ls() returns an empty list, not an error
(and hadoop fs -ls -R command returns an empty list too)
Path dst, | ||
final Options.Rename... options) | ||
throws IOException { | ||
fs.rename(src, dst, options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deprecated version checks the return code here and bails if the rename wasn't successful. If Options includes Options.Rename.NONE and dst already exists, this will return false? If so, we shouldn't proceed to overwrite the checksums.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if options = NONE and the dest exists, the call MUST raise an exception. I've just reviewed this carefully and added comments to clarify
Path srcCheckFile = getChecksumFile(src); | ||
Path dstCheckFile = getChecksumFile(dst); | ||
if (fs.exists(srcCheckFile)) { //try to rename checksum | ||
fs.rename(srcCheckFile, dstCheckFile, Options.Rename.OVERWRITE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be options
with Options.Rename.OVERWRITE
appended? I think all the Rename options work alone, but if we are passing an unbounded list of options, not passing them through smells like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting thought. The only current option is trash; don't know if we should retain that. I will add a comment in the javadocs of Options.Rename on this topic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+added comments in Option as well as Checksum FS
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSExceptionMessages.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/RenameHelper.java
Outdated
Show resolved
Hide resolved
for commenters, yes, this needs a rebase against trunk and then a review of all your comments. I cherish your patience, especially as all the comments are good ones |
…TaskApplication in JobPlanner Author: Prateek Maheshwari <pmaheshwari@apache.org> Reviewers: Ray Mathuru <rmathuru@linkedin.com>, Jagadish Venkatraman <vjagadish1989@linkedin.com>, Sanil Jain <snjain@linkedin.com> Closes apache#743 from prateekm/app-mode-fix
PR with the previous patches. This is a WiP but ready for some iniital review; lacks tests & spec. Also, because the base FileSystem.rename/3 does its own src/dest checks, it's less efficient against object stores. They need their own high-perf subclass. Change-Id: I1586ef2290d7a3d2d33b1a32e2f0999b07c26143
+ updates docs + contains fix for HADOOP-16255 : checksum FS doesn't do rename/3 properly tests are based on those of rename; there's still some stuff about renaming into an empty directory which makes me suspect there's ambiguity there Change-Id: Ic1ca6cbe3e9ca80ab8e1459167a7012678e856fc
* exceptions match expectations of FSMainOperationsBaseTest * clean up FSMainOperationsBaseTest by moving to intercept, try with resources. * S3A to implement subclass of FSMainOperationsBaseTest, ITestS3AFSMainOperations * S3A to implement ITestS3AContractRenameEx * Add protected override point, executeInnerRename, for implementing rename/3; base class calls rename() and throws if that returns false, i.e. current behaviour * S3A overrides executeInnerRename to call its own innerRename() and so raise all failures as IOEs. Issues: is the rename point executeInnerRename() a good name? S3AFS should really implement a direct rename/3 so there's no duplication of checks for parent etc, maybe even passing the values down. Or we make sure innerRename() is consistent with the spec, which primarily means logic about dest dir existing. Change-Id: I0ac3695434d85072ab860854e5e88bc6d36e754a
Change-Id: If2ab67152e08e4c2a225f6e89c24a5d1ff79ee59
Factored the rename check logic out into a RenameHelper which is then used in S3A FileSystem as the PoC of how it can use the RenameHelper then directly invoke the inner operations. Added some more @Retry attributes as appropriate. Conclusion: it works, but for efficient IOPS then `innerRename()` needs to take optional source and dest status values and so so omit repeating the checks. For more work on this * the tests; file context has some so review and add to AbstractContractRenameTest. Also, based on some feedback from Sean Mackrory: verify the renamed files actually have the data. * move internal use of rename (distcp, maprv2 FileOutputcommitter), etc to use this. of the 50+ places which call rename, they seem split 3-ways int 1. subclasses and forwarding 2. invocations whch check the return value and throw an IOE 3. invocations which are written on the assumption that renames raise exceptions on failure 2 & 3 are the ones to change. Change-Id: Id77ed7352b9d5ddb124f9191c5c5f1b8a76da7bb
Review of RenameHelper based on current coding styles and plans (IOStats, etc) Change-Id: I3d39ee3ed04a10e7db2c2b2c79833b945b4d691b
2426328
to
f4422a4
Compare
S3A high performance edition. This avoids all surplus S3 calls and has meaningful exception raising. TODO: * pull the S3A code out into is own operation + extra callbacks (innerGetFileStatus is all that's needed) * see if the FileContext default logic can be pulled out too, using a custom set of callbacks. If it can't the logic is broken. * do some testing Change-Id: I408b2cfe93f266cf0c9084fa8f05bb84b65c2bad
* Add RawLocalFileSystem rename with useful errors * pull out all rename docs into their own filesystem.md doc * Add a callback impl which => FileContext too, at least for the nonatomic rename. FC doesn't do path name checking. Should we? Proposed changes * move the new interfaces up to o.a.h.fs, so that .impl is never imported in FileSystem APIS. * remove the createRename callbacks method, just have stores with implement rename/3 other than the base FS to override all of rename 3. Change-Id: I1fab598553b8e9de4d659b80248bac440dbac018
as per Daryn's comments in : https://issues.apache.org/jira/browse/HADOOP-16281?focusedCommentId=16832628&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16832628 No changes to HDFS error messages. Review this |
closing and creating a new rebased/squashed PR. Still need to look at the comments related to permissions |
PR with the previous patches.
This is a WiP but ready for some iniital review; lacks tests & spec.
Also, because the base FileSystem.rename/3 does its own src/dest checks, it's less efficient against object stores. They need their own high-perf subclass.
Change-Id: I1586ef2290d7a3d2d33b1a32e2f0999b07c26143