HADOOP-16629: support copyFile in s3afilesystem #1655
Conversation
Test results with 8 parallel threads (region=us-west-2). Errors are not related to this patch.
💔 -1 overall
This message was automatically generated.
package org.apache.hadoop.fs.contract.s3a;
import org.apache.hadoop.conf.Configuration;
Please use the following import order:
java.*
javax.*
--
everything but org.apache
--
org.apache.*
--
static stuff, in order
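For reference, a minimal sketch of that ordering (the specific classes are only illustrative):

```java
// java.* first
import java.io.IOException;
import java.net.URI;

// then javax.*
import javax.annotation.Nullable;

// then everything but org.apache (third-party libraries)
import com.amazonaws.services.s3.model.CopyObjectRequest;

// then org.apache.*
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// static imports last, in the same order
import static org.apache.hadoop.fs.contract.ContractTestUtils.assertIsFile;
```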
package org.apache.hadoop.fs.s3a.impl;
import org.apache.hadoop.fs.FileAlreadyExistsException;
Please use the following import order:
java.*
javax.*
--
everything but org.apache
--
org.apache.*
--
static stuff, in order
Thanks @bgaborg. IDE settings were not right. Fixed the ordering in a recent commit.
💔 -1 overall
This message was automatically generated.
#### Postconditions
`dstFile` is available in the filesystem. The contents of `dstFile` match those of `srcFile`.
as discussed, needs a bit more formality.
if not exists(FS, srcFile) : raise FileNotFoundException
if not exists(parentFolder(dstFile)) : raise [IllegalArgumentException]
prefer FNFE
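One possible shape for the more formal version, in the FS specification notation and with FNFE for the missing parent as suggested (this wording is a sketch, not the patch's text):

```
Preconditions:

    if not exists(FS, srcFile) : raise FileNotFoundException
    if not exists(FS, parent(dstFile)) : raise FileNotFoundException
    if isDir(FS, srcFile) : raise PathIsDirectoryException

Postconditions:

    FS' = FS where FS'.Files[dstFile] == FS.Files[srcFile]
```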
describe("Test folder copy. Recursive copy is not supported yet.");
// initiate copy
handleExpectedException(intercept(Exception.class,
    () -> fs.copyFile(srcDir.toUri(), dstDir.toUri()).get()));
assert dest file is not there. if a file existed before, then it is unchanged
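A sketch of those assertions using the ContractTestUtils helpers (`preExistingData` would be a hypothetical byte array written to the destination before the copy attempt):

```java
// the failed copy must not have created anything at the destination
ContractTestUtils.assertPathDoesNotExist(fs,
    "destination created by failed copy", dstDir);
// and if a file already existed at the destination, it must be unchanged:
// verifyFileContents(fs, dstFile, preExistingData);
```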
// initiate copy
fs.copyFile(zeroByteFile.toUri(), dstFile.toUri()).get();
// Check if file is available
assertIsFile(zeroByteFile);
assert source still there
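A sketch of the extra checks, assuming `dstFile` is the copy target in that test:

```java
// source must survive the copy
assertIsFile(zeroByteFile);
// destination must exist and also be zero bytes long
assertIsFile(dstFile);
assertEquals("copied file length", 0, fs.getFileStatus(dstFile).getLen());
```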
// Check if file is available and verify its contents
assertIsFile(srcFile);
assertIsFile(dstFile);
verifyFileContents(fs, dstFile, srcData);
for s3 we need to spin a bit here for eventual consistency; use LambdaTestUtils.eventually()
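A sketch of that spin using `LambdaTestUtils.eventually()` (timeout and interval values here are arbitrary):

```java
// retry the probe until the copied object becomes visible, or time out
LambdaTestUtils.eventually(30_000,
    () -> {
      assertIsFile(dstFile);
      verifyFileContents(fs, dstFile, srcData);
      return null;
    },
    new LambdaTestUtils.FixedRetryInterval(1_000));
```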
}
@Test
public void testNonExistingFileCopy() throws Throwable {
add equivalent where the dest file exists; verify it is still there
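A sketch of that companion test in the style of the surrounding cases (`missingFile` and the data helpers are assumed to come from the test class):

```java
@Test
public void testCopyOntoExistingFile() throws Throwable {
  describe("Copy from a missing source; an existing dest must be unchanged");
  byte[] destData = dataset(256, 'a', 'z');
  createFile(fs, dstFile, true, destData);
  // the copy must fail: the source does not exist
  handleExpectedException(intercept(Exception.class,
      () -> fs.copyFile(missingFile.toUri(), dstFile.toUri()).get()));
  // the pre-existing destination file must still be there, unchanged
  assertIsFile(dstFile);
  verifyFileContents(fs, dstFile, destData);
}
```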
see #1679 for my proposal for a Filesystem API for multipart uploads. That is a draft implementation right now which lacks:
I must highlight the state-tracking problem. If you look at how RenameOperation uses copyFile(), it is updating its RenameTracker to keep that operation's state consistent; for the CommitOperations that state also gets passed around by way of a CommitContext. copy (like the multipart upload) is going to have to do the same. I can help by rounding out the #1679 proposal with the tracking of that state to show you what to do, but
I know HDFS is not above adding some private API to help out HBase and then, when nobody notices, pulling that up into hadoop-common, but I don't like that; see [HDFS-8631](https://issues.apache.org/jira/browse/HDFS-8631?focusedCommentId=16961004&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16961004). I do expect a fair amount of rigour here because we are going to be maintaining these APIs for decades.
private CopyResult copyFile(String srcKey, String dstKey, long size,
    S3ObjectAttributes srcAttributes, S3AReadOpContext readContext)
    throws IOException, InterruptedIOException {
  return copyFile(bucket, srcKey, bucket, dstKey, size, srcAttributes, readContext);
this doesn't update S3Guard BTW. If you want newly created files to be visible, your copy operation owns this problem
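A rough sketch of the kind of update the copy path would need to own, against the MetadataStore API of that era (the placement and `dstPath` variable are my guess, not the patch's code):

```java
// after the raw S3 copy completes, record the new entry so that a
// guarded getFileStatus/listing sees the file immediately
if (hasMetadataStore()) {
  S3AFileStatus destStatus = (S3AFileStatus) getFileStatus(dstPath);
  getMetadataStore().put(new PathMetadata(destStatus));
}
```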
Also: -1 to cross-store copies.
This is a subtask of HADOOP-16604, which aims to provide copy functionality for cloud-native applications. The intent of this PR is to provide copyFile(URI src, URI dst) functionality for S3AFileSystem (HADOOP-16629).
Creating a new PR due to a merge mess-up in #1591.
Changes w.r.t. PR #1591:
`public CompletableFuture<Void> copyFile(URI srcFile, URI dstFile)` — CompletableFuture makes the API nicer. However,
`CompletableFuture::get` --> `waitingGet` invokes `Runtime.getAvailableProcessors` frequently. This can turn out to be an expensive native call depending on workload. We can optimise this later, if it turns out to be an issue.
If the destination bucket is different, the relevant permissions/policies have to be set up already, without which it will throw exceptions. Providing a URI instead of a Path makes it easier to specify different buckets as needed. Since the implementation is yet to stabilize, we can make relevant changes in the store.
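For illustration, a self-contained sketch of how a client might drive the proposed API (bucket and key names are made up; `copyFile` is the method added by this PR, not a shipped Hadoop API):

```java
import java.net.URI;
import java.util.concurrent.CompletableFuture;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

public class CopyFileExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
    // the cast is needed because copyFile is (so far) S3A-only
    CompletableFuture<Void> copy = ((S3AFileSystem) fs).copyFile(
        URI.create("s3a://example-bucket/in/part-0000"),
        URI.create("s3a://example-bucket/out/part-0000"));
    copy.get();  // block until the server-side copy completes or fails
  }
}
```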
Testing was done in region=us-west-2 from my local laptop. Contract tests and huge file tests passed. Other tests are still running and I will post the results. (ITestS3AContractRename failed, but that is not related to this patch.)