-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-13660 Copy file till the source file length during distcp #1404
Conversation
I also need to know what happens when a file is truncated. Before it would probably just finish and be happy one read() returned -1. Now it can differentiate between read-all-data and read-less-data. This is good-and we need to take advantage of it by failing when this happens (at the very least, reporting it).Can you add a test which uses truncate() to force this I have very high expectations on tests, the key one being "if a test fails on jenkins and all you have is the report -is that enough to debug a failure?". The new tests don't meet that requirement yet, which I'll highlight |
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
Outdated
Show resolved
Hide resolved
...hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtils.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtils.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
checkstyle: ignore the "More than 7 parameters" warning; for the others address them unless the line length is about 81-82 chars and cutting it down would make readability worse. We're not purists about "must fit on a punched card" but really want lines of a width where side-by-side reviews are straightforward. Thanks. |
+one of the warnings on RetriableFileCopyCommand.java:300 is about an existing issue on the line you've edited. You aren't to blame for that -but now is the time to fix it. sorry |
I checked the behaviour of distcp on a cluster on what happens when truncate call is made while copy is running.
|
yes. It's a change in behaviour, but probably a good one. A test will verify that things fail the way we expect -and will continue to do so. One little concern though: where does that filestatus come from? It's created in the mapper just before the operation, right? As if it's created when the list of files to copy is created (which I doubt...) then the current code handles the case where the file is changed between job schedule and task execution -that would be a visible regression, which we would have to worry about. |
f79c878
to
05bcc7c
Compare
I tried writing the truncate test case but truncate calls always fails while copy is happening with error. ( Failed to TRUNCATE_FILE /tmp/source/1/3 for DFSClient_NONMAPREDUCE_-2094558173_166 on 127.0.0.1 because DFSClient_NONMAPREDUCE_-209455817
The filestatus is created in the mapper. So i think we are good here. |
🎊 +1 overall
This message was automatically generated. |
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
Outdated
Show resolved
Hide resolved
} | ||
} catch (IOException | InterruptedException e) { | ||
LOG.error("Exception encountered ", e); | ||
Assert.fail("Test failed: " + e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw new AssertionError(Test failed: " + e,e)
Stack traces are too import to throw away
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete stack trace is already printed using logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @steveloughran is right, we should log in assert instead of logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I will make the change then.
@steveloughran Do you want me to fix the current 3 checkstyle issues? You said we can ignore some for better readability of code while doing parallel reviews.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that worried about things where lines become 82, 84, 86 chars wide, because often chopping things down can make things more verbose. the current ones are examples of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is still open: we need that stack trace. Or the caught exception is saved to some variable outside the runnable; after the run() we throw that exception if non null
...hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtils.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
+1, committed I did some minor tuning before the commit'
Retested all of the distcp tests after those changes; all good |
Thanks @steveloughran for the review. Closing the PR. |
...hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java
Show resolved
Hide resolved
@@ -18,6 +18,7 @@ | |||
|
|||
package org.apache.hadoop.tools.util; | |||
|
|||
import org.apache.hadoop.tools.DistCpConstants; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move into the other distcp imports
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtils.java
Show resolved
Hide resolved
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtils.java
Show resolved
Hide resolved
sorry, my mistake. Thought this was in and was doing soem more reviews. Ignore that. I'm just trying to run through various outstanding PRs before the 2020 ones come in .. |
No description provided.