HDFS-16556. Fix typos in distcp #4217

Merged 15 commits on Apr 22, 2022
14 changes: 7 additions & 7 deletions hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
@@ -49,7 +49,7 @@ Overview

[The erstwhile implementation of DistCp]
(http://hadoop.apache.org/docs/r1.2.1/distcp.html) has its share of quirks
- and drawbacks, both in its usage, as well as its extensibility and
+ and drawbacks, both in its usage and its extensibility and
performance. The purpose of the DistCp refactor was to fix these
shortcomings, enabling it to be used and extended programmatically. New
paradigms have been introduced to improve runtime and setup performance,
@@ -179,7 +179,7 @@ $H3 Update and Overwrite
hdfs://nn2:8020/target/10 32
hdfs://nn2:8020/target/20 64

- Will effect:
+ The result will be:

hdfs://nn2:8020/target/1 32
hdfs://nn2:8020/target/2 32
@@ -190,7 +190,7 @@ $H3 Update and Overwrite
because it doesn't exist at the target. `10` and `20` are overwritten since
the contents don't match the source.

- If `-update` is used, `1` is skipped because the file-length and contents match. `2` is copied because it doesnt exist at the target. `10` and `20` are overwritten since the contents don’t match the source. However, if `-append` is additionally used, then only `10` is overwritten (source length less than destination) and `20` is appended with the change in file (if the files match up to the destination's original length).
+ If `-update` is used, `1` is skipped because the file-length and contents match. `2` is copied because it doesn't exist at the target. `10` and `20` are overwritten since the contents don’t match the source. However, if `-append` is additionally used, then only `10` is overwritten (source length less than destination) and `20` is appended with the change in file (if the files match up to the destination's original length).

If `-overwrite` is used, `1` is overwritten as well.
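
The per-file outcomes described in this hunk can be summarised as a small decision function. This is only a sketch of the documented behaviour, not DistCp's actual implementation: `FileState`, `decide`, and all of their fields are hypothetical names introduced for illustration, and the sketch assumes `-append` is only meaningful together with `-update`, as the paragraph implies.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical summary of one source file versus its target counterpart."""
    target_exists: bool
    src_len: int = 0
    dst_len: int = 0
    contents_match: bool = False   # same length and same contents
    prefix_matches: bool = False   # target equals the first dst_len bytes of the source


def decide(state: FileState, *, update: bool = False,
           overwrite: bool = False, append: bool = False) -> str:
    """Return 'copy', 'skip', 'overwrite', or 'append' for a single file."""
    # The sketch covers only the two modes the paragraph describes.
    if update == overwrite:
        raise ValueError("pass exactly one of update or overwrite")
    if append and not update:
        raise ValueError("append only applies together with update")

    if not state.target_exists:
        return "copy"        # e.g. file `2`: missing at the target
    if overwrite:
        return "overwrite"   # -overwrite replaces even matching files such as `1`
    if state.contents_match:
        return "skip"        # e.g. file `1` under -update
    if append and state.src_len > state.dst_len and state.prefix_matches:
        return "append"      # only the new tail of the source is written
    return "overwrite"       # contents differ and appending is not possible
```

For instance, `decide(FileState(True, 32, 64), update=True, append=True)` returns `"overwrite"` because the source is shorter than the target, which mirrors the treatment of `10` in the paragraph above.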

@@ -269,7 +269,7 @@ $H4 Experiment 1: Syncing diff of two adjacent snapshots

$H4 Experiment 2: syncing diff of two non-adjacent snapshots

- First do a clean up from Experiment 1.
+ First do a cleanup from Experiment 1.

hdfs dfs -rm -skipTrash /dst/1.txt

@@ -514,7 +514,7 @@ $H3 InputFormats and MapReduce Components
* A file with the same name exists at target, but `-overwrite` is
specified.
* A file with the same name exists at target, but differs in block-size
- (and block-size needs to be preserved.
+ and block-size needs to be preserved.

* **CopyCommitter:** This class is responsible for the commit-phase of the
DistCp job, including:
@@ -576,7 +576,7 @@ $H3 MapReduce and other side-effects
map on a re-execution will be marked as "skipped".
* If a map fails `mapreduce.map.maxattempts` times, the remaining map tasks
will be killed (unless `-i` is set).
- * If `mapreduce.map.speculative` is set set final and true, the result of the
+ * If `mapreduce.map.speculative` is set to be true, the result of the
copy is undefined.

$H3 DistCp and Object Stores
@@ -691,7 +691,7 @@ Frequently Asked Questions
directory is copied over, rather than the source-directory itself. This
behaviour is consistent with the legacy DistCp implementation as well.

- 2. **How does the new DistCp differ in semantics from the Legacy DistCp?**
+ 2. **How does the new DistCp differs in semantics from the Legacy DistCp?**

* Files that are skipped during copy used to also have their
file-attributes (permissions, owner/group info, etc.) unchanged, when