HADOOP-13655 document object store use with fs shell and distcp #131
(assuming the permissions can be propagated across filesystems)
* `-f` : Overwrites the destination if it already exists.
* `-ignorecrc` : Skip CRC checks on the file(s) downloaded.
* `crc`: write CRC checksums for the files downloaded.
`crc` should be `-crc`
fixed
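For reference, a minimal sketch of how the options listed above combine on a download. It assumes they belong to `hadoop fs -get`, and the bucket and local path are hypothetical placeholders. The script only prints the command unless `RUN=1` is set, so it is safe to try on a machine without a Hadoop client.

```shell
# Hypothetical source object and local destination; substitute real paths.
SRC="s3a://bucket/datasets/example.orc"
DEST="/tmp/example.orc"

# -f overwrites an existing destination; -crc also writes CRC checksums
# for the downloaded file.
GET_CMD="hadoop fs -get -f -crc $SRC $DEST"

if [ "${RUN:-0}" = "1" ]; then
  # Execute the download for real (needs a configured Hadoop client).
  $GET_CMD
else
  # Default: only show the command that would be executed.
  echo "dry run: $GET_CMD"
fi
```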
hadoop fs -put -d /datasets/example.orc s3a://bucket/datasets/

# Upload a file from the local filesystem
hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/
The symbol "~" in "hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/" is redundant, right?
hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/

# create a file from stdin
echo "hello" | hadoop fs -put -d -f - wasb://yourcontainer@youraccount.blob.core.windows.net/hello.txt
`hadoop fs -put -d -f - wasb:` should be `hadoop fs -put -d -f wasb:`
Actually no! I couldn't get the put command to read from stdin by following the existing docs, so I looked in the source code and saw that you need a special "-" as the source; otherwise the s3a:// URL was treated as the source and the command failed because the destination was missing. It does make sense, and is consistent with other tools; it's just not documented completely.
I'd mentioned this in the "put" section, but I've expanded the comment here and added the - to the DNF form of the put command.
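To make the point about the "-" source concrete, here is a small sketch of the stdin form. The destination URL is a hypothetical placeholder, and the script only prints the pipeline unless `RUN=1` is set.

```shell
# Hypothetical destination object; substitute a real bucket and key.
DEST="s3a://bucket/datasets/hello.txt"

# The lone "-" is the source argument and means "read from stdin".
# Without it, the URL is parsed as the source and no destination remains.
PUT_CMD="hadoop fs -put -d -f - $DEST"

if [ "${RUN:-0}" = "1" ]; then
  # Pipe data into the put command (needs a configured Hadoop client).
  echo "hello" | $PUT_CMD
else
  # Default: only show the pipeline that would be executed.
  echo "dry run: echo hello | $PUT_CMD"
fi
```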
+1
This is excellent doc; we should get this in ASAP.
This is for branch-2? How about trunk?
Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
on a large directory tree (the limit is 40 threads).

When `DistCp -update` is used with objec stores,
objec -> object
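As an illustration of the `-numListstatusThreads` advice in the snippet under review, a sketch of an incremental copy into an object store might look like this. The source and destination URLs are hypothetical, and the thread count is set to the 40-thread limit quoted in the diff. The script prints the command unless `RUN=1` is set.

```shell
# Hypothetical cluster source and object store destination.
SRC="hdfs://namenode:8020/datasets"
DEST="s3a://bucket/datasets"

# Number of threads used to build the file listing; 40 is the stated limit.
THREADS=40

DISTCP_CMD="hadoop distcp -update -numListstatusThreads $THREADS $SRC $DEST"

if [ "${RUN:-0}" = "1" ]; then
  # Launch the DistCp job (needs a configured Hadoop client and cluster).
  $DISTCP_CMD
else
  # Default: only show the command that would be executed.
  echo "dry run: $DISTCP_CMD"
fi
```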
@@ -729,3 +757,280 @@ usage
Usage: `hadoop fs -usage command`

Return the help for an individual command.


<a name="ObjectStores" />Working with Object Storage
`<a name="ObjectStores" />` is accidentally here, I guess?
no, it's to have a link which can be referred to later

* The `-append` option is not supported.

* The `-diff` option is not supported
"The `-diff`/`-rdiff` option is not supported". Yes, there is an `-rdiff` option that was just added.
ok
* Copy operations within a single object store still take place in the Hadoop cluster
—even when the object store implements a more efficient COPY operation internally

That is, an operation such as
The indentation is unnecessary?
it is needed to indent the paragraph and code sample under the bullet point.
Change-Id: I0f50d0a7823a86a9e0b1fdfc70a69fcebd1d22ce
0fea00c to 881366f
Join TTL was set too low (10 ms). `joinRetainsMatchedMessagesReverse` will fail if the execution time between line 176 and 186 is longer than that. Increased TTL to 1 min (will not affect test execution time). Author: Prateek Maheshwari <pmaheshw@linkedin.com> Reviewers: Navina Ramesh <navina@apache.org> Closes apache#131 from prateekm/join-test-fix
Patch of filesystem shell & distcp docs to cover object stores. Also updated some references in filesystem/index.md which were out of date.