HADOOP-13655 document object store use with fs shell and distcp #131


Closed

Conversation

steveloughran (Contributor)

Patch of the filesystem shell & distcp docs to cover object stores. Also updated some references in filesystem/index.md which were out of date.

(assuming the permissions can be propagated across filesystems)
* `-f` : Overwrites the destination if it already exists.
* `-ignorecrc` : Skip CRC checks on the file(s) downloaded.
* `crc`: write CRC checksums for the files downloaded.
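For illustration, the flags above might be combined when fetching a file from an object store. This is a hedged sketch only; the bucket name and paths are hypothetical:

```shell
# Download a file from an object store, overwriting any local copy
# and skipping the CRC check (bucket and paths are hypothetical)
hadoop fs -get -f -ignorecrc s3a://bucket/datasets/example.orc /tmp/example.orc
```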
Member:

crc should be -crc

Contributor Author:

fixed

hadoop fs -put -d /datasets/example.orc s3a://bucket/datasets/

# Upload a file from the local filesystem
hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/
Member:

hadoop fs -copyFromLocal -d -f /datasets/devices.orc s3a://bucket/datasets/
The symbol "~" is redundant, right?

hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/

# create a file from stdin
echo "hello" | hadoop fs -put -d -f - wasb://yourcontainer@youraccount.blob.core.windows.net/hello.txt
Member:

hadoop fs -put -d -f - wasb: should be hadoop fs -put -d -f wasb:

Contributor Author:

actually no! I couldn't get the put command to read off stdin according to the (existing) docs, so I looked in the source code and saw that you needed a special "-" as the source; otherwise the s3a URL was being treated as a source and the command failed because the destination was missing. It does make sense, and is consistent with other tools, just not documented completely.

I'd mentioned this in the "put" section, but I've expanded the comment here and added the - to the BNF form of the put command.

@steveloughran steveloughran changed the title HADOOP-13655 HADOOP-13655 document object store use with fs shell and distcp Nov 1, 2016
@liuml07 (Member) left a comment:

+1

This is excellent doc; we should get this in ASAP.

This is for branch-2? How about trunk?

Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
on a large directory tree (the limit is 40 threads).
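As a sketch of what such an invocation might look like (the cluster and bucket names here are hypothetical, not taken from the patch):

```shell
# Incremental copy to an object store, using extra listing threads
# to speed up the slow object-store directory scan
# (names are hypothetical; the thread count is capped at 40)
hadoop distcp -update -numListstatusThreads 20 \
    hdfs://namenode:8020/source/dir s3a://bucket/dest/dir
```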

When `DistCp -update` is used with objec stores,
Member:

objec -> object

@@ -729,3 +757,280 @@ usage
Usage: `hadoop fs -usage command`

Return the help for an individual command.


<a name="ObjectStores" />Working with Object Storage
Member:

`<a name="ObjectStores" />` is accidentally here, I guess?

Contributor Author:

no, it's to have a link which can be referred to later


* The `-append` option is not supported.

* The `-diff` option is not supported
Member:

The -diff/-rdiff option is not supported

Yes, there is an -rdiff option that was just added.

Contributor Author:

ok

* Copy operations within a single object store still take place in the Hadoop cluster
—even when the object store implements a more efficient COPY operation internally

That is, an operation such as
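(The concrete command from the patch is not reproduced in this excerpt; a representative sketch, with a hypothetical bucket, might be a copy within one store, where the data is still downloaded and re-uploaded through the cluster:)

```shell
# Copy within a single object store: despite the store's internal
# COPY operation, the data travels through the Hadoop cluster
# (bucket and paths are hypothetical)
hadoop distcp s3a://bucket/datasets/set1 s3a://bucket/archive/set1
```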
Member:

The indentation is unnecessary?

Contributor Author:

it is needed to indent the paragraph and code sample under the bullet point.

@steveloughran steveloughran force-pushed the s3/HADOOP-13655-shell-docs branch from 0fea00c to 881366f Compare November 21, 2016 14:03
@asfgit asfgit closed this in beb70fe Nov 22, 2016
@steveloughran steveloughran deleted the s3/HADOOP-13655-shell-docs branch November 23, 2016 14:33