-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-16202 Enhance S3A openFile() #2046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-16202 Enhance S3A openFile() #2046
Conversation
tested s3 london, unguarded |
I plan to make the length option a generic fs one so that other filesystems (abfs) can also skip a probe here. a standard option makes it easier to justify hive/orc/parquet switching to this api |
@@ -68,6 +68,11 @@ This is relevant with those stores which return version/etag information, | |||
including the S3A and ABFS connectors -they MAY use this to guarantee that | |||
the file they opened is exactly the one returned in the listing. | |||
|
|||
There is no requirement for `status.getPath()` to the resolved path of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this cause uncertain failures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this is that viewfs and the hive fs wrappers have their own schemas/paths. If we insist on the paths matching, you woudn't be able to use the API through hive or mounted viewfs, which would be too brittle. Saying "it's just a hint" avoids this issue. Same reason I'm not worrying about type of the status.
s3a FS: use etag and version ID, if known
other status: use the length
Overall this looks really good. |
6233b54
to
b16769f
Compare
1. openFile()(path).opt(fs.s3a.input.length, len) to let caller specify file length. All we actually need 2. if a filestatus of a different type is passed in, we create an FS instance from it. 3. use the path of the command, not that of the filestatus. this helps viewfs support the API Change-Id: Ib9f0f98f701176124436aeaaa0360a426c257bc6
Change-Id: I9f0c11e2fa5cc0dbb84115b6824caf160e1abb62
This has the tests for the changes in the specification (path of supplied status is ignored), and for S3A enhancements, where the cost of the operations against the the raw store is used to determine whether or not any probes for a file were performed when opening it. The test also adds tests for where the file length is set in the option fs.s3a.open.option.length. -in a shorter file, read() will return -1 when the declared length is reached, not the actual. in a longer file, read() will return -1 once the real file length is exceeded. readFully() will raise an EOFException. +fix bug in CTU.readStream() which didn't close the stream after reading. Now, troublespot. I had to add new opt/must operations for long values. this is bad as -we already have an int one which we now overload. I'd delete that one except nothing anyone has already compiled will link. -you can't add a long value and have your code compile against an older version of the code. Not sure what the best action is there. Change-Id: I77ebaa07515f5e2e18fd610ce1f04ef286126f12
b16769f
to
0e6c3fa
Compare
* the file length/seek policy are now defined in hadoop common and the specification, including evolution policy for seek. * S3A FS uses the new options, retaining support for the older s3a seek policy * moved as much as possible into S3AOpenFileHelper * which has unit tests to verify its extraction of properties and file status. Change-Id: I6981394c48583c33b706f9b5eb6c58c5ca078715
💔 -1 overall
This message was automatically generated. |
things aren't complaining as the patch chain is confused; a file which was added and then renamed is still present. I'm going to have to close this PR and create a new one |
I had intended to support etag and version but its too tricky...you'd also need to supply the length or you issue HEAD with the given etag/version to extract that param too. The tracking of all option keys was for that. Not yet used -but left for others
needs new tests
-pass in status with different path (in contract)
-new option skips HEAD request (trivial test -open missing file, seek to below declared length, only expect failure on read())