-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-13230. Optionally retain directory markers #1861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-13230. Optionally retain directory markers #1861
Conversation
6cd495d
to
0cf0df5
Compare
HADOOP-13230. directory markers the LIST call asks for two objects when needEmptyDirectoryFlag = true, so can distinguish dir marker exists moved much of the prefix/object analysis into S3ListResult where I intend to add some unit tests for the result parsing. Change the enum for all innerGetFileStatus calls from ALL to FILES_AND_DIRECTORIES, as we no longer need to do any HEAD / request on a marker; the list finds it after all. There may be more risk of delayed consistency in listings Tests: one mocking test fails (as usual); also failures in ITestS3AFileOperationCost, ITestS3GuardOutOfBandOperations, ITestRestrictedReadAccess
the access ones are failing because LIST is working whereas a HEAD would fail if the caller doesn't have read access to the object. OOB ops may be from me setting up a new bucket. Cost ones: well, our costs have come down. literally |
3a19625
to
5b14077
Compare
|
5b14077
to
dd2570d
Compare
💔 -1 overall
This message was automatically generated. |
Operation cost tests failiing; I'm trying to (a) cut down things to a minimum and (b) understand why every head/list call takes place, rather than change the assertions to match what's happening. Hadn't realised until now, for example, that rename(file, dest) has 3 HEAD calls, for example
|
💔 -1 overall
This message was automatically generated. |
d003c0d
to
8b918ee
Compare
💔 -1 overall
This message was automatically generated. |
8b918ee
to
58a2279
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
failure in restricted permission test as ls empty dir now succeeds in list (no HEAD, see)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a long patch. Let me learn by reading doc and asking questions. Ignore me if it does not make sense. Thanks,
// if needed | ||
directoryPolicy = new DirectoryPolicyImpl(conf, | ||
this::allowAuthoritative); | ||
LOG.debug("Directory marker retention policy is {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be info
level since this is new and client may want to see the message.
public static final Set<StatusProbeEnum> DIRECTORIES = | ||
LIST_ONLY; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need DirMarker
in the StatusProbeEnum
?
// no dir, fall back to looking for a file | ||
// (failure condition if true) | ||
status = innerGetFileStatus(parent, false, | ||
StatusProbeEnum.HEAD_ONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know they are the same, but is StatusProbeEnum.FILE
a bit better here since the status is for heading a file.
// only look against S3 for directories; saves | ||
// a HEAD request on all normal operations. | ||
S3AFileStatus dstParentStatus = innerGetFileStatus(parent, | ||
false, StatusProbeEnum.DIRECTORIES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need to fallback to StatusProbeEnum.FILE
because we know this innerGetFileStatus
should not throw FileNotFoundException
- at least dst should show in the listing. Is this correct?
case Authoritative: | ||
return authoritativenes.test(path); | ||
case Delete: | ||
default: // which cannot happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw unchecked exception so new policy (in future) will fail in an obvious way?
73a4ca5
to
1e530b5
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
262b446
to
15de2eb
Compare
15de2eb
to
de11cbd
Compare
de11cbd
to
90defbd
Compare
💔 -1 overall
This message was automatically generated. |
This is a merge of the ongoing work applied as a single patch to trunk. Change-Id: Ia6481fa095b2d4d3135197b81d7b190c20efd4c0
lots of work on cost of operation tests. Change-Id: I846794d8639d860ab36be4edcda0404cfb5d610c todo: move from hard coded 0, 1 into constants everywhere
Change-Id: If589e3df85d6cdddbbc7d1c03f6c578eaff06662
* defined constants for the Head/List cost of various ops, use them in the asserts over "1", "2" etc to make clearer origin of operations * Cut the standalone marker HEAD request. We don't need it, ever. That will simplify backporting too; I have a clear plan of the minimum needed for backporting to branch-3.3 and then again for 3.0-3.2 Change-Id: I33f9ccf2b477c027afe70137993c8344554f07c6
Need to only copy leaf markers on rename so a copy from an auth dir to a non-auth dir doesn't contaminate the dest. Change-Id: Ic99580c15287d409acc13dc807ee16e77b72e4b7
stopping double queuing files to be deleted -DDB was rejecting this fix all tests which were failing due to changes in getFileStatus probes * no HEAD + / probes, so list dir calls don't fail if the caller lacks read access to the object * isFile doesn't do a list any more, and the bucket existence checks don't take place aws-side. * TTL checking logic has improved, breaking tests which assumed that non-auth guarded ops will be probing s3. Change-Id: I4b6ec64f910cd1550dda67c3eee4e96677f009e6
Change-Id: I076b339249098cb61b6087643290e19828b41d60
- move constants of head/list costs into its own class for use across tests - move the declarative assertions about op costs into its own class, OperationCostValidator This is part of a change to move the new operation cost tests into their own suite, as this one is too big and becoming too much merge conflict. Also it'd be nice to include cost asserts into other test suites. Change-Id: Ifafa22e5b66a1fc27f61e81993c3b642a0210d1c
90defbd
to
1bdb50d
Compare
This allows a bucket to have policies of keep, delete or authoritative,
where the latter means "delete in auth dirs"
Retention policy is implemented in its own little class for testing;
does need a callback to probe for a path being auth or not.
Design Document
Tests? Not yet, no.
Change-Id: Ibcd86048ffaa4c39bba5bd4ea796a24ea8d7f479