-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-16279. S3Guard: Implement time-based (TTL) expiry for entries … #802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will comment on the bigger picture stuff on the JIRA.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
1d30291
to
b61026c
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
c76296a
to
3e9ca9d
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/ITtlTimeProvider.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardOutOfBandOperations.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardOutOfBandOperations.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardTtl.java
Show resolved
Hide resolved
...doop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStoreScale.java
Show resolved
Hide resolved
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
Outdated
Show resolved
Hide resolved
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
Outdated
Show resolved
Hide resolved
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
Outdated
Show resolved
Hide resolved
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
Outdated
Show resolved
Hide resolved
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
Show resolved
Hide resolved
Production code looks OK, with one exception. The exception: switch to Configuration.getTimeDuration() to read the time in, and make the default value seconds. Users can switch to any other unit they want by changing the suffix. Doing this now avoids the problems encountered when retrofitting other bits of the code (block size) to using more usable configuration values. test-wise we need to add one which verifies a delete of an expired metadata entry deletes any new raw file. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
|
||
try { | ||
guardedFs.setTtlTimeProvider(mockTimeProvider); | ||
when(mockTimeProvider.getMetadataTtl()).thenReturn(ttl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be aware that using the doAnswer()
mockito API may be less "elegant" but it backports to the older mockito (and hence hadoop) versions more easily. I'd recommend going with the older API for now, to make that backport lower-stress
as far as I can see, patch is done; looks like last bug was squashed. I'm going to D/L and retest locally, do a final review and then give a vote on it. If I can merge in then it gives me the "opportunity" to reconcile my rename patch with it :) |
Got a fair few failures on the OOB tests on my run of the merged patch. This was with s3guard + dynamo, but not auth
|
Some of the getFileStatus-side errors go away when in
to
But other things do fail; I think what we'll need is something which reinstates that get-with-empty-directory option with the TTL. Anyhow, -1 as is, obviously. I'm going to put up a branch with my other changes, which combine some of the minor style changes and adding etag and version ID checks to the assertions about files not matching. Those assertions actually fail on the listings, because they aren't always returning etags. I think all our store list operations should be returning etags, if they aren't already |
Gabor: pushed up a branch which adds one more commit to this PR https://github.com/steveloughran/hadoop/tree/s3/HADOOP-16279-oob-delete
The etag/version ID tests should actually be better tests than anything else for consistency, rather than modtime, so I've placed them first. But they are failing on etags as when the etag come out as null. It shouldn't,obviously BTW, my tests were with your PR rebased onto trunk, if that makes a difference |
@steveloughran, when I run the tests with my branch and without the rebase to trunk my only test failure is in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitMRJob. ITestS3GuardOutOfBandOperations fails with your rebased branch. I'll run some tests to check if I'm able to reproduce the issue. |
tested and for the 2nd time I got the same errors after the rebase to trunk. working on fixing it. |
reran the tests on the original (non-rebased) branch, got failures here and in some others (e.g)
makes me thing the DDB table is in an odd state. Plan:
|
Update, believe I've identified the problem, it's a subtle but fun one with (my suggested) change to accepting configurable times (e.g. 15m) for a TTL, the time in seconds was being passed down to a ttl class expecting millis. As a result, dir listings in the tests were going nonauth ~immediately, so the guardedFS in tests was immediately resyncing with the store. Hence: none of the expected mismatches Fixing that (one liner) with some extended OOB test looking at the internals (written during debugging) and with HADOOP-16368 S3A list operation doesn't pick up etags from results, as I've extended the match testing to include etags rather than just timestamps. much less brittle. I'm retaining the modtime test too, for regression rather than as the sole comparison |
#952 is my patch extending this |
Thanks for finding out what was the issue and fixing it! |
vjagadish1989 nickpan47 Please take a look. Author: Prateek Maheshwari <pmaheshwari@apache.org> Reviewers: Jagadish<jagadish@apache.org> Closes apache#802 from prateekm/api-docs
…(and tombstones)