
Commit 7bb09f1

HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep" (#5689)
This
1. changes the default value of fs.s3a.directory.marker.retention to "keep"
2. no longer prints a message when an S3A FS instance is instantiated with any option other than "delete".

Switching to marker retention improves performance on any S3 bucket, as there are no needless marker DELETE requests, reducing both write IOPS and any delays waiting for the DELETE call to finish.

There are *very* significant improvements on versioned buckets, where tombstone markers slow down LIST operations: the more tombstones there are, the worse query planning gets.

Having versioning enabled on production stores is the foundation of any data protection strategy, so this has tangible benefits in production.

It is *not* compatible with older Hadoop releases; specifically
- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0

Incompatible releases have no problems reading data in stores where markers are retained, but can get confused when deleting or renaming directories.

If you are still using older versions to write data, and cannot yet upgrade, switch the option back to "delete".

Contributed by Steve Loughran
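The compatibility list above can be expressed as a simple version check. This is a hypothetical sketch only: `MarkerCompatibility` and `isKeepSafe` are illustrative names, not part of Hadoop, and treating every release after branch 3 as compatible (and every release before branch 2 as incompatible) is an assumption beyond what the commit message states.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, not part of Hadoop: classifies a Hadoop release
 * string against the incompatibility list in the HADOOP-18752 commit message.
 * "Incompatible" releases can still read buckets with retained markers,
 * but may get confused when deleting or renaming directories.
 */
public final class MarkerCompatibility {

  private MarkerCompatibility() {
  }

  /** Parse "major.minor.patch", ignoring any suffix such as "-SNAPSHOT". */
  static int[] parse(String version) {
    String core = version.split("-", 2)[0];
    String[] parts = core.split("\\.");
    int[] v = new int[3];
    for (int i = 0; i < 3 && i < parts.length; i++) {
      v[i] = Integer.parseInt(parts[i]);
    }
    return v;
  }

  /** True if this release is safe to use on a bucket whose marker policy is "keep". */
  public static boolean isKeepSafe(String version) {
    int[] v = parse(version);
    int major = v[0], minor = v[1], patch = v[2];
    if (major == 2) {
      // branch-2: only 2.10.2 and later
      return minor > 10 || (minor == 10 && patch >= 2);
    }
    if (major == 3) {
      switch (minor) {
        case 0:
        case 1:
          return false;        // every 3.0.x and 3.1.x release
        case 2:
          return patch >= 2;   // 3.2.0 and 3.2.1 are incompatible
        case 3:
          return patch >= 1;   // 3.3.0 is incompatible
        default:
          return true;         // 3.4.0 and later
      }
    }
    // Assumption: later major versions are compatible; earlier ones are not.
    return major > 3;
  }

  public static void main(String[] args) {
    for (String v : Arrays.asList("2.10.1", "2.10.2", "3.1.4", "3.2.1", "3.2.2", "3.3.0", "3.3.5")) {
      System.out.println(v + " -> " + (isKeepSafe(v) ? "ok" : "set retention to delete"));
    }
  }
}
```

A check like this could help decide whether a bucket still written to by older clients needs `fs.s3a.directory.marker.retention` set back to `delete`.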
1 parent 0e6bd09 commit 7bb09f1


7 files changed: +141 -158 lines


hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java

Lines changed: 1 addition & 1 deletion
@@ -1088,7 +1088,7 @@ private Constants() {
 * Default retention policy: {@value}.
 */
 public static final String DEFAULT_DIRECTORY_MARKER_POLICY =
-    DIRECTORY_MARKER_POLICY_DELETE;
+    DIRECTORY_MARKER_POLICY_KEEP;
 
 
 /**
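With this change, a filesystem with no explicit setting picks up `keep`, while a per-bucket `fs.s3a.bucket.$BUCKETNAME.directory.marker.retention` value (the form used in the testing.md commands removed below) still overrides the global option. The following is a minimal sketch of that precedence, assuming a plain `Map` stands in for Hadoop's `Configuration`; `MarkerRetentionLookup`, `bucketKey` and `resolve` are hypothetical names, and this is not how S3A actually propagates per-bucket options.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not the S3A implementation: resolves the effective
 * directory marker retention option for one bucket.
 * A Map stands in for org.apache.hadoop.conf.Configuration.
 */
public final class MarkerRetentionLookup {

  static final String KEY = "fs.s3a.directory.marker.retention";
  // Default after HADOOP-18752; previously "delete".
  static final String DEFAULT = "keep";

  private MarkerRetentionLookup() {
  }

  /** Per-bucket form of the option, e.g. fs.s3a.bucket.logs.directory.marker.retention. */
  static String bucketKey(String bucket) {
    return "fs.s3a.bucket." + bucket + ".directory.marker.retention";
  }

  static String resolve(Map<String, String> conf, String bucket) {
    // A per-bucket value wins over the global option...
    String value = conf.get(bucketKey(bucket));
    if (value == null) {
      // ...then the global option, then the built-in default.
      value = conf.getOrDefault(KEY, DEFAULT);
    }
    return value.trim();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(resolve(conf, "data"));     // keep
    // A bucket still written to by an incompatible release:
    conf.put(bucketKey("legacy"), "delete");
    System.out.println(resolve(conf, "legacy"));   // delete
    System.out.println(resolve(conf, "data"));     // keep
  }
}
```

Scoping the `delete` override to the affected bucket keeps the performance benefit of `keep` everywhere else.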

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/DirectoryPolicyImpl.java

Lines changed: 2 additions & 2 deletions
@@ -186,11 +186,11 @@ public static DirectoryPolicy getDirectoryPolicy(
     policy = DELETE;
     break;
   case DIRECTORY_MARKER_POLICY_KEEP:
-    LOG.info("Directory markers will be kept");
+    LOG.debug("Directory markers will be kept");
     policy = KEEP;
     break;
   case DIRECTORY_MARKER_POLICY_AUTHORITATIVE:
-    LOG.info("Directory markers will be kept on authoritative"
+    LOG.debug("Directory markers will be kept on authoritative"
         + " paths");
     policy = new DirectoryPolicyImpl(MarkerPolicy.Authoritative,
         authoritativeness);
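This switch maps the option string to a policy; the only change in this file is the log level. A sketch of the resulting behavior follows, loosely modeled on `DirectoryPolicyImpl` but using hypothetical names (`MarkerPolicySketch`, `keepDirectoryMarkers`) and plain string paths instead of Hadoop's `Path`. It illustrates that "authoritative" is just "keep" restricted to the configured authoritative paths.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not the Hadoop API: maps the marker retention
 * option ("delete", "keep", "authoritative") to a policy and answers
 * whether markers are kept for a given path.
 */
public final class MarkerPolicySketch {

  enum Policy { DELETE, KEEP, AUTHORITATIVE }

  private final Policy policy;
  private final List<String> authoritativePaths;

  MarkerPolicySketch(String option, List<String> authoritativePaths) {
    switch (option.trim()) {
      case "delete":
        policy = Policy.DELETE;
        break;
      case "keep":
        policy = Policy.KEEP;
        break;
      case "authoritative":
        policy = Policy.AUTHORITATIVE;
        break;
      default:
        throw new IllegalArgumentException("Unknown marker policy: " + option);
    }
    this.authoritativePaths = authoritativePaths;
  }

  /** Are parent directory markers retained when writing under this path? */
  boolean keepDirectoryMarkers(String path) {
    switch (policy) {
      case KEEP:
        return true;
      case DELETE:
        return false;
      default:
        // Authoritative: keep only at or below a configured authoritative path.
        return authoritativePaths.stream().anyMatch(
            p -> path.equals(p) || path.startsWith(p.endsWith("/") ? p : p + "/"));
    }
  }

  public static void main(String[] args) {
    MarkerPolicySketch auth =
        new MarkerPolicySketch("authoritative", Arrays.asList("/tables"));
    System.out.println(auth.keepDirectoryMarkers("/tables/day=1/part-0000")); // true
    System.out.println(auth.keepDirectoryMarkers("/logs/app.log"));          // false
  }
}
```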

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md

Lines changed: 1 addition & 1 deletion
@@ -792,7 +792,7 @@ Security
 Delegation token support is disabled
 
 Directory Markers
-The directory marker policy is "delete"
+The directory marker policy is "keep"
 Available Policies: delete, keep, authoritative
 Authoritative paths: fs.s3a.authoritative.path=```
 ```

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md

Lines changed: 125 additions & 109 deletions
Large diffs are not rendered by default.

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md

Lines changed: 4 additions & 5 deletions
@@ -23,11 +23,10 @@
 
 ### <a name="directory-marker-compatibility"></a> Directory Marker Compatibility
 
-1. This release can safely list/index/read S3 buckets where "empty directory"
-markers are retained.
-
-1. This release can be configured to retain these directory makers at the
-expense of being backwards incompatible.
+This release does not delete directory markers when creating
+files or directories underneath.
+This is incompatible with versions of the Hadoop S3A client released
+before 2021.
 
 Consult [Controlling the S3A Directory Marker Behavior](directory_markers.html) for
 full details.

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

Lines changed: 1 addition & 9 deletions
@@ -119,14 +119,7 @@ Without S3Guard, listing performance may be slower. However, Hadoop 3.3.0+ has s
 improved listing performance ([HADOOP-17400](https://issues.apache.org/jira/browse/HADOOP-17400)
 _Optimize S3A for maximum performance in directory listings_) so this should not be apparent.
 
-We recommend disabling [directory marker deletion](directory_markers.html) to reduce
-the number of DELETE operations made when writing files.
-this reduces the load on the S3 partition and so the risk of throttling, which can
-impact performance.
-This is very important when working with versioned S3 buckets, as the tombstone markers
-created will slow down subsequent listing operations.
-
-Finally, the S3A [auditing](auditing.html) feature adds information to the S3 server logs
+The S3A [auditing](auditing.html) feature adds information to the S3 server logs
 about which jobs, users and filesystem operations have been making S3 requests.
 This auditing information can be used to identify opportunities to reduce load.
 
@@ -162,7 +155,6 @@ Example
 ```bash
 > hadoop s3guard bucket-info -magic -markers keep s3a://test-london/
 
-2021-11-22 15:21:00,289 [main] INFO impl.DirectoryPolicyImpl (DirectoryPolicyImpl.java:getDirectoryPolicy(189)) - Directory markers will be kept
 Filesystem s3a://test-london
 Location: eu-west-2
 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md

Lines changed: 7 additions & 31 deletions
@@ -339,16 +339,19 @@ Hadoop supports [different policies for directory marker retention](directory_ma
 -essentially the classic "delete" and the higher-performance "keep" options; "authoritative"
 is just "keep" restricted to a part of the bucket.
 
-Example: test with `markers=delete`
+
+Example: test with `markers=keep`
 
 ```
-mvn verify -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete
+mvn verify -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep
 ```
 
-Example: test with `markers=keep`
+This is the default and does not need to be explicitly set.
+
+Example: test with `markers=delete`
 
 ```
-mvn verify -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep
+mvn verify -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete
 ```
 
 Example: test with `markers=authoritative`
@@ -1268,33 +1271,6 @@ bin/hdfs fetchdt -print secrets.bin
 # expect warning "No TokenRenewer defined for token kind S3ADelegationToken/Session"
 bin/hdfs fetchdt -renew secrets.bin
 
-# ---------------------------------------------------
-# Directory markers
-# ---------------------------------------------------
-
-# require success
-bin/hadoop s3guard bucket-info -markers aware $BUCKET
-# expect failure unless bucket policy is keep
-bin/hadoop s3guard bucket-info -markers keep $BUCKET/path
-
-# you may need to set this on a per-bucket basis if you have already been
-# playing with options
-bin/hadoop s3guard -D fs.s3a.directory.marker.retention=keep bucket-info -markers keep $BUCKET/path
-bin/hadoop s3guard -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep bucket-info -markers keep $BUCKET/path
-
-# expect to see "Directory markers will be kept" messages and status code of "46"
-bin/hadoop fs -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep -mkdir $BUCKET/p1
-bin/hadoop fs -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep -mkdir $BUCKET/p1/p2
-bin/hadoop fs -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep -touchz $BUCKET/p1/p2/file
-
-# expect failure as markers will be found for /p1/ and /p1/p2/
-bin/hadoop s3guard markers -audit -verbose $BUCKET
-
-# clean will remove markers
-bin/hadoop s3guard markers -clean -verbose $BUCKET
-
-# expect success and exit code of 0
-bin/hadoop s3guard markers -audit -verbose $BUCKET
 
 # ---------------------------------------------------
 # Copy to from local
