
Commit 3e4b6df

HADOOP-19278 make sure bucket-info docs are in sync with code
Change-Id: Ifb06ef183e245b29e68f92bdcdb608997bd767aa
Parent: f810fb4

File tree

2 files changed: 14 additions, 33 deletions


hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md

Lines changed: 11 additions & 28 deletions
```diff
@@ -172,10 +172,13 @@ in `_$folder$` was considered to be a sign that a directory existed. A call to
 The S3A also has directory markers, but it just appends a "/" to the directory
 name, so `mkdir(s3a://bucket/a/b)` will create a new marker object `a/b/` .
 
-When a file is created under a path, the directory marker is deleted. And when a
-file is deleted, if it was the last file in the directory, the marker is
+In older versions of Hadoop, when a file was created under a path,
+the directory marker is deleted. And when a file is deleted,
+if it was the last file in the directory, the marker is
 recreated.
 
+This release does not delete directory markers.
+
 And, historically, when a path is listed, if a marker to that path is found, *it
 has been interpreted as an empty directory.*
 
```
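The marker naming rule in this hunk (`mkdir(s3a://bucket/a/b)` creates the object `a/b/`) can be sketched as a small shell function. The name `marker_key` is hypothetical and not part of Hadoop; the sketch assumes URIs of the form `s3a://BUCKET/dir/...` and only models the key naming, without making any S3 request.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: derive the object key that the
# S3A connector uses as the directory marker for an s3a:// directory URI.
# Assumes the form s3a://BUCKET/dir/... with no query string or fragment.
marker_key() {
  local rest="${1#s3a://}"        # "bucket/a/b"
  case "$rest" in
    */?*) ;;                      # need at least one path element after the bucket
    *) echo "no marker for bucket root: $1" >&2; return 1 ;;
  esac
  local path="${rest#*/}"         # drop the bucket name: "a/b"
  path="${path%/}"                # tolerate a trailing slash: "a/b/" -> "a/b"
  printf '%s/\n' "$path"          # marker key = directory path + "/"
}

marker_key "s3a://bucket/a/b"     # prints a/b/
```

The bucket root has no marker object, so the function reports an error for it rather than inventing a key.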
```diff
@@ -247,8 +250,6 @@ directory markers when creating files under paths. This removes all scalability
 problems caused by deleting these markers -however, it is achieved at the expense
 of backwards compatibility.
 
-## <a name="marker-retention"></a> Controlling marker retention with `fs.s3a.directory.marker.retention`
-
 There is now an option `fs.s3a.directory.marker.retention` which controls how
 markers are managed when new files are created
 
```
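To illustrate the scalability cost that marker retention avoids, the sketch below lists the marker keys a client using the `delete` policy would consider removing after writing a single file. It assumes that the markers of every ancestor directory, not only the immediate parent, are candidates for deletion; `ancestor_marker_keys` is a hypothetical name and no S3 requests are made.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: list the directory-marker keys
# that a client using the "delete" retention policy would try to remove
# after writing a file. Assumes markers of every ancestor directory are
# candidates for deletion, which is what makes deep trees expensive.
ancestor_marker_keys() {
  local key="${1#/}"              # object key of the new file, e.g. "a/b/c/part-0000"
  local dir="${key%/*}"           # parent directory: "a/b/c"
  [ "$dir" = "$key" ] && return 0 # file at bucket root: no ancestors
  while [ -n "$dir" ]; do
    printf '%s/\n' "$dir"         # marker key for this ancestor
    case "$dir" in
      */*) dir="${dir%/*}" ;;     # step up one level
      *) dir="" ;;                # reached the top-level directory
    esac
  done
}

ancestor_marker_keys "a/b/c/part-0000"
# a/b/c/
# a/b/
# a/
```

Under this assumption, the number of candidate markers grows with directory depth and is paid once per file written. With the `keep` policy, which the diff records as the default since Hadoop 3.4.0, none of these deletions are issued.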
```diff
@@ -264,32 +265,15 @@ The setting, `fs.s3a.directory.marker.retention = delete` is compatible with
 every shipping Hadoop release; that of `keep` compatible with
 all releases since 2021.
 
-## <a name="s3guard"></a> Directory Markers and Authoritative paths
-
-
-The now-deleted S3Guard feature included the concept of "authoritative paths";
-paths where all clients were required to be using S3Guard and sharing the
-same metadata store.
-In such a setup, listing authoritative paths would skip all queries of the S3
-store -potentially being much faster.
+### Hadoop 3.4.0: markers are not deleted by default
 
-In production, authoritative paths were usually only ever for Hive managed
-tables, where access was strictly restricted to the Hive services.
+[HADOOP-18752](https://issues.apache.org/jira/browse/HADOOP-18752)
+_Change fs.s3a.directory.marker.retention to "keep"_ changed the default
+policy.
 
+Marker deletion can still be enabled.
 
-When the S3A client is configured to treat some directories as "Authoritative"
-then an S3A connector with a retention policy of `fs.s3a.directory.marker.retention` of
-`authoritative` will omit deleting markers in authoritative directories.
-
-```xml
-<property>
-<name>fs.s3a.bucket.hive.authoritative.path</name>
-<value>/tables</value>
-</property>
-```
-This an option to consider if not 100% confident that all
-applications interacting with a store are using an S3A client
-which is marker aware.
+### Hadoop 3.5.x: marker deletion is no longer supported.
 
 ## <a name="bucket-info"></a> Verifying marker policy with `s3guard bucket-info`
 
```
```diff
@@ -306,7 +290,6 @@ line of bucket policies via the `-marker` option
 
 All releases of Hadoop which have been updated to be marker aware will support the `-markers aware` option.
 
-
 1. Updated releases which do not support switching marker retention policy will also support the
 `-markers delete` option.
 
```
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

Lines changed: 3 additions & 5 deletions
````diff
@@ -132,7 +132,7 @@ This auditing information can be used to identify opportunities to reduce load.
 Prints and optionally checks the status of a bucket.
 
 ```bash
-hadoop s3guard bucket-info [-fips] [-magic] [-encryption ENCRYPTION] s3a://BUCKET
+hadoop s3guard bucket-info [-fips] [-magic] [-encryption ENCRYPTION] [-markers MARKER] s3a://BUCKET
 ```
 
 Options
````
```diff
@@ -141,6 +141,7 @@ Options
 |----------------------|---------------------------------------------------------------------|
 | `-fips` | Require FIPS endopint to be in use |
 | `-magic` | Require the S3 filesystem to be support the "magic" committer |
+| `-markers` | Directory marker status: `aware`, `keep` |
 | `-encryption <type>` | Require a specific encryption algorithm |
 
 The server side encryption options are not directly related to S3Guard, but
```
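The `-markers` option added to the `bucket-info` synopsis can serve as a pre-flight check in deployment scripts. This is a minimal sketch, assuming the command exits with a non-zero status when the requested marker policy is not supported; the function name `require_marker_policy` and its messages are hypothetical, and any other failure (credentials, network, missing bucket) is reported the same way.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check. Assumes `hadoop s3guard bucket-info -markers MARKER`
# exits non-zero when the client does not support the requested marker policy.
# The function name and messages are hypothetical, not part of Hadoop.
require_marker_policy() {
  local bucket="$1" policy="${2:-keep}"   # default to "keep", listed in the options table
  if hadoop s3guard bucket-info -markers "$policy" "s3a://${bucket}" >/dev/null 2>&1; then
    echo "ok: bucket-info -markers ${policy} passed for s3a://${bucket}"
  else
    # Any non-zero exit lands here, not only an unsupported policy.
    echo "fail: bucket-info -markers ${policy} failed for s3a://${bucket}" >&2
    return 1
  fi
}

# Example (requires a Hadoop installation and access to the bucket):
# require_marker_policy example-bucket aware
```

Wrapping the call keeps the check's exit status usable by `set -e` scripts or CI steps while hiding the full bucket report.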
````diff
@@ -171,10 +172,7 @@ S3A Committers
 Security
 Delegation token support is disabled
 
-Directory Markers
-The directory marker policy is "keep"
-Available Policies: delete, keep, authoritative
-Authoritative paths: fs.s3a.authoritative.path=
+This version of Hadoop always retains directory markers
 
 ```
 
````