hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md
31 additions & 31 deletions
@@ -22,36 +22,36 @@ A high level overview of this feature was published in
22
22
[Pinterest Engineering's blog post titled "Improving efficiency and reducing runtime using S3 read optimization"](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0).
23
23
24
24
With prefetching, the input stream divides the remote file into blocks of a fixed size, associates
25
-
buffers to these blocks and then reads data into these buffers asynchronously.
25
+
buffers to these blocks and then reads data into these buffers asynchronously.
26
26
It also potentially caches these blocks.
27
27
28
28
### Basic Concepts
29
29
30
30
* **Remote File**: A binary blob of data stored on some storage device.
31
31
* **Block File**: Local file containing a block of the remote file.
32
-
* **Block**: A file is divided into a number of blocks.
32
+
* **Block**: A file is divided into a number of blocks.
33
33
The first n-1 blocks are all the same size; the last block may be the same size or smaller.
34
-
* **Block-based reading**: The granularity of a read is one block.
35
-
That is, either an entire block is read and returned or none at all.
34
+
* **Block-based reading**: The granularity of a read is one block.
35
+
That is, either an entire block is read and returned or none at all.
36
36
Multiple blocks may be read in parallel.
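
The block arithmetic described above can be sketched as follows. This is a minimal illustration, not code from the Hadoop source tree: the class `BlockLayout` and its method names are hypothetical, and the 20MB file size is an assumed example value.

```java
/** Hypothetical helper for illustration; not part of the Hadoop code base. */
final class BlockLayout {
    private BlockLayout() {}

    /** Number of blocks needed to cover fileSize bytes (ceiling division). */
    static int blockCount(long fileSize, int blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid file or block size");
        }
        return (int) ((fileSize + blockSize - 1) / blockSize);
    }

    /** Length in bytes of one block; only the last block may be shorter. */
    static int blockLength(int blockNumber, long fileSize, int blockSize) {
        if (blockNumber < 0 || blockNumber >= blockCount(fileSize, blockSize)) {
            throw new IllegalArgumentException("no such block: " + blockNumber);
        }
        long start = (long) blockNumber * blockSize;
        return (int) Math.min(blockSize, fileSize - start);
    }

    /** Number of the block that contains the given byte offset. */
    static int blockNumberOf(long offset, int blockSize) {
        return (int) (offset / blockSize);
    }

    public static void main(String[] args) {
        final int mb = 1024 * 1024;
        long fileSize = 20L * mb;   // assumed example file size
        int blockSize = 8 * mb;     // matches the documented 8M default

        System.out.println(blockCount(fileSize, blockSize));            // 3
        System.out.println(blockLength(2, fileSize, blockSize) / mb);   // 4
        System.out.println(blockNumberOf(10L * mb, blockSize));         // 1
    }
}
```

With these assumed values, a 20MB file splits into two full 8MB blocks plus a final 4MB block, and byte offset 10MB falls inside block 1.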
37
37
38
38
### Configuring the stream
39
39
40
40
|Property |Meaning |Default |
41
-
|---|---|---|
41
+
|---|---|---|
42
42
|`fs.s3a.prefetch.enabled`|Enable the prefetch input stream |`true`|
43
43
|`fs.s3a.prefetch.block.size`|Size of a block |`8M`|
44
44
|`fs.s3a.prefetch.block.count`|Number of blocks to prefetch |`8`|
45
45
46
46
### Key Components
47
47
48
48
`S3PrefetchingInputStream` - When prefetching is enabled, S3AFileSystem will return an instance of
49
-
this class as the input stream.
49
+
this class as the input stream.
50
50
Depending on the remote file size, it will either use
51
51
the `S3InMemoryInputStream` or the `S3CachingInputStream` as the underlying input stream.
52
52
53
53
`S3InMemoryInputStream` - Underlying input stream used when the remote file size < configured block
54
-
size.
54
+
size.
55
55
Will read the entire remote file into memory.
56
56
57
57
`S3CachingInputStream` - Underlying input stream used when remote file size > configured block size.
@@ -61,30 +61,30 @@ Uses asynchronous prefetching of blocks and caching to improve performance.
61
61
62
62
* Number of blocks in the remote file
63
63
* Block size
64
-
* State of each block (initially all blocks have state *NOT_READY*).
64
+
* State of each block (initially all blocks have state *NOT_READY*).
65
65
Other states are: Queued, Ready, Cached.
66
66
67
67
`BufferData` - Holds the buffer and additional information about it such as:
68
68
69
69
* The block number this buffer is for
70
-
* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done).
70
+
* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done).
71
71
Initial state of a buffer is blank.
72
72
73
73
`CachingBlockManager` - Implements reading data into the buffer, prefetching and caching.
74
74
75
-
`BufferPool` - Manages a fixed-size pool of buffers.
75
+
`BufferPool` - Manages a fixed-size pool of buffers.
76
76
It’s used by `CachingBlockManager` to acquire buffers.
77
77
78
78
`S3File` - Implements operations to interact with S3 such as opening and closing the input stream to
79
79
the remote file in S3.
80
80
81
-
`S3Reader` - Implements reading from the stream opened by `S3File`.
81
+
`S3Reader` - Implements reading from the stream opened by `S3File`.
82
82
Reads from this input stream in blocks of 64KB.
83
83
84
-
`FilePosition` - Provides functionality related to tracking the position in the file.
84
+
`FilePosition` - Provides functionality related to tracking the position in the file.
85
85
Also gives access to the current buffer in use.
86
86
87
-
`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system.
87
+
`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system.
88
88
Each cache block is stored on the local disk as a separate block file.
89
89
90
90
### Operation
@@ -101,8 +101,8 @@ in.read(buffer, 0, 3MB);
101
101
in.read(buffer, 0, 2MB);
102
102
```
103
103
104
-
When the first read is issued, there is no buffer in use yet.
105
-
The `S3InMemoryInputStream` gets the data in this remote file by calling the `ensureCurrentBuffer()`
104
+
When the first read is issued, there is no buffer in use yet.
105
+
The `S3InMemoryInputStream` gets the data in this remote file by calling the `ensureCurrentBuffer()`
106
106
method, which ensures that a buffer with data is available to be read from.
107
107
108
108
The `ensureCurrentBuffer()` then:
@@ -117,7 +117,7 @@ The `ensureCurrentBuffer()` then:
117
117
118
118
The read operation now just gets the required bytes from the buffer in `FilePosition`.
119
119
120
-
When the second read is issued, there is already a valid buffer which can be used.
120
+
When the second read is issued, there is already a valid buffer which can be used.
121
121
Nothing else needs to be done: the required bytes are read directly from this buffer.
122
122
123
123
#### S3CachingInputStream
@@ -134,7 +134,7 @@ in.read(buffer, 0, 5MB)
134
134
in.read(buffer, 0, 8MB)
135
135
```
136
136
137
-
For the first read call, there is no valid buffer yet.
137
+
For the first read call, there is no valid buffer yet.
138
138
`ensureCurrentBuffer()` is called, and for the first `read()`, the prefetch count is set to 1.
139
139
140
140
The current block (block 0) is read synchronously, while the block to be prefetched (block 1) is
@@ -143,29 +143,29 @@ read asynchronously.
143
143
The `CachingBlockManager` is responsible for getting buffers from the buffer pool and reading data
144
144
into them. The process of acquiring a buffer from the pool works as follows:
145
145
146
-
* The buffer pool keeps a map of allocated buffers and a pool of available buffers.
147
-
The size of this pool is prefetch block count + 1.
146
+
* The buffer pool keeps a map of allocated buffers and a pool of available buffers.
147
+
The size of this pool is prefetch block count + 1.
148
148
If the prefetch block count is 8, the buffer pool has a size of 9.
149
149
* If the pool is not yet at capacity, create a new buffer and add it to the pool.
150
-
* If it’s at capacity, check if any buffers with state = done can be released.
151
-
Releasing a buffer means removing it from allocated and returning it back to the pool of available
150
+
* If it's at capacity, check if any buffers with state = done can be released.
151
+
Releasing a buffer means removing it from allocated and returning it back to the pool of available
152
152
buffers.
153
153
* If there are no buffers with state = done currently then nothing will be released, so retry the
154
154
above step at a fixed interval a few times till a buffer becomes available.
155
-
* If after multiple retries there are still no available buffers, release a buffer in the ready state.
155
+
* If after multiple retries there are still no available buffers, release a buffer in the ready state.
156
156
The buffer for the block furthest from the current block is released.
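
The acquisition steps above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions: `BufferPoolSketch`, `tryAcquire`, and the `lastAttempt` flag are hypothetical names, the state set is reduced to the three states the steps mention, and the timed retry loop is left to the caller. It is not the Hadoop `BufferPool` API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Illustrative model only; hypothetical names, not the Hadoop BufferPool API. */
final class BufferPoolSketch {
    enum State { BLANK, READY, DONE }

    static final class Buffer {
        int blockNumber = -1;
        State state = State.BLANK;
    }

    private final int capacity;   // prefetch block count + 1
    private int created = 0;      // buffers created so far
    private final Deque<Buffer> available = new ArrayDeque<>();
    private final Map<Integer, Buffer> allocated = new HashMap<>();

    BufferPoolSketch(int prefetchBlockCount) {
        this.capacity = prefetchBlockCount + 1;
    }

    int capacity() { return capacity; }

    boolean isAllocated(int blockNumber) { return allocated.containsKey(blockNumber); }

    /**
     * One acquisition attempt. Returns null when nothing can be freed yet; a caller
     * would wait and retry, passing lastAttempt = true once retries are used up.
     */
    Buffer tryAcquire(int blockNumber, int currentBlock, boolean lastAttempt) {
        Buffer existing = allocated.get(blockNumber);
        if (existing != null) {
            return existing;                      // block already has a buffer
        }
        if (available.isEmpty() && created < capacity) {
            available.push(new Buffer());         // not at capacity: create a buffer
            created++;
        }
        if (available.isEmpty()) {
            releaseDone();                        // at capacity: reclaim DONE buffers
        }
        if (available.isEmpty() && lastAttempt) {
            releaseFurthestReady(currentBlock);   // last resort: evict a READY buffer
        }
        Buffer buffer = available.poll();
        if (buffer == null) {
            return null;
        }
        buffer.blockNumber = blockNumber;
        buffer.state = State.BLANK;
        allocated.put(blockNumber, buffer);
        return buffer;
    }

    /** Moves every DONE buffer from the allocated map back to the available pool. */
    private void releaseDone() {
        Iterator<Buffer> it = allocated.values().iterator();
        while (it.hasNext()) {
            Buffer b = it.next();
            if (b.state == State.DONE) {
                it.remove();
                available.push(b);
            }
        }
    }

    /** Releases the READY buffer whose block is furthest from the current block. */
    private void releaseFurthestReady(int currentBlock) {
        Buffer furthest = null;
        for (Buffer b : allocated.values()) {
            if (b.state == State.READY && (furthest == null
                    || Math.abs(b.blockNumber - currentBlock)
                       > Math.abs(furthest.blockNumber - currentBlock))) {
                furthest = b;
            }
        }
        if (furthest != null) {
            allocated.remove(furthest.blockNumber);
            available.push(furthest);
        }
    }
}
```

In this model, a pool built with a prefetch block count of 8 reports a capacity of 9. A caller that receives `null` from `tryAcquire` would sleep for a fixed interval and try again, setting `lastAttempt` only on the final retry.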
157
157
158
158
Once a buffer has been acquired by `CachingBlockManager`, if the buffer is in a *READY* state, it is
159
-
returned.
160
-
This means that data was already read into this buffer asynchronously by a prefetch.
161
-
If its state is *BLANK*, then data is read into it using
159
+
returned.
160
+
This means that data was already read into this buffer asynchronously by a prefetch.
161
+
If its state is *BLANK*, then data is read into it using
162
162
`S3Reader.read(ByteBuffer buffer, long offset, int size)`.
163
163
164
164
For the second read call, `in.read(buffer, 0, 8MB)`, since the block size is 8MB and only 5MB
165
165
of block 0 has been read so far, 3MB of the required data will be read from the current block 0.
166
166
Once all data has been read from this block, `S3CachingInputStream` requests the next block (
167
-
block 1), which will already have been prefetched and so it can just start reading from it.
168
-
While reading from block 1, it also issues prefetch requests for the next blocks.
167
+
block 1), which will already have been prefetched and so it can just start reading from it.
168
+
While reading from block 1, it also issues prefetch requests for the next blocks.
169
169
The number of blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`.
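
As a rough sketch of how the prefetch window could be computed, assuming prefetch targets are simply the blocks immediately following the current one, clipped at the end of the file. The class `PrefetchWindow` and the five-block file are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Hadoop code base. */
final class PrefetchWindow {
    private PrefetchWindow() {}

    /**
     * Block numbers to prefetch: up to prefetchCount blocks after currentBlock,
     * never going past the last block of the file.
     */
    static List<Integer> blocksToPrefetch(int currentBlock, int prefetchCount, int totalBlocks) {
        List<Integer> result = new ArrayList<>();
        for (int b = currentBlock + 1;
                b <= currentBlock + prefetchCount && b < totalBlocks; b++) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        int totalBlocks = 5;  // assumed example file of five blocks

        // First read: the prefetch count is 1, so only block 1 is requested.
        System.out.println(blocksToPrefetch(0, 1, totalBlocks));  // [1]

        // Reading block 1 with a prefetch count of 8: clipped at the last block.
        System.out.println(blocksToPrefetch(1, 8, totalBlocks));  // [2, 3, 4]
    }
}
```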
170
170
171
171
##### Random Reads
@@ -180,13 +180,13 @@ in.seek(2MB)
180
180
in.read(buffer, 0, 4MB)
181
181
```
182
182
183
-
The `S3CachingInputStream` also caches prefetched blocks.
184
-
This happens when a `seek()` targets a position outside the current block while the current block still has
183
+
The `S3CachingInputStream` also caches prefetched blocks.
184
+
This happens when a `seek()` targets a position outside the current block while the current block still has
185
185
not been fully read.
186
186
187
187
For the above read sequence, when the `seek(10MB)` call is issued, block 0 has not been read
188
188
completely, so it is cached, as the caller will probably want to read from it again.
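
The caching condition described here can be expressed as a small predicate. This is a sketch under assumptions: `SeekCachingSketch` and its parameter names are hypothetical, and the 5MB consumed from block 0 is an assumed value for illustration.

```java
/** Hypothetical helper for illustration; not part of the Hadoop code base. */
final class SeekCachingSketch {
    private SeekCachingSketch() {}

    /**
     * Returns true when the current block should be cached on a seek: the target
     * lies in a different block and the current block has not been fully read.
     */
    static boolean shouldCacheCurrentBlock(long seekTarget, int currentBlock, int blockSize,
            long bytesConsumed, long currentBlockLength) {
        int targetBlock = (int) (seekTarget / blockSize);
        boolean leavesBlock = targetBlock != currentBlock;
        boolean partiallyRead = bytesConsumed < currentBlockLength;
        return leavesBlock && partiallyRead;
    }

    public static void main(String[] args) {
        final int mb = 1024 * 1024;
        int blockSize = 8 * mb;
        long consumed = 5L * mb;  // assumed: 5MB of block 0 read so far

        // seek(10MB) leaves block 0 before it is fully read, so block 0 is cached.
        System.out.println(shouldCacheCurrentBlock(10L * mb, 0, blockSize, consumed, blockSize));

        // seek(2MB) stays inside block 0, so nothing needs to be cached.
        System.out.println(shouldCacheCurrentBlock(2L * mb, 0, blockSize, consumed, blockSize));
    }
}
```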
189
189
190
-
When `seek(2MB)` is called, the position is back inside block 0.
191
-
The next read can now be satisfied from the locally cached block file, which is typically orders of
190
+
When `seek(2MB)` is called, the position is back inside block 0.
191
+
The next read can now be satisfied from the locally cached block file, which is typically orders of