-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-17250 Lot of short reads can be merged with readahead. #3110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17250 Lot of short reads can be merged with readahead. #3110
Conversation
CC @mehakmeet @snvijaya @bilaharith Please review this. #2307 was already review long time ago. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not look at this in any detail yet; made a few minor comments.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java
Show resolved
Hide resolved
...op-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java
Show resolved
Hide resolved
...adoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStreamContext.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No changes to production code except some comment suggestions and one proposed variable just to make it slightly easer to read a very complex piece of code. Having got the S3A one wrong, I need these comments.
Tests all good too
+1 pending these
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
Outdated
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Show resolved
Hide resolved
...re/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractSeek.java
Show resolved
Hide resolved
...re/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractSeek.java
Show resolved
Hide resolved
...re/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractSeek.java
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me; merge at your leisure
// fCursor is the current file pointer. Thus maximum we can | ||
// go back and read from buffer is fCursor - limit. | ||
// There maybe case that we read less than requested data. | ||
long bytesPresentInBuffer = fCursor - limit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename bytesPresentInBuffer to filePosAtStartOfBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have just a a minor comment on a variable rename.
Thanks for the patch @mukund-thakur. LGTM.
e81dc7b
to
0ba21aa
Compare
💔 -1 overall
This message was automatically generated. |
Introducing fs.azure.readahead.range parameter which can be set by user. Data will be populated in buffer for random reads as well which leads to lesser remote calls. This patch also changes the seek implementation to perform a lazy seek. Actual seek is done when a read is initiated and data is not present in buffer else date is returned from buffer thus reducing the number of remote calls.
0ba21aa
to
5d44ac5
Compare
🎊 +1 overall
This message was automatically generated. |
Introducing fs.azure.readahead.range parameter which can be set by the user. Data will be populated in buffer for random reads as well which leads to fewer remote calls. This patch also changes the seek implementation to perform a lazy seek. The actual seek is done when a read is initiated and data is not present in the buffer else data is returned from the buffer thus reducing the number of remote storage calls. Contributed By: Mukund Thakur Change-Id: Ib920eedd0087caa150afa4d4c23e89df56b29e83
🎊 +1 overall
This message was automatically generated. |
…#3110) Introducing fs.azure.readahead.range parameter which can be set by the user. Data will be populated in buffer for random reads as well which leads to fewer remote calls. This patch also changes the seek implementation to perform a lazy seek. The actual seek is done when a read is initiated and data is not present in the buffer else data is returned from the buffer thus reducing the number of remote storage calls. Contributed By: Mukund Thakur
…ad. (apache#3110) Introducing fs.azure.readahead.range parameter which can be set by the user. Data will be populated in buffer for random reads as well which leads to fewer remote calls. This patch also changes the seek implementation to perform a lazy seek. The actual seek is done when a read is initiated and data is not present in the buffer else data is returned from the buffer thus reducing the number of remote storage calls. Contributed By: Mukund Thakur Conflicts: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStreamContext.java hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsInputStreamStatistics.java hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/TestAbfsConfigurationFieldsValidation.java Change-Id: I89d44ee72bf65c410a3e72fb7cce15c545d9de41
Introducing fs.azure.readahead.range parameter which can be set by user.
Data will be populated in buffer for random reads as well which leads to lesser
remote calls.
This patch also changes the seek implementation to perform a lazy seek. Actual
seek is done when a read is initiated and data is not present in buffer else
date is returned from buffer thus reducing the number of remote calls.
Rebased with trunk. Base patch is #2307
Ran all tests including scale ones using us-east my bucket. All good.