Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18971: [ABFS] Enable Footer Read Optimizations with Appropriate Footer Read Buffer Size #6270

Merged
merged 9 commits into from
Jan 3, 2024

Conversation

anujmodi2021
Copy link
Contributor

@anujmodi2021 anujmodi2021 commented Nov 13, 2023

https://issues.apache.org/jira/browse/HADOOP-18971

Description of PR

Footer Read Optimization was introduced to Hadoop azure in this Jira: https://issues.apache.org/jira/browse/HADOOP-17347
and was kept disabled by default.
This PR is to enable footer reads by default based on the results of analysis performed as below:

In our scale workload analysis, it was found that workloads working with Parquet (or for that matter OCR etc.) have a lot of footer reads. Footer reads here refers to the read operations done by workload to get the metadata of the parquet file which is required to understand where the actual data resides in the parquet.
This whole process takes place in 3 steps:

  1. Workload reads the last 8 bytes of parquet file to get the offset and size of the metadata which is present just above these 8 bytes.
  2. Using that offset, workload reads the metadata to get the exact offset and length of data which it wants to read.
  3. Workload performs the final read operation to get the data it wants to use for its purpose.

Here the first two steps are metadata reads that can be combined into a single footer read. When workload tries to read certain last few bytes of data (let's say this value is footer size), driver will intelligently read some extra bytes above the footer size to cater to the next read which is going to come.

Q. What is the footer size of file?
A: 16KB. Any read request trying to get the data within last 16KB of the file will qualify for whole footer read. This value is enough to cater to all types of files including parquet, OCR, etc.

Q. What is the buffer size to read when reading the footer?
A. Let's call this footer read buffer size. Prior to this PR footer read buffer size was same as read buffer size (default 4MB). It was found that for most of the workload required footer size was only 256KB. i.e. For almost all parquet files metadata for that file was found to be within last 256KBs. Keeping this in mind it does not make sense to read whole buffer length of 4MB as a part of footer read. Moreover, reading larger data than require incur additional costs in terms of server and network latencies. Based on this and extensive experimentation it was observed that footer read buffer size of 512KB is ideal for almost all the workloads running on parquet, OCR, etc.

Following configuration was introduced to configure the footer read buffer size:
fs.azure.footer.read.request.size: default 512 KB.

Quantitative Stats: For a workload running on parquet files with above optimization enabled, the number of read requests got reduced by 2.3M down from 20M. That means around 10% reduction in overall TPS.

How was this patch tested?

Test suite was run and no failures were found.
Existing tests were modified to cover the production code changes.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@@ -190,7 +193,8 @@ private void seekReadAndTest(final FileSystem fs, final Path testFilePath,
try (FSDataInputStream iStream = fs.open(testFilePath)) {
AbfsInputStream abfsInputStream = (AbfsInputStream) iStream
.getWrappedStream();
long bufferSize = abfsInputStream.getBufferSize();
long footerReadBufferSize = abfsInputStream.getFooterReadBufferSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on diff sizes of file. Should we have parameterized values for getFooterReadBufferSize. Right now, it depends on what test-config developer has.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not able to get this...
Can you please elaborate...

The footer buffer size here will be the default one unless user sets it in configs explicitly.
Are you recommending this to be hardcoded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of this config is 256KB. Now, developer can have any other config also. Right now, test is very much inline of using 256 KB. What I am proposing is, that in the test, we set the config and don't depend on the dev given config. Plus, I am proposing we run this test for different values of footerBufferSize.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose something else, will comment below

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 48m 3s trunk passed
+1 💚 compile 0m 37s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 28s trunk passed
+1 💚 mvnsite 0m 38s trunk passed
+1 💚 javadoc 0m 35s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 31s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s trunk passed
+1 💚 shadedclient 39m 10s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 18s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 6 new + 2 unchanged - 0 fixed = 8 total (was 2)
+1 💚 mvnsite 0m 28s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 4s the patch passed
+1 💚 shadedclient 38m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 2s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
141m 29s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/1/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux c4ae513e19be 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5af02c5
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/1/testReport/
Max. process+thread count 532 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 41s trunk passed
+1 💚 compile 0m 40s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 35s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 33s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 javadoc 0m 39s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 6s trunk passed
+1 💚 shadedclient 36m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 31s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 22s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 8 new + 2 unchanged - 0 fixed = 10 total (was 2)
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 27s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 7s the patch passed
+1 💚 shadedclient 36m 21s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 1s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
137m 20s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/2/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 93c32e743192 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 14214f0
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/2/testReport/
Max. process+thread count 692 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021 anujmodi2021 changed the title Code Changes to enable footer optimization with new buffer size HADOOP-18971: [ABFS] Enable Footer Read Optimizations with Appropriate Footer Read Buffer Size Nov 13, 2023
@anujmodi2021
Copy link
Contributor Author


:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsFileSystemContractDistCp>AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote:259->AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure:334->AbstractContractDistCpTest.assertCounterInRange:294->Assert.assertTrue:42->Assert.fail:89 Files Copied value 2 above maximum 1
[INFO]
[ERROR] Tests run: 339, Failures: 1, Errors: 0, Skipped: 41

NonHNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 591, Failures: 0, Errors: 0, Skipped: 274
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

Time taken: 25 mins 53 secs.

@anujmodi2021 anujmodi2021 marked this pull request as ready for review November 13, 2023 17:06
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 27s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 38s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 43s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 37s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 10s trunk passed
+1 💚 shadedclient 37m 3s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 19s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 4 new + 2 unchanged - 0 fixed = 6 total (was 2)
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 26s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s the patch passed
+1 💚 shadedclient 38m 3s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 2s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
139m 39s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/3/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux b9aa07842398 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e438b94
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/3/testReport/
Max. process+thread count 554 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to propose that you add the option as a paramter to the openFile() builder API. This will

  • allow applications to explicitly set it (unlikely given parquet is trapped compiling against hadoop 2.8)
  • make it trival to write tests which open files of different footer length, including the stress test cases (4 bytes, 0 bytes...)

@@ -104,7 +104,20 @@ public final class ConfigurationKeys {
public static final String AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = "fs.azure.write.enableappendwithflush";
public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
public static final String AZURE_READ_SMALL_FILES_COMPLETELY = "fs.azure.read.smallfilescompletely";
/**
* When parquet files are read, first few read are metadata reads before reading the actual data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is roughly the same for ORC, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the procedure is same for ORC as well...
This optimization will impact workloads running on both parquet and ORC files

/**
* When parquet files are read, first few read are metadata reads before reading the actual data.
* First the read is done of last 8 bytes of parquet file to get the postion of metadta and next read
* is done for reading that metadata. With this optimizations these two reads can be combined into 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit "optimization"

@@ -59,7 +59,8 @@ public final class FileSystemConfigurations {
public static final boolean DEFAULT_AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = false;
public static final int DEFAULT_READ_BUFFER_SIZE = 4 * ONE_MB; // 4 MB
public static final boolean DEFAULT_READ_SMALL_FILES_COMPLETELY = false;
public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = false;
public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = true;
public static final int DEFAULT_FOOTER_READ_BUFFER_SIZE = 512 * ONE_KB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is 512k; docs in file above say 265K.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the doc.
So 256KB is enough data but to be on safer side we want to read more. It was found that 512 KB is a sweet spot between caching as much data as possible without impacting performance as much as possible.

public static final String AZURE_READ_OPTIMIZE_FOOTER_READ = "fs.azure.read.optimizefooterread";
/**
* In case of footer reads it was not required to read full buffer size.
* Most of the metadata information required was within 256KB and it will be more performant to read lesser.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"read less"

@@ -190,7 +193,8 @@ private void seekReadAndTest(final FileSystem fs, final Path testFilePath,
try (FSDataInputStream iStream = fs.open(testFilePath)) {
AbfsInputStream abfsInputStream = (AbfsInputStream) iStream
.getWrappedStream();
long bufferSize = abfsInputStream.getBufferSize();
long footerReadBufferSize = abfsInputStream.getFooterReadBufferSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose something else, will comment below

@@ -358,6 +360,7 @@ private int readFileCompletely(final byte[] b, final int off, final int len)
return optimisedRead(b, off, len, 0, contentLength);
}

// To do footer read of files when enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: . at end for javadoc

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 45m 20s trunk passed
+1 💚 compile 0m 36s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 32s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 31s trunk passed
+1 💚 mvnsite 0m 39s trunk passed
+1 💚 javadoc 0m 38s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s trunk passed
+1 💚 shadedclient 34m 31s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 29s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 19s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 8 new + 2 unchanged - 0 fixed = 10 total (was 2)
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 26s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 7s the patch passed
+1 💚 shadedclient 34m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 59s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
130m 17s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/4/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux bc2d8f03a8ba 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c0c1319
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/4/testReport/
Max. process+thread count 672 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021
Copy link
Contributor Author


:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 591, Failures: 0, Errors: 0, Skipped: 274
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

Time taken: 27 mins 14 secs.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 49m 3s trunk passed
+1 💚 compile 0m 37s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 38s trunk passed
+1 💚 javadoc 0m 36s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 2s trunk passed
+1 💚 shadedclient 37m 56s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 27s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 19s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 10 new + 2 unchanged - 0 fixed = 12 total (was 2)
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 2s the patch passed
+1 💚 shadedclient 37m 44s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 13s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
140m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/5/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 37ce969fe51e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 6bb6bd7
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/5/testReport/
Max. process+thread count 532 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@saxenapranav saxenapranav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests getting into better shape. Some points:

  1. the default value comment
  2. There are some tests which follow builder route and some follow config route, can we have a way wherein a given test run once for builder and once for config.

@@ -811,13 +813,17 @@ private AbfsInputStreamContext populateAbfsInputStreamContext(
boolean bufferedPreadDisabled = options
.map(c -> c.getBoolean(FS_AZURE_BUFFERED_PREAD_DISABLE, false))
.orElse(false);
int footerReadBufferSize = options.map(c -> c.getInt(
AZURE_FOOTER_READ_BUFFER_SIZE, DEFAULT_FOOTER_READ_BUFFER_SIZE))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case if dev goes with builder route and have not given fs.azure.footer.read.request.size in builder opts, it will take the default value even though in config a value is defined. Better in place of default value, have the abfsConfugration.getFooterReadBufferSize().

How can you verify this happens, change bit of your testNumBackendCalls code:

private void testNumBackendCalls(boolean optimizeFooterRead)
      throws Exception {
    for (int i = 0; i <= 4; i++) {
      for (int j = 0; j <= 2; j++) {
        int fileSize = (int) Math.pow(2, i) * 256 * ONE_KB;
        int footerReadBufferSize = (int) Math.pow(2, j) * 256 * ONE_KB;
        final AzureBlobFileSystem fs = getFileSystem(optimizeFooterRead,
            fileSize, footerReadBufferSize);
        String fileName = methodName.getMethodName() + i;
        byte[] fileContent = getRandomBytesArray(fileSize);
        Path testFilePath = createFileWithContent(fs, fileName, fileContent);
        int length = AbfsInputStream.FOOTER_SIZE;
        FutureDataInputStreamBuilder builder = fs.openFile(testFilePath);
//        builder.opt(ConfigurationKeys.AZURE_FOOTER_READ_BUFFER_SIZE,
//            footerReadBufferSize);
        try (FSDataInputStream iStream = builder.build().get()) {
......
```

Lets add this usecase as well in tests.

Copy link
Contributor

@saxenapranav saxenapranav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all look good. just one thing.


// Verify that default value is used if nothing is set explicitly
FSDataInputStream iStream = fs.open(testFilePath);
verifyConfigValueInStream(iStream, DEFAULT_FOOTER_READ_BUFFER_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be a check if there is a config in test-config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 47m 19s trunk passed
+1 💚 compile 0m 36s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 38s trunk passed
+1 💚 javadoc 0m 36s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s trunk passed
+1 💚 shadedclient 37m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 29s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 18s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 9 new + 2 unchanged - 0 fixed = 11 total (was 2)
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 1s the patch passed
+1 💚 shadedclient 38m 17s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 12s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
138m 28s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/6/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux c1238a49b4ab 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a7597ed
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/6/testReport/
Max. process+thread count 579 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021
Copy link
Contributor Author

@steveloughran @mukund-thakur @mehakmeet
Gentle reminder for reviewing this PR.
Thanks

Copy link
Contributor

@saxenapranav saxenapranav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking suggestions. Look good.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 33s trunk passed
+1 💚 compile 0m 39s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 39s trunk passed
+1 💚 javadoc 0m 36s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s trunk passed
+1 💚 shadedclient 37m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 19s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 9 new + 2 unchanged - 0 fixed = 11 total (was 2)
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 3s the patch passed
+1 💚 shadedclient 38m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 12s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
137m 57s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/7/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 5c896dd04fdc 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2b2ee8e
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/7/testReport/
Max. process+thread count 606 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 29s trunk passed
+1 💚 compile 0m 34s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 35s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 39s trunk passed
+1 💚 javadoc 0m 36s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 1s trunk passed
+1 💚 shadedclient 37m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 29s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 19s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 9 new + 2 unchanged - 0 fixed = 11 total (was 2)
+1 💚 mvnsite 0m 28s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 7s the patch passed
+1 💚 shadedclient 38m 56s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 57s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
138m 34s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/8/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 6fa94ffe7462 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 85a4d62
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/8/testReport/
Max. process+thread count 631 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
+1

@steveloughran
Copy link
Contributor

ok, you need to merge in trunk and it'll be good to go.

Can you also add as a comment what the explicit text you want in the commit message. I'm thinking something like

HADOOP-17347. ABFS: read/cache footer with fs.azure.footer.read.request.size

The option fs.azure.footer.read.request.size sets the size of the footer to
read and cache; the default value of 524288 is good for parquet

Contributed by ...

side issue, I'm actually thinking of adding cross-stream caching in s3a fs, so that threads working on different row chunks can share footers. Some small LRU cache, with IOStats collection to help measure its value.

thoughts?

@anujmodi2021
Copy link
Contributor Author

ok, you need to merge in trunk and it'll be good to go.

Can you also add as a comment what the explicit text you want in the commit message. I'm thinking something like

HADOOP-17347. ABFS: read/cache footer with fs.azure.footer.read.request.size

The option fs.azure.footer.read.request.size sets the size of the footer to
read and cache; the default value of 524288 is good for parquet

Contributed by ...

side issue, I'm actually thinking of adding cross-stream caching in s3a fs, so that threads working on different row chunks can share footers. Some small LRU cache, with IOStats collection to help measure its value.

thoughts?

Thanks for the review @steveloughran .
I have resolved merge conflicts and validated the changes by running the test suite again.
The commit message you mentioned seems good. Slight changes can be done as following:

HADOOP-17347. ABFS: read/cache footer with fs.azure.footer.read.request.size

The option fs.azure.footer.read.request.size sets the size of the footer to
read and cache; the default value of 524288 is good for most analytic workloads running on parquet, ORC etc file formats.

Contributed by ...

@anujmodi2021
Copy link
Contributor Author

Regarding the cross-stream caching.
Its sounds really good.
I think not only footer reads but this can be expanded to other prefetches as well.
Especially small files that are read fully can be cached such that multiple streams can be catered to.

@anujmodi2021
Copy link
Contributor Author


:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 585, Failures: 0, Errors: 0, Skipped: 266
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

Time taken: 25 mins 3 secs.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 47m 19s trunk passed
+1 💚 compile 0m 48s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 28s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 javadoc 0m 38s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 6s trunk passed
+1 💚 shadedclient 37m 53s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 38m 14s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 19s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 8 new + 2 unchanged - 0 fixed = 10 total (was 2)
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 javadoc 0m 26s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 4s the patch passed
+1 💚 shadedclient 37m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 13s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
139m 12s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/9/artifact/out/Dockerfile
GITHUB PR #6270
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ec31e83480ab 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 66b035c
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/9/testReport/
Max. process+thread count 530 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6270/9/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor

I think not only footer reads but this can be expanded to other prefetches as well.
Especially small files that are read fully can be cached such that multiple streams can be catered to.

i don't know how common that use is...whereas for spark/tez and workers, reopening the same file is not unusual -they just process different parts.

I think this is why prefecting doesn't do anything for orc/parquet. Note that impala does cache the column indexes/page indexes so it doesn't need the filesystem to secretly do it for them.

@anujmodi2021
Copy link
Contributor Author

I think not only footer reads but this can be expanded to other prefetches as well.
Especially small files that are read fully can be cached such that multiple streams can be catered to.

i don't know how common that use is...whereas for spark/tez and workers, reopening the same file is not unusual -they just process different parts.

I think this is why prefecting doesn't do anything for orc/parquet. Note that impala does cache the column indexes/page indexes so it doesn't need the filesystem to secretly do it for them.

Yes, we have also had similar observations. But I feel this cross-stream caching is a good idea for both footer reads, and small files read.

@steveloughran steveloughran merged commit e3c135b into apache:trunk Jan 3, 2024
4 checks passed
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
…read.request.size (apache#6270)


The option fs.azure.footer.read.request.size sets the size of the footer to
read and cache; the default value of 524288 has been measured to
be good for most workloads running on parquet, ORC and similar file formats.

Contributed by Anuj Modi
@anujmodi2021 anujmodi2021 deleted the footerOptimization branch April 17, 2024 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants