HADOOP-19793. S3A: use long for file size in S3A content providers, data blocks #8225

Open
ajfabbri wants to merge 3 commits into apache:trunk from ajfabbri:fabbri/hadoop-19793

@ajfabbri
Contributor

@ajfabbri ajfabbri commented Feb 2, 2026

Description of PR

From HADOOP-19793:

In HADOOP-19221 the maximum size of a single block was made an integer, even when the source is a file larger than 2 GB. This means that uploading such a file as a single block no longer works. This is relevant when working with stores like GCS, which don't support multipart uploads.
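A minimal Java sketch of the failure mode, assuming the size ends up narrowed with a plain `(int)` cast somewhere on the single-block upload path. The class and method names are hypothetical and this is not the S3A code. A 3 GiB length wraps to a negative int, while keeping it as a `long` (as this PR does) preserves it; `Math.toIntExact` is shown as a fail-fast option for code that genuinely needs an int.

```java
/**
 * Hypothetical illustration (not S3A code) of narrowing a file size
 * from long to int.
 */
public final class BlockSizeOverflow {

  private BlockSizeOverflow() {
  }

  /** Narrowing cast: silently wraps for sizes above Integer.MAX_VALUE. */
  static int sizeAsInt(long fileSize) {
    return (int) fileSize;
  }

  /** Fail-fast alternative: throws ArithmeticException on overflow. */
  static int sizeAsIntChecked(long fileSize) {
    return Math.toIntExact(fileSize);
  }

  public static void main(String[] args) {
    long threeGiB = 3L * 1024 * 1024 * 1024;
    System.out.println("as long: " + threeGiB);            // 3221225472
    System.out.println("as int:  " + sizeAsInt(threeGiB)); // -1073741824
    try {
      sizeAsIntChecked(threeGiB);
    } catch (ArithmeticException e) {
      System.out.println("toIntExact: " + e.getMessage());
    }
  }
}
```

A negative or truncated length is what turns a > 2 GB single-block upload into a failure rather than a correct PUT.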

How was this patch tested?

 mvn -Dtest=none -Dit.test="ITestS3AHugeFilesNoMultipart" -Dscale \
    -Dfs.s3a.scale.test.huge.filesize=3G verify

Ran that large-scale test against LocalStack S3.

Ran all integration tests locally against S3 in us-west-2. All passed except:

  1. ITestS3ACannedACLs>AbstractS3ATestBase.setup:111->AbstractFSContractTestBase.setup:197->AbstractFSContractTestBase.mkdirs:355 » ... S3Exception: The bucket does not allow ACLs
  2. ITestS3ATemporaryCredentials.testSessionTokenPropagation:202 getFileStatus on s3a://fabbri-s3a/job-00-fork-0004/test/testSTS/c20306d0-3eca-4264-b210-3d7beb8c80c7: software.amazon.awssdk.services.s3.model.S3Exception: Forbidden (Service: S3, Status Code: 403, Request ID: 6GV4SMMX9EXVDB0J, Extended Request ID: nB4bQ+CHn+Y2BiR4lLSgLtHyWVd05eerIXrsNDhV9Z0o9I/KZ+N8VD3khlSsiJvh0HkuCjDMtB4=):null
  3. Run 3: ITestS3ACommitterMRJob.test_200_execute:313->Assertions.fail:138 Job job_1771452714166_0003 failed in state FAILED with cause Application application_1771452714166_0003 failed 2 times due to AM Container for appattempt_1771452714166_0003_000002 exited with exitCode: 1

The first failure seems OK, given that I haven't enabled ACLs on my bucket.
I'm not sure what the problem is with 2 and 3.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • [na] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [na] If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling

If an AI tool was used: Nope

@ajfabbri
Contributor Author

The majority of CI failures are due to HADOOP-19790.

@ajfabbri ajfabbri force-pushed the fabbri/hadoop-19793 branch 2 times, most recently from d0403b1 to 1e1c09c on February 18, 2026 21:36
@ajfabbri ajfabbri changed the title WIP: HADOOP-19793 use long for file size in S3A content providers, data blocks HADOOP-19793 use long for file size in S3A content providers, data blocks Feb 18, 2026
@ajfabbri ajfabbri marked this pull request as ready for review February 18, 2026 22:33
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 50m 23s trunk passed
+1 💚 compile 1m 5s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 5s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 0m 59s trunk passed
+1 💚 mvnsite 1m 12s trunk passed
+1 💚 javadoc 0m 58s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 0m 56s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
-1 ❌ spotbugs 1m 43s /branch-spotbugs-hadoop-tools_hadoop-aws-warnings.html hadoop-tools/hadoop-aws in trunk has 2 extant spotbugs warnings.
+1 💚 shadedclient 34m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 41s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 0m 37s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 28s the patch passed
+1 💚 mvnsite 0m 42s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
-1 ❌ spotbugs 1m 25s /new-spotbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 2 fixed = 1 total (was 2)
+1 💚 shadedclient 33m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 3m 31s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
139m 6s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-aws
Unknown bug pattern MC_OVERRIDABLE_METHOD_CALL_IN_READ_OBJECT in org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.readObject(ObjectInputStream) At SinglePendingCommit.java:org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.readObject(ObjectInputStream) At SinglePendingCommit.java:[line 201]
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/6/artifact/out/Dockerfile
GITHUB PR #8225
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux b60e0a72e150 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / dfa73c6
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/6/testReport/
Max. process+thread count 575 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/6/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran changed the title HADOOP-19793 use long for file size in S3A content providers, data blocks HADOOP-19793. S3A: use long for file size in S3A content providers, data blocks Feb 19, 2026
Contributor

@steveloughran steveloughran left a comment

+1

All looks good, and I reviewed the SinglePendingCommit file size change too.

Regarding the test failures:
#2 may mean you aren't set up to create a session for the target user/account. I can help there.
#3: things have been playing up with the MR cluster tests since the move to JUnit 5; getting anything working at all was challenge enough.

@steveloughran
Contributor

Fix the spotbugs warning by making the overridable method called from readObject() (validate()) final, or make SinglePendingCommit final.
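The suggestion relies on the fact that methods of a `final` class cannot be overridden, so calling a validation method from `readObject()` cannot be hijacked by a subclass. A minimal sketch of that pattern, using a hypothetical `PendingCommitSketch` class rather than the real `SinglePendingCommit` (its fields and checks are assumptions): `readObject()` restores the fields with `defaultReadObject()` and then rejects invalid state. Per the rest of this thread, making the class final did not by itself silence the warning under spotbugs 4.9.7, so treat this as the intended pattern, not a guaranteed fix for the tool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/**
 * Hypothetical stand-in for a serializable commit record.
 * The class is final, so validate() cannot be overridden.
 */
public final class PendingCommitSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String destination;
  private final long length; // long, so sizes above 2 GB survive

  public PendingCommitSketch(String destination, long length) {
    this.destination = destination;
    this.length = length;
  }

  public long getLength() {
    return length;
  }

  /** Checks invariants; safe to call from readObject() in a final class. */
  void validate() throws InvalidObjectException {
    if (destination == null || destination.isEmpty()) {
      throw new InvalidObjectException("missing destination");
    }
    if (length < 0) {
      throw new InvalidObjectException("negative length: " + length);
    }
  }

  /** Restore fields, then reject invalid deserialized state. */
  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    validate();
  }

  /** Serialize then deserialize in memory. */
  static PendingCommitSketch roundTrip(PendingCommitSketch commit)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(commit);
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()))) {
      return (PendingCommitSketch) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    PendingCommitSketch ok =
        roundTrip(new PendingCommitSketch("s3a://bucket/dest", 3L << 30));
    System.out.println("restored length: " + ok.getLength());
    try {
      roundTrip(new PendingCommitSketch("s3a://bucket/dest", -1));
    } catch (InvalidObjectException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```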

@ajfabbri ajfabbri force-pushed the fabbri/hadoop-19793 branch from 587e5de to eed4c14 on February 19, 2026 20:53
@steveloughran
Contributor

  1. a trivial checkstyle to fix
  2. spotbugs needs to be told to shut up, which can be done with a new entry in hadoop-tools/hadoop-aws/dev-support/findbugs-exclude.xml

@apache apache deleted a comment from hadoop-yetus Feb 20, 2026
@ajfabbri
Contributor Author

ajfabbri commented Feb 20, 2026

a trivial checkstyle to fix

I already fixed it but CI has been stuck for over 24 hours! https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/8/

@ajfabbri ajfabbri force-pushed the fabbri/hadoop-19793 branch from eed4c14 to f94a921 on February 21, 2026 00:29
@ajfabbri
Contributor Author

Force-push: rebase on latest trunk.

@ajfabbri ajfabbri force-pushed the fabbri/hadoop-19793 branch from f94a921 to 01e1456 on February 24, 2026 21:43
@apache apache deleted a comment from hadoop-yetus Feb 24, 2026
@github-actions github-actions bot added the ABFS label Feb 24, 2026
Contributor

@steveloughran steveloughran left a comment

Aah, I don't see what spotbugs is complaining about here. I suspect it's the duplicate Class declaration in the same Match.

<!-- Despite adding `final` as suggested, spotbugs kept complaining. -->
<Match>
<Class name="org.apache.hadoop.fs.s3a.commit.files.PendingSet"/>
<Class name="org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit"/>
Contributor

Maybe you have to use only one Class per Match?

Contributor Author

Thanks for the idea. I tried it. 🤷‍♂️

@ajfabbri ajfabbri force-pushed the fabbri/hadoop-19793 branch 2 times, most recently from 7945e63 to c7640dc on February 25, 2026 16:07
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 46s Maven dependency ordering for branch
+1 💚 mvninstall 56m 35s trunk passed
+1 💚 compile 1m 59s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 59s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 25s trunk passed
+1 💚 mvnsite 2m 29s trunk passed
+1 💚 javadoc 1m 55s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 57s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
-1 ❌ spotbugs 2m 3s /branch-spotbugs-hadoop-tools_hadoop-aws-warnings.html hadoop-tools/hadoop-aws in trunk has 2 extant spotbugs warnings.
+1 💚 shadedclient 35m 27s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for patch
+1 💚 mvninstall 1m 21s the patch passed
+1 💚 compile 1m 31s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 1m 31s the patch passed
+1 💚 compile 1m 36s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 1m 36s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 4s the patch passed
+1 💚 mvnsite 1m 29s the patch passed
+1 💚 javadoc 1m 6s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 8s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 1m 33s hadoop-azure in the patch passed.
+1 💚 spotbugs 1m 44s hadoop-tools/hadoop-aws generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2)
+1 💚 shadedclient 34m 43s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 30s hadoop-azure in the patch passed.
+1 💚 unit 4m 3s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
169m 0s
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/16/artifact/out/Dockerfile
GITHUB PR #8225
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 9ed7ab5a7668 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c7640dc
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/16/testReport/
Max. process+thread count 574 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure hadoop-tools/hadoop-aws U: hadoop-tools
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/16/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@ajfabbri
Contributor Author

I am at a loss why CI (spotbugs) is failing here.

It passes for me locally: hadoop-tools/hadoop-aws/target/spotbugs.xml contains zero errors when I run `mvnd -pl hadoop-tools/hadoop-aws spotbugs:spotbugs`:

<?xml version='1.0' encoding='UTF-8'?>
<BugCollection version='4.9.7' threshold='medium' effort='max'>
  <Error></Error>
  <Project>
    <SrcDir>/Users/fabbri/Code/hadoop/hadoop-tools/hadoop-aws/src/main/java</SrcDir>
    <SrcDir>/Users/fabbri/Code/hadoop/hadoop-tools/hadoop-aws/src/test/java</SrcDir>
  </Project>
</BugCollection>
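To check a report like this one without reading it by eye, here is a small sketch that counts the `BugInstance` elements in `spotbugs.xml`. It assumes the standard SpotBugs XML layout, where each finding is one `BugInstance` element (the `Error` element records analysis errors, not findings). The class name `SpotbugsReportCheck` is hypothetical; the default path is the one quoted above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/**
 * Hypothetical helper: counts findings in a SpotBugs XML report,
 * assuming one BugInstance element per finding.
 */
public final class SpotbugsReportCheck {

  private SpotbugsReportCheck() {
  }

  static int countBugInstances(InputStream report) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Document doc = factory.newDocumentBuilder().parse(report);
    return doc.getElementsByTagName("BugInstance").getLength();
  }

  public static void main(String[] args) throws Exception {
    // Default to the report path from the comment; override with args[0].
    File report = new File(args.length > 0 ? args[0]
        : "hadoop-tools/hadoop-aws/target/spotbugs.xml");
    try (InputStream in = new FileInputStream(report)) {
      int bugs = countBugInstances(in);
      System.out.println(report + ": " + bugs + " BugInstance entries");
      if (bugs > 0) {
        System.exit(1); // non-zero exit so a script can fail on findings
      }
    }
  }
}
```

This only inspects the local report; it cannot explain why the CI run, which compares branch and patch reports through Yetus, disagrees with it.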

Adding `final` didn't work so I had to add an exclude.

Experimental Warnings: MC_OVERRIDABLE_METHOD_CALL_IN_READ_OBJECT
 in o.a.h.fs.s3a.commit.files.PendingSet.readObject(ObjectInputStream)
and
In method o.a.h.fs.s3a.commit.files.SinglePendingCommit.readObject(ObjectInputStream)
Called method o.a.h.fs.s3a.commit.files.SinglePendingCommit.validate()

s3a: spotbugs: make entire class final

Despite changing validate() to final, still get this warning:

Unknown bug pattern MC_OVERRIDABLE_METHOD_CALL_IN_READ_OBJECT in
org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.readObject(ObjectInputStream)
At
SinglePendingCommit.java:org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.readObject(ObjectInputStream)
At SinglePendingCommit.java:[line 201]

try making the whole class final then.

hadoop-aws: add excludes for spotbugs being buggy
Suppresses an existing warning that my edit re-triggered.

Disabled this check for all of hadoop-aws--it is of questionable value.
@ajfabbri ajfabbri force-pushed the fabbri/hadoop-19793 branch from c7640dc to f32ae58 on February 26, 2026 22:11
@apache apache deleted a comment from hadoop-yetus Feb 26, 2026
@ajfabbri
Contributor Author

Force-push: rebase on latest trunk

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 27s Maven dependency ordering for branch
+1 💚 mvninstall 51m 38s trunk passed
+1 💚 compile 1m 54s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 58s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 25s trunk passed
+1 💚 mvnsite 2m 17s trunk passed
+1 💚 javadoc 1m 57s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 54s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
-1 ❌ spotbugs 1m 42s /branch-spotbugs-hadoop-tools_hadoop-aws-warnings.html hadoop-tools/hadoop-aws in trunk has 2 extant spotbugs warnings.
+1 💚 shadedclient 34m 10s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 24s the patch passed
+1 💚 compile 1m 27s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 1m 27s the patch passed
+1 💚 compile 1m 28s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 1m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 59s the patch passed
+1 💚 mvnsite 1m 24s the patch passed
+1 💚 javadoc 1m 0s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 1s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 1m 17s hadoop-azure in the patch passed.
+1 💚 spotbugs 1m 24s hadoop-tools/hadoop-aws generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2)
+1 💚 shadedclient 33m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 11s hadoop-azure in the patch passed.
+1 💚 unit 3m 33s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
158m 52s
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/17/artifact/out/Dockerfile
GITHUB PR #8225
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 29e1127272a8 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f32ae58
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/17/testReport/
Max. process+thread count 646 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure hadoop-tools/hadoop-aws U: hadoop-tools
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/17/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@steveloughran steveloughran left a comment

+1

hadoop-tools/hadoop-aws generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2)

You've fixed the spotbugs warnings; those two were "extant" in the existing code. They probably crept in with a spotbugs update.

merge at your leisure

3 participants