HADOOP-17928. Syncable: S3A to warn and downgrade #3585

Merged

Conversation

steveloughran (Contributor) commented Oct 25, 2021

This switches the default behavior of S3A output streams
to warning when Syncable.hsync() or hflush() is called;
the call is no longer treated as an error unless the defaults
are overridden.

This avoids breaking applications which call these APIs,
at the risk of people trying to use S3 as a safe store
of streamed data (HBase WALs, audit logs, etc.).

Contributed by Steve Loughran.

Change-Id: Id7f8fc65a32a5ab664ebbd68d92754d4c38e29d8
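For illustration only, a minimal caller-side sketch of what the change means for applications using the Syncable API on S3A output streams. This is not code from the patch; the bucket, key, and payload are placeholders.

```java
// Sketch (not from this PR) of an application calling Syncable on an S3A stream.
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncableOnS3A {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
    try (FSDataOutputStream out =
             fs.create(new Path("s3a://example-bucket/logs/app.log"))) {
      out.write("audit record\n".getBytes(StandardCharsets.UTF_8));
      // With the defaults from this patch, these calls are expected to log a
      // warning rather than fail. They still do not make the data durable:
      // S3A only persists the object when the stream is closed.
      out.hflush();
      out.hsync();
    }
  }
}
```

The downgrade only changes whether the calls fail; it does not change when data becomes visible in S3.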

How was this patch tested?

New unit tests; cloud store test in progress

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?

steveloughran (Contributor, Author) commented:

Testing in progress. FWIW, internally we had the core-default setting up for a while, because various obscure apps (Ranger logs etc.) were calling hflush and hsync, and while "stop it" is the correct long-term fix, that relies on co-ordination across OSS release schedules. Which, as we know, doesn't exist.

Change-Id: Ic9fb5026f13904bc4921946a75b9d807b2db326d
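If strict behaviour is still wanted, the defaults can be overridden. The sketch below assumes the option being flipped here is fs.s3a.downgrade.syncable.exceptions (believed to come from HADOOP-17597); treat the key name as an assumption to verify against the S3A documentation.

```java
// Sketch of restoring the strict behaviour for jobs that must fail fast when
// hsync()/hflush() durability is assumed. The option name below is an
// assumption, not confirmed by this PR; verify before relying on it.
import org.apache.hadoop.conf.Configuration;

public final class StrictSyncableConf {
  public static Configuration strictSyncable() {
    Configuration conf = new Configuration();
    // false => S3A output streams reject Syncable calls with an exception
    // instead of downgrading them to a warning.
    conf.setBoolean("fs.s3a.downgrade.syncable.exceptions", false);
    return conf;
  }
}
```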
steveloughran (Contributor, Author) commented:

Test results: all good except for the buffer-incomplete issue, which happens locally for me:

[ERROR] Failures:
[ERROR]   ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferBeforeRead:63->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 failed to read expected number of bytes from stream. This may be transient expected:<1024> but was:<515>
[INFO]
[ERROR] Tests run: 1471, Failures: 1, Errors: 0, Skipped: 302

sunchao (Member) commented Oct 26, 2021:

cc @dongjoon-hyun

mehakmeet (Contributor) left a comment:

Looks good. Pending the blanks fix caught by Yetus.

bogthe (Contributor) left a comment:

Looks good! 👍

dongjoon-hyun (Member) commented:

Thank you for pinging me, @sunchao.

dongjoon-hyun (Member) left a comment:

+1, LGTM.

steveloughran (Contributor, Author) commented:

Thanks, will fix the EOF issue and apply.

Change-Id: I61441b01f1aee342f3e59da7ef107b1dfc023e1d
steveloughran (Contributor, Author) commented:

OK, as soon as Yetus is done I will merge.

hadoop-yetus commented:

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 42s Maven dependency ordering for branch
+1 💚 mvninstall 21m 20s trunk passed
+1 💚 compile 21m 46s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 18m 57s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 3m 37s trunk passed
+1 💚 mvnsite 2m 39s trunk passed
+1 💚 javadoc 1m 51s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 30s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 50s trunk passed
+1 💚 shadedclient 20m 52s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 1m 32s the patch passed
+1 💚 compile 20m 54s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 20m 54s the patch passed
+1 💚 compile 19m 1s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 19m 1s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 33s the patch passed
+1 💚 mvnsite 2m 37s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 49s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 30s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 4m 7s the patch passed
+1 💚 shadedclient 20m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 27s hadoop-common in the patch passed.
+1 💚 unit 2m 51s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 0s The patch does not generate ASF License warnings.
213m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3585/3/artifact/out/Dockerfile
GITHUB PR #3585
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint
uname Linux 961dbd7d163b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1bde2c9
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3585/3/testReport/
Max. process+thread count 2433 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3585/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

steveloughran merged commit 6c6d1b6 into apache:trunk Nov 2, 2021
asfgit pushed a commit that referenced this pull request Nov 4, 2021
Change-Id: I0a02ec1e622343619f147f94158c18928a73a885
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022