HADOOP-13327. Specify Output Stream and Syncable #2102

steveloughran · 2020-06-26T16:31:21Z

successor to #1694

adds a spec doc for what we expect from an output stream
and especially Syncable hsync()/hflush(), which are critical for HBase and other apps with WALs to write
and adds checks for StreamCapabilities...those filesystems which implement the API now all declare that fact.

steveloughran · 2020-06-26T16:40:25Z

hi @busbey @DadanielZ -this is the rebased version of #1694. Could you two take a look at it and see if we can get it in. Not just because it says what we expect from hsync/hflush, but because we add the ability to probe a stream for having it before actually attempting the call. All output streams in our codebase which implement Syncable do the right thing here

steveloughran · 2020-06-29T12:59:00Z

rawlocal FS failure is because even though localfs.xml says hsync is supported, it's not being passed all the way down. I need a subclass of java.io.ByteBufferOutputStream to pass it through.

The good news, that comes in #2069; I'm going to change the xml test declarations until that is in

...-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/StoreImplementationUtils.java

joshelser · 2020-06-29T14:14:56Z

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md

-This MAY be a bug, as it allows >1 client to create a file with `overwrite==false`,
- and potentially confuse file/directory logic
+This is a significant difference between the behavior of object stores
+and that of filesystems, as it allows >1 client to create a file with `overwrite==false`,


nit: You used overwrite=false below. I'm assuming the double-equals was just copy-paste, but maybe correct to stay consistent?

changed to ==; also replace >1 with > to keep everything happy

joshelser · 2020-06-29T14:21:35Z

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md

+model permits the output stream to have `close()` invoked while awaiting an
+acknowledgement from datanode or namenode writes in an `hsync()` operation.
+
+### <a name="consistencyy"></a> Consistency and Visibility


"consistency"

steveloughran · 2020-09-21T16:48:08Z

@joshelser -rebased and addressed your minor points

I think you are right, we don't need separate HSYNC/HFLUSH options, it's just that they've been there (separately) for a while.

What to do?

maintain as is
add SYNCABLE which says "both"
Declare that HSYNC is sufficient and the only one you should look for. If you implement HFLUSH and not HSYNC, you are of no use to applications. retain support for HFLUSH probe, but in the spec only mention it as of historical interest.

I like option 3, now I think about it.

steveloughran · 2020-09-22T14:26:07Z

TestRawLocalContract create is failing until we do our own BufferedOutputStream in passthrough of HADOOP-16830/ #2323 .

the HDFS checksum ones are about checksums not being found, which implies they weren't being written. Which goes near output streams, doesn't it? Which means there is a risk this is a genuine regression and not "hdfs tests being flaky"

org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1893408133-172.17.0.3-1600724641689:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:33131,DS-5c8811e5-39af-4967-b3a7-3b76f95b0317,DISK], DatanodeInfoWithStorage[127.0.0.1:38603,DS-82754ee5-286a-402b-9861-b3ebbc149849,DISK], DatanodeInfoWithStorage[127.0.0.1:44379,DS-080d6ce9-cb6a-4f80-b37a-e63ecc31d9bc,DISK], DatanodeInfoWithStorage[127.0.0.1:34723,DS-c6c2aa4e-639b-4d85-9564-05631c8c5b79,DISK], DatanodeInfoWithStorage[127.0.0.1:40061,DS-54eb3c16-b9ce-4a5d-a7a3-f33b635579b0,DISK], DatanodeInfoWithStorage[127.0.0.1:45525,DS-3c9ac850-9273-4d2d-933c-2ac3b4b30308,DISK], DatanodeInfoWithStorage[127.0.0.1:45537,DS-9b1d805c-3b0e-4c84-ad0e-f454817b6829,DISK], DatanodeInfoWithStorage[127.0.0.1:44697,DS-e06897c8-c8a5-41d6-b5bc-57c7339bbf9b,DISK]]; indices=[1, 2, 3, 4, 5, 6, 7, 8]}
	at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
	at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
	at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
	at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
	at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1891)
	at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1888)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1905)
	at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
	at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
	at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery1(TestFileChecksum.java:312)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

steveloughran · 2020-09-24T09:42:08Z

Thinking:

single capability, syncable, everything maps to
skip local FS test until we get the IOStats patch in, then re-enable

steveloughran · 2020-10-19T17:57:51Z

HSYNC is the only option needed; tag HFLUSH as deprecated and in the docs say its not useful; outputstream.md doc lists the standard options.

hadoop-yetus · 2020-10-19T20:48:15Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	37m 31s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	markdownlint	0m 1s		markdownlint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚		0m 0s	test4tests	The patch appears to include 10 new or modified test files.
			_ trunk Compile Tests _
+0 🆗	mvndep	25m 39s		Maven dependency ordering for branch
+1 💚	mvninstall	42m 8s		trunk passed
+1 💚	compile	21m 33s		trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚	compile	17m 58s		trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚	checkstyle	2m 49s		trunk passed
+1 💚	mvnsite	5m 7s		trunk passed
+1 💚	shadedclient	24m 45s		branch has no errors when building and testing our client artifacts.
-1 ❌	javadoc	0m 47s	/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt	hadoop-hdfs-client in trunk failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
+1 💚	javadoc	4m 45s		trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗	spotbugs	0m 44s		Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	9m 53s		trunk passed
			_ Patch Compile Tests _
+0 🆗	mvndep	0m 24s		Maven dependency ordering for patch
-1 ❌	mvninstall	0m 14s	/patch-mvninstall-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
-1 ❌	compile	19m 56s	/patch-compile-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt	root in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌	javac	19m 56s	/patch-compile-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt	root in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌	compile	17m 21s	/patch-compile-root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt	root in the patch failed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.
-1 ❌	javac	17m 21s	/patch-compile-root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt	root in the patch failed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.
-0 ⚠️	checkstyle	2m 45s	/buildtool-patch-checkstyle-root.txt	The patch fails to run checkstyle in root
-1 ❌	mvnsite	0m 28s	/patch-mvnsite-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
-1 ❌	whitespace	0m 0s	/whitespace-eol.txt	The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚	xml	0m 5s		The patch has no ill-formed XML file.
+1 💚	shadedclient	16m 50s		patch has no errors when building and testing our client artifacts.
-1 ❌	javadoc	0m 45s	/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt	hadoop-hdfs-client in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌	javadoc	0m 28s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt	hadoop-azure in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌	javadoc	0m 28s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt	hadoop-azure in the patch failed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.
-1 ❌	findbugs	0m 28s	/patch-findbugs-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
			_ Other Tests _
-1 ❌	unit	10m 24s	/patch-unit-hadoop-common-project_hadoop-common.txt	hadoop-common in the patch passed.
+1 💚	unit	2m 24s		hadoop-hdfs-client in the patch passed.
-1 ❌	unit	112m 10s	/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch passed.
-1 ❌	unit	0m 33s	/patch-unit-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
+1 💚	unit	1m 8s		hadoop-azure-datalake in the patch passed.
+1 💚	asflicense	0m 53s		The patch does not generate ASF License warnings.
		403m 46s

Reason	Tests
Failed junit tests	hadoop.fs.contract.rawlocal.TestRawlocalContractCreate
	hadoop.hdfs.TestFileChecksum
	hadoop.hdfs.TestRollingUpgrade
	hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks
	hadoop.hdfs.TestFileChecksumCompositeCrc
	hadoop.hdfs.server.namenode.TestNamenodeCapacityReport

Subsystem	Report/Notes
Docker	ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/5/artifact/out/Dockerfile
GITHUB PR	#2102
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname	Linux a8df3b1adc15 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `d60d5fe`
Default Java	Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/5/testReport/
Max. process+thread count	2802 (vs. ulimit of 5500)
modules	C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-azure hadoop-tools/hadoop-azure-datalake U: .
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/5/console
versions	git=2.17.1 maven=3.6.0 findbugs=4.1.3
Powered by	Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2020-11-18T17:58:07Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	33m 30s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 1s		No case conflicting files found.
+0 🆗	markdownlint	0m 0s		markdownlint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚		0m 0s	test4tests	The patch appears to include 10 new or modified test files.
			_ trunk Compile Tests _
+0 🆗	mvndep	14m 42s		Maven dependency ordering for branch
-1 ❌	mvninstall	28m 3s	/branch-mvninstall-root.txt	root in trunk failed.
+1 💚	compile	24m 47s		trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚	compile	18m 6s		trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚	checkstyle	2m 50s		trunk passed
+1 💚	mvnsite	5m 7s		trunk passed
+1 💚	shadedclient	25m 37s		branch has no errors when building and testing our client artifacts.
+1 💚	javadoc	3m 48s		trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚	javadoc	4m 42s		trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+0 🆗	spotbugs	0m 44s		Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	9m 55s		trunk passed
			_ Patch Compile Tests _
+0 🆗	mvndep	0m 22s		Maven dependency ordering for patch
-1 ❌	mvninstall	0m 13s	/patch-mvninstall-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
-1 ❌	compile	19m 47s	/patch-compile-root-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt	root in the patch failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.
-1 ❌	javac	19m 47s	/patch-compile-root-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt	root in the patch failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.
-1 ❌	compile	17m 16s	/patch-compile-root-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt	root in the patch failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.
-1 ❌	javac	17m 16s	/patch-compile-root-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt	root in the patch failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.
-0 ⚠️	checkstyle	2m 45s	/buildtool-patch-checkstyle-root.txt	The patch fails to run checkstyle in root
-1 ❌	mvnsite	0m 28s	/patch-mvnsite-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
-1 ❌	whitespace	0m 0s	/whitespace-eol.txt	The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚	xml	0m 5s		The patch has no ill-formed XML file.
+1 💚	shadedclient	17m 12s		patch has no errors when building and testing our client artifacts.
-1 ❌	javadoc	0m 55s	/diff-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt	hadoop-common-project_hadoop-common-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 generated 1 new + 98 unchanged - 1 fixed = 99 total (was 99)
-1 ❌	javadoc	0m 28s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt	hadoop-azure in the patch failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.
-1 ❌	javadoc	0m 28s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt	hadoop-azure in the patch failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.
-1 ❌	findbugs	0m 28s	/patch-findbugs-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
			_ Other Tests _
-1 ❌	unit	10m 20s	/patch-unit-hadoop-common-project_hadoop-common.txt	hadoop-common in the patch passed.
+1 💚	unit	2m 24s		hadoop-hdfs-client in the patch passed.
+1 💚	unit	113m 3s		hadoop-hdfs in the patch passed.
-1 ❌	unit	0m 35s	/patch-unit-hadoop-tools_hadoop-azure.txt	hadoop-azure in the patch failed.
+1 💚	unit	1m 4s		hadoop-azure-datalake in the patch passed.
+1 💚	asflicense	0m 53s		The patch does not generate ASF License warnings.
		380m 38s

Reason	Tests
Failed junit tests	hadoop.fs.contract.rawlocal.TestRawlocalContractCreate

Subsystem	Report/Notes
Docker	ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/6/artifact/out/Dockerfile
GITHUB PR	#2102
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname	Linux 00159079ab5f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `425996e`
Default Java	Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/6/testReport/
Max. process+thread count	3187 (vs. ulimit of 5500)
modules	C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-azure hadoop-tools/hadoop-azure-datalake U: .
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/6/console
versions	git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by	Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

joshelser · 2020-11-18T18:10:37Z

Yikes. These got mis-filed in my mail. I'm sorry, Steve.

Declare that HSYNC is sufficient and the only one you should look for. If you implement HFLUSH and not HSYNC, you are of no use to applications. retain support for HFLUSH probe, but in the spec only mention it as of historical interest.

My only worry with this is that, today, HBase is only running with HFlush and not HSync because of performance reasons. The last wag I remember was in the ballpark of a 30% performance hit for writes if we force hsync instead of hflush. Let me re-read your latest and see if my mind changes.

joshelser · 2020-11-18T19:09:15Z

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md

+it MAY be durable. That is: it does not have to be persisted, merely guaranteed
+to be consistently visible to all clients attempting to open a new stream reading
+data at the path.
+1. `Syncable.hsync()` MUST flush the data and persist data to the underlying durable


Nit: clarify that "flush" in this sense refers to "hflush" and not "OutputStream.flush".

You clarify in detail earlier that hsync must do all the hflush also does. Just avoiding any opportunity for a reader to have missed that section and skipped to here.

done. Also added a bit about hflush being able to forward to hsync, but not the other way around

joshelser · 2020-11-18T19:12:06Z

My only thought at this point is: are we creating a dichotomy by having HFLUSH be deprecated in StreamCapabilities but not deprecated in Syncable?

The argument is that we would never have an implementor of hflush which doesn't also implement hsync. However, every implementor of hsync should have an implementation for hflush (even if that implementation is to just invoke hsync).

Now that I've come to this, I think outputstream.md adequately does this. Do you think it's appropriate to have a reciprocal javadoc reference from Syncable back to StreamCapabilities? I was thinking, if I was writing against this API, I might not know to go looking for StreamCapabilities if I just start digging through the Syncable API. I would defer to your judgement here (if you think it's unnecessary, I am not concerned).

This was a fantastic and thorough read. Thanks for taking the time to do it, Steve!

steveloughran · 2020-11-18T19:23:32Z

yeah, lets cross ref the javadocs. "if you implement this you SHOULD declare that you support the capability"

mehakmeet

LGTM. just some nits in the doc and a small issue in imports.

mehakmeet · 2020-11-19T09:12:59Z

...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java

@@ -32,7 +32,7 @@
 import java.util.concurrent.Callable;
 import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;
-
+This


"This" should be removed. This might be the reason why Yetus is failing.

heh, that's what comes of having speech recognition turned on

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md

mehakmeet · 2020-11-19T09:51:08Z

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md

+writing small 4KB blocks or similar.
+
+Forwarding this to a full flush across a distributed filesystem, or worse,
+a distant object store, is very underperformant


nit: typo in "underperforming"

It is actually a word, AFAIK. But clearly too pretentious. Will fix

mehakmeet · 2020-11-19T09:54:12Z

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md

+
+1. The check for existing data in a `create()` call with `overwrite=False`, may
+take place in the `create()` call itself, in the `close()` call prior to/during
+the write, or at some point in between. Expect in the special case that the


nit: typo in "Except"

joshelser

Looks great to me (barring This ;) )

steveloughran · 2020-11-24T14:27:53Z

wow, that was a big failure. lets rebase and see what can "go away"., then look @ failures in detail

This PR removes the changes related to S3A output stream lifecycle, so only covers the specification of Syncable and ensures that StreamCapabilities passes all the way through to the final implementation classes. All streams which implement Syncable hsync/hflush declare this in their stream capabilities Change-Id: I82b16a8e0965f34eb0c42504da43e8fbeabcb68c

Change-Id: Id38cf27639215abdd0d8c3578ddf72ed7adca8c5

TODO "Could this be in a section about visibility and not in the model definition? Maybe later. "here's the model, here's how that model works with creation, here's how it works when reading/writing" flows much better and visibility should be in that third part." Change-Id: I61c89475a1ea72006524803f2a7dd9e40551d718

Review with more on 404 caching. Change-Id: Ib474a84e48556c6b76121427a026fa854b5bd9e0

Doesn't actually get through as there's a BufferedOutputStream in the way... will need to do something there :) Change-Id: Ib2e4517e266168f3ad106e55ceac987ed443aeec

Change-Id: I95e6b63eda051afae22a9d48c35b5e69e0464852

* use (renamed) predicate isProbeForSyncable() everywhere appropriate * HFLUSH marked as deprecated * outputstream.md lists standard capabilities, declares HFLUSH as deprecated. * clean up tests probing for the capabilities Change-Id: I2a8c2ddc7119c31a73ee327a181bf700a2c5f21f

Change-Id: I00df6431167e04f2210d76d5712d8c16e70f1c70

steveloughran · 2021-01-04T14:26:27Z

Closing as #2587 is a rebased successor, one where the use of the new BufferedIOStatisticsOutputStream wrapper passes syncable through. With that change raw local FS does now support Syncable correctly

steveloughran mentioned this pull request Jun 26, 2020

HADOOP-13327 Output Stream Specification. #1694

Closed

steveloughran requested review from busbey and DadanielZ June 26, 2020 16:38

joshelser reviewed Jun 29, 2020

View reviewed changes

apache deleted a comment from hadoop-yetus Jun 30, 2020

steveloughran mentioned this pull request Aug 6, 2020

HADOOP-16830. IOStatistics API. #2069

Closed

apache deleted a comment from hadoop-yetus Sep 21, 2020

steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 28bc9f4 to 9f4fb5c Compare September 21, 2020 16:38

steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 99ad6e0 to e818e68 Compare September 21, 2020 16:49

apache deleted a comment from hadoop-yetus Sep 22, 2020

apache deleted a comment from hadoop-yetus Oct 1, 2020

steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from e818e68 to 3757462 Compare October 19, 2020 14:02

steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 3757462 to 528739e Compare November 18, 2020 11:36

joshelser reviewed Nov 18, 2020

View reviewed changes

mehakmeet reviewed Nov 19, 2020

View reviewed changes

steveloughran added enhancement fs fs/azure changes related to azure; submitter must declare test endpoint labels Nov 19, 2020

steveloughran added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label Nov 19, 2020

joshelser approved these changes Nov 19, 2020

View reviewed changes

This comment has been minimized.

Sign in to view

steveloughran added 8 commits November 24, 2020 14:28

HADOOP-13327 review usages of "must" in lower case

7452ea1

Change-Id: Id38cf27639215abdd0d8c3578ddf72ed7adca8c5

HADOOP-13327 output stream spec.

af42347

Review with more on 404 caching. Change-Id: Ib474a84e48556c6b76121427a026fa854b5bd9e0

HADOOP-13327 add RawLocalFileystem Syncable

0271da2

Doesn't actually get through as there's a BufferedOutputStream in the way... will need to do something there :) Change-Id: Ib2e4517e266168f3ad106e55ceac987ed443aeec

HADOOP-13327 Josh's comments

16a8b25

Change-Id: I95e6b63eda051afae22a9d48c35b5e69e0464852

HADOOP-13327 review feedback and a bit of rereading myself

e7a1212

Change-Id: I00df6431167e04f2210d76d5712d8c16e70f1c70

steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 8c139f4 to e7a1212 Compare November 24, 2020 14:29

This comment has been minimized.

Sign in to view

steveloughran mentioned this pull request Jan 4, 2021

HADOOP-13327 Output Stream Specification. #2587

Merged

steveloughran closed this Jan 4, 2021

HADOOP-13327. Specify Output Stream and Syncable #2102

HADOOP-13327. Specify Output Stream and Syncable #2102

Uh oh!

Conversation

steveloughran commented Jun 26, 2020

Uh oh!

steveloughran commented Jun 26, 2020

Uh oh!

steveloughran commented Jun 29, 2020

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steveloughran commented Sep 21, 2020

Uh oh!

steveloughran commented Sep 22, 2020

Uh oh!

steveloughran commented Sep 24, 2020

Uh oh!

steveloughran commented Oct 19, 2020

Uh oh!

hadoop-yetus commented Oct 19, 2020

Uh oh!

hadoop-yetus commented Nov 18, 2020

Uh oh!

joshelser commented Nov 18, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

joshelser commented Nov 18, 2020

Uh oh!

steveloughran commented Nov 18, 2020

Uh oh!

mehakmeet left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

joshelser left a comment

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

steveloughran commented Nov 24, 2020

Uh oh!

This comment has been minimized.

steveloughran commented Jan 4, 2021

Uh oh!

Uh oh!