
HADOOP-13327. Specify Output Stream and Syncable #2102


Conversation

steveloughran (Contributor)

Successor to #1694.

  • adds a spec doc for what we expect from an output stream
  • and especially Syncable hsync()/hflush(), which are critical for HBase and other applications that write WALs
  • adds checks for StreamCapabilities; the filesystems which implement the API now all declare that fact

@steveloughran (Contributor Author)

Hi @busbey @DadanielZ, this is the rebased version of #1694. Could you two take a look at it and see if we can get it in? Not just because it says what we expect from hsync/hflush, but because we add the ability to probe a stream for the capability before actually attempting the call. All output streams in our codebase which implement Syncable do the right thing here.
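For anyone following along, a minimal sketch of the probe-before-call pattern this PR enables (the `SyncProbe` helper and its method name are hypothetical; `hasCapability()` and `StreamCapabilities.HSYNC` are the real APIs under discussion):

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.StreamCapabilities;

/** Hypothetical helper showing the probe-before-sync pattern. */
public final class SyncProbe {

  private SyncProbe() {
  }

  /** Persist if the stream declares hsync support; else best-effort flush. */
  public static void syncIfSupported(FSDataOutputStream out)
      throws IOException {
    if (out.hasCapability(StreamCapabilities.HSYNC)) {
      out.hsync();   // on return, data is durably persisted
    } else {
      out.flush();   // best effort only; no durability guarantee
    }
  }
}
```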

@steveloughran (Contributor Author)

The rawlocal FS failure is because even though localfs.xml says hsync is supported, the capability is not being passed all the way down. I need a subclass of java.io.BufferedOutputStream to pass it through.

The good news: that comes in #2069. I'm going to change the XML test declarations until that is in.

This MAY be a bug, as it allows >1 client to create a file with `overwrite==false`,
and potentially confuse file/directory logic. This is a significant difference
between the behavior of object stores and that of filesystems.
Member:

nit: You used overwrite=false below. I'm assuming the double-equals was just copy-paste, but maybe correct to stay consistent?

Contributor Author:

Changed to `=`; also replaced `>1` to keep everything happy.

model permits the output stream to have `close()` invoked while awaiting an
acknowledgement from datanode or namenode writes in an `hsync()` operation.

### <a name="consistencyy"></a> Consistency and Visibility
Member:

"consistency"

@steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 28bc9f4 to 9f4fb5c on September 21, 2020 at 16:38
@steveloughran (Contributor Author)

@joshelser, rebased and addressed your minor points.

I think you are right: we don't need separate HSYNC/HFLUSH options; it's just that they've been there (separately) for a while.

What to do?

  • maintain as is
  • add SYNCABLE which says "both"
  • Declare that HSYNC is sufficient and the only one you should look for. If you implement HFLUSH and not HSYNC, you are of no use to applications. Retain support for the HFLUSH probe, but in the spec mention it only as of historical interest.

I like option 3, now that I think about it.

@steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 99ad6e0 to e818e68 on September 21, 2020 at 16:49
@steveloughran (Contributor Author)

TestRawlocalContractCreate is failing until we do our own BufferedOutputStream with Syncable passthrough in HADOOP-16830 / #2323.
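A sketch of the kind of passthrough wrapper needed (class name is made up; the real fix arrives later as BufferedIOStatisticsOutputStream): a plain java.io.BufferedOutputStream hides the inner stream's Syncable support, so a wrapper must drain its own buffer and then forward the sync calls and capability probes:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.fs.StreamCapabilities;
import org.apache.hadoop.fs.Syncable;

/** Hypothetical buffered wrapper that forwards Syncable semantics. */
class SyncablePassthroughBufferedStream extends BufferedOutputStream
    implements Syncable, StreamCapabilities {

  SyncablePassthroughBufferedStream(OutputStream inner) {
    super(inner);
  }

  @Override
  public void hflush() throws IOException {
    flush();                       // drain this stream's buffer first
    if (out instanceof Syncable) {
      ((Syncable) out).hflush();   // then flush the inner stream
    }
  }

  @Override
  public void hsync() throws IOException {
    flush();
    if (out instanceof Syncable) {
      ((Syncable) out).hsync();
    }
  }

  @Override
  public boolean hasCapability(String capability) {
    // Only claim a capability the wrapped stream actually has.
    return out instanceof StreamCapabilities
        && ((StreamCapabilities) out).hasCapability(capability);
  }
}
```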

The HDFS checksum failures are about checksums not being found, which implies they weren't being written. That goes near output streams, doesn't it? So there is a risk this is a genuine regression and not just "HDFS tests being flaky":

org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1893408133-172.17.0.3-1600724641689:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:33131,DS-5c8811e5-39af-4967-b3a7-3b76f95b0317,DISK], DatanodeInfoWithStorage[127.0.0.1:38603,DS-82754ee5-286a-402b-9861-b3ebbc149849,DISK], DatanodeInfoWithStorage[127.0.0.1:44379,DS-080d6ce9-cb6a-4f80-b37a-e63ecc31d9bc,DISK], DatanodeInfoWithStorage[127.0.0.1:34723,DS-c6c2aa4e-639b-4d85-9564-05631c8c5b79,DISK], DatanodeInfoWithStorage[127.0.0.1:40061,DS-54eb3c16-b9ce-4a5d-a7a3-f33b635579b0,DISK], DatanodeInfoWithStorage[127.0.0.1:45525,DS-3c9ac850-9273-4d2d-933c-2ac3b4b30308,DISK], DatanodeInfoWithStorage[127.0.0.1:45537,DS-9b1d805c-3b0e-4c84-ad0e-f454817b6829,DISK], DatanodeInfoWithStorage[127.0.0.1:44697,DS-e06897c8-c8a5-41d6-b5bc-57c7339bbf9b,DISK]]; indices=[1, 2, 3, 4, 5, 6, 7, 8]}
	at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
	at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
	at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
	at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
	at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1891)
	at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1888)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1905)
	at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
	at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
	at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery1(TestFileChecksum.java:312)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

@steveloughran (Contributor Author)

Thinking:

  • a single capability, syncable, which everything maps to
  • skip the local FS test until we get the IOStats patch in, then re-enable

@steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from e818e68 to 3757462 on October 19, 2020 at 14:02
@steveloughran (Contributor Author)

HSYNC is the only option needed; tag HFLUSH as deprecated and say in the docs that it's not useful; the outputstream.md doc lists the standard options.
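Roughly what that looks like in StreamCapabilities (a sketch: the constant values match the interface, the javadoc wording is illustrative):

```java
public interface StreamCapabilities {

  /** Stream hsync capability: probe for this before calling hsync(). */
  String HSYNC = "hsync";

  /**
   * Stream hflush capability. Deprecated: any stream supporting hsync
   * also supports hflush, so probing for HSYNC is sufficient.
   */
  @Deprecated
  String HFLUSH = "hflush";

  boolean hasCapability(String capability);
}
```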

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 37m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 10 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 25m 39s Maven dependency ordering for branch
+1 💚 mvninstall 42m 8s trunk passed
+1 💚 compile 21m 33s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 17m 58s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 2m 49s trunk passed
+1 💚 mvnsite 5m 7s trunk passed
+1 💚 shadedclient 24m 45s branch has no errors when building and testing our client artifacts.
-1 ❌ javadoc 0m 47s /branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt hadoop-hdfs-client in trunk failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
+1 💚 javadoc 4m 45s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 0m 44s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 9m 53s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 14s /patch-mvninstall-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ compile 19m 56s /patch-compile-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt root in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌ javac 19m 56s /patch-compile-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt root in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌ compile 17m 21s /patch-compile-root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt root in the patch failed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.
-1 ❌ javac 17m 21s /patch-compile-root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt root in the patch failed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.
-0 ⚠️ checkstyle 2m 45s /buildtool-patch-checkstyle-root.txt The patch fails to run checkstyle in root
-1 ❌ mvnsite 0m 28s /patch-mvnsite-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ whitespace 0m 0s /whitespace-eol.txt The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 5s The patch has no ill-formed XML file.
+1 💚 shadedclient 16m 50s patch has no errors when building and testing our client artifacts.
-1 ❌ javadoc 0m 45s /patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt hadoop-hdfs-client in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.
-1 ❌ findbugs 0m 28s /patch-findbugs-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
_ Other Tests _
-1 ❌ unit 10m 24s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
+1 💚 unit 2m 24s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 112m 10s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
-1 ❌ unit 0m 33s /patch-unit-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
+1 💚 unit 1m 8s hadoop-azure-datalake in the patch passed.
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
403m 46s
Reason Tests
Failed junit tests hadoop.fs.contract.rawlocal.TestRawlocalContractCreate
hadoop.hdfs.TestFileChecksum
hadoop.hdfs.TestRollingUpgrade
hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks
hadoop.hdfs.TestFileChecksumCompositeCrc
hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/5/artifact/out/Dockerfile
GITHUB PR #2102
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux a8df3b1adc15 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d60d5fe
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/5/testReport/
Max. process+thread count 2802 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-azure hadoop-tools/hadoop-azure-datalake U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/5/console
versions git=2.17.1 maven=3.6.0 findbugs=4.1.3
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 3757462 to 528739e on November 18, 2020 at 11:36
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 33m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 10 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 42s Maven dependency ordering for branch
-1 ❌ mvninstall 28m 3s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 24m 47s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 compile 18m 6s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 checkstyle 2m 50s trunk passed
+1 💚 mvnsite 5m 7s trunk passed
+1 💚 shadedclient 25m 37s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 3m 48s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 4m 42s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+0 🆗 spotbugs 0m 44s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 9m 55s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 22s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 13s /patch-mvninstall-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ compile 19m 47s /patch-compile-root-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt root in the patch failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.
-1 ❌ javac 19m 47s /patch-compile-root-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt root in the patch failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.
-1 ❌ compile 17m 16s /patch-compile-root-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt root in the patch failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.
-1 ❌ javac 17m 16s /patch-compile-root-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt root in the patch failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.
-0 ⚠️ checkstyle 2m 45s /buildtool-patch-checkstyle-root.txt The patch fails to run checkstyle in root
-1 ❌ mvnsite 0m 28s /patch-mvnsite-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ whitespace 0m 0s /whitespace-eol.txt The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 5s The patch has no ill-formed XML file.
+1 💚 shadedclient 17m 12s patch has no errors when building and testing our client artifacts.
-1 ❌ javadoc 0m 55s /diff-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 generated 1 new + 98 unchanged - 1 fixed = 99 total (was 99)
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.
-1 ❌ findbugs 0m 28s /patch-findbugs-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
_ Other Tests _
-1 ❌ unit 10m 20s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
+1 💚 unit 2m 24s hadoop-hdfs-client in the patch passed.
+1 💚 unit 113m 3s hadoop-hdfs in the patch passed.
-1 ❌ unit 0m 35s /patch-unit-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
+1 💚 unit 1m 4s hadoop-azure-datalake in the patch passed.
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
380m 38s
Reason Tests
Failed junit tests hadoop.fs.contract.rawlocal.TestRawlocalContractCreate
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/6/artifact/out/Dockerfile
GITHUB PR #2102
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux 00159079ab5f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 425996e
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/6/testReport/
Max. process+thread count 3187 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-azure hadoop-tools/hadoop-azure-datalake U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2102/6/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@joshelser (Member)

Yikes. These got mis-filed in my mail. I'm sorry, Steve.

> Declare that HSYNC is sufficient and the only one you should look for. If you implement HFLUSH and not HSYNC, you are of no use to applications. Retain support for the HFLUSH probe, but in the spec mention it only as of historical interest.

My only worry with this is that, today, HBase only runs with hflush and not hsync, for performance reasons. The last WAG I remember was in the ballpark of a 30% performance hit for writes if we force hsync instead of hflush. Let me re-read your latest and see if my mind changes.

it MAY be durable. That is: it does not have to be persisted, merely guaranteed
to be consistently visible to all clients attempting to open a new stream reading
data at the path.
1. `Syncable.hsync()` MUST flush the data and persist data to the underlying durable
Member:

Nit: clarify that "flush" in this sense refers to "hflush" and not "OutputStream.flush".

You clarify in detail earlier that hsync must do all the hflush also does. Just avoiding any opportunity for a reader to have missed that section and skipped to here.

Contributor Author:

Done. Also added a bit about hflush being allowed to forward to hsync, but not the other way around.
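A sketch of that forwarding rule (the class is hypothetical): hflush() may delegate to the stronger hsync(), because durability implies the visibility guarantee, but forwarding hsync() to hflush() would silently drop durability:

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.fs.Syncable;

/** Hypothetical stream where hflush() forwards to the stronger hsync(). */
abstract class AlwaysDurableStream extends OutputStream implements Syncable {

  /** Subclasses persist all buffered data to durable storage here. */
  @Override
  public abstract void hsync() throws IOException;

  @Override
  public final void hflush() throws IOException {
    hsync();  // legal: hsync() satisfies everything hflush() promises
  }
}
```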

Member:

Thanks!

@joshelser (Member)

My only thought at this point is: are we creating a dichotomy by having HFLUSH be deprecated in StreamCapabilities but not deprecated in Syncable?

The argument is that we would never have an implementor of hflush which doesn't also implement hsync. However, every implementor of hsync should have an implementation for hflush (even if that implementation is to just invoke hsync).

Now that I've come to this, I think outputstream.md adequately does this. Do you think it's appropriate to have a reciprocal javadoc reference from Syncable back to StreamCapabilities? I was thinking, if I was writing against this API, I might not know to go looking for StreamCapabilities if I just start digging through the Syncable API. I would defer to your judgement here (if you think it's unnecessary, I am not concerned).

This was a fantastic and thorough read. Thanks for taking the time to do it, Steve!

@steveloughran (Contributor Author)

Yeah, let's cross-reference the javadocs: "if you implement this you SHOULD declare that you support the capability".
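Something along these lines on Syncable (the wording is illustrative, not the committed javadoc):

```java
import java.io.IOException;

/**
 * Interface for output streams that can flush and/or sync their data.
 * <p>
 * Implementations SHOULD also implement
 * {@link org.apache.hadoop.fs.StreamCapabilities} and declare the
 * {@code hsync} capability, so that callers can probe for support
 * before invoking {@link #hsync()}.
 */
public interface Syncable {

  /** Flush buffered data so it is visible to new readers. */
  void hflush() throws IOException;

  /** Flush buffered data and persist it to durable storage. */
  void hsync() throws IOException;
}
```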

@mehakmeet (Contributor) left a comment:

LGTM. Just some nits in the doc and a small issue in imports.

@@ -32,7 +32,7 @@
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

This
Contributor:

"This" should be removed. This might be the reason why Yetus is failing.

Contributor Author:

Heh, that's what comes of having speech recognition turned on.

writing small 4KB blocks or similar.

Forwarding this to a full flush across a distributed filesystem, or worse,
a distant object store, is very underperformant
Contributor:

nit: typo in "underperformant"; "underperforming"?

Contributor Author:

It is actually a word, AFAIK. But clearly too pretentious. Will fix


1. The check for existing data in a `create()` call with `overwrite=False`, may
take place in the `create()` call itself, in the `close()` call prior to/during
the write, or at some point in between. Expect in the special case that the
Contributor:

nit: typo in "Except"

@steveloughran added the enhancement, fs, and fs/azure (changes related to azure; submitter must declare test endpoint) labels on Nov 19, 2020
@steveloughran added the fs/s3 (changes related to hadoop-aws; submitter must declare test endpoint) label on Nov 19, 2020
@joshelser (Member) left a comment:

Looks great to me (barring This ;) )

@hadoop-yetus (comment minimized)

@steveloughran (Contributor Author)

Wow, that was a big failure. Let's rebase and see what can "go away", then look at the failures in detail.

This PR removes the changes related to S3A output stream lifecycle,
so only covers the specification of Syncable and ensures that StreamCapabilities
passes all the way through to the final implementation classes.

All streams which implement Syncable hsync/hflush declare this in their stream capabilities.

Change-Id: I82b16a8e0965f34eb0c42504da43e8fbeabcb68c
Change-Id: Id38cf27639215abdd0d8c3578ddf72ed7adca8c5
TODO "Could this be in a section about visibility and not in the model definition? Maybe later. "here's the model, here's how that model works with creation, here's how it works when reading/writing" flows much better and visibility should be in that third part."

Change-Id: I61c89475a1ea72006524803f2a7dd9e40551d718
Review with more on 404 caching.

Change-Id: Ib474a84e48556c6b76121427a026fa854b5bd9e0
Doesn't actually get through as there's a BufferedOutputStream
in the way... will need to do something there :)

Change-Id: Ib2e4517e266168f3ad106e55ceac987ed443aeec
Change-Id: I95e6b63eda051afae22a9d48c35b5e69e0464852
* use (renamed) predicate isProbeForSyncable() everywhere appropriate
* HFLUSH marked as deprecated
* outputstream.md lists standard capabilities, declares HFLUSH as deprecated.
* clean up tests probing for the capabilities

Change-Id: I2a8c2ddc7119c31a73ee327a181bf700a2c5f21f
Change-Id: I00df6431167e04f2210d76d5712d8c16e70f1c70
@steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec-lean branch from 8c139f4 to e7a1212 on November 24, 2020 at 14:29
@hadoop-yetus (comment minimized)

@steveloughran (Contributor Author)

Closing, as #2587 is a rebased successor in which the new BufferedIOStatisticsOutputStream wrapper passes Syncable through. With that change the raw local FS now supports Syncable correctly.
