Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-2026. Overlapping chunk region cannot be read concurrently #1349

Closed
wants to merge 1 commit into from

Conversation

adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

Only allow a single read/write operation for the same path in ChunkUtils, to avoid OverlappingFileLockException due to concurrent reads. This allows concurrent reads/writes of separate files (as opposed to simply synchronizing the methods). It might be improved later by storing and reusing the file lock.

Use plain FileChannel instead of AsynchronousFileChannel for reading, too, since it was used in synchronous fashion (by calling .get()) anyway.

https://issues.apache.org/jira/browse/HDDS-2026

How was this patch tested?

Added unit test.

Used improved Freon tool from #1341 to perform read of same key from multiple threads (which revealed the bug in the first place).

$ ozone freon ockg -n 1 -p asdf
$ ozone sh key list vol1/bucket1
[ {
  "version" : 0,
  "size" : 10240,
  "keyName" : "asdf/0"
  ...
} ]

$ ozone freon ocokr -k 'asdf/0'
...
         mean rate = 164.39 calls/second
                  ...
              mean = 53.75 milliseconds
            stddev = 42.11 milliseconds
            median = 44.05 milliseconds
...
Total execution time (sec): 6
Failures: 0
Successful executions: 1000

@adoroszlai
Copy link
Contributor Author

/label ozone

@elek elek added the ozone label Aug 26, 2019
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 48 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 mvninstall 1160 trunk passed
+1 compile 441 trunk passed
+1 checkstyle 88 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 1183 branch has no errors when building and testing our client artifacts.
+1 javadoc 205 trunk passed
0 spotbugs 528 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 774 trunk passed
_ Patch Compile Tests _
+1 mvninstall 673 the patch passed
+1 compile 454 the patch passed
+1 javac 454 the patch passed
+1 checkstyle 86 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 785 patch has no errors when building and testing our client artifacts.
+1 javadoc 185 the patch passed
+1 findbugs 738 the patch passed
_ Other Tests _
+1 unit 327 hadoop-hdds in the patch passed.
-1 unit 1785 hadoop-ozone in the patch failed.
+1 asflicense 56 The patch does not generate ASF License warnings.
9172
Reason Tests
Failed junit tests hadoop.ozone.container.server.TestSecureContainerServer
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/1/artifact/out/Dockerfile
GITHUB PR #1349
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux cbe1e1f743ac 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 23e532d
Default Java 1.8.0_222
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/1/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/1/testReport/
Max. process+thread count 5398 (vs. ulimit of 5500)
modules C: hadoop-hdds/container-service U: hadoop-hdds/container-service
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/1/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@adoroszlai
Copy link
Contributor Author

/retest

@adoroszlai
Copy link
Contributor Author

@anuengineer @bshashikant @elek please review

@anuengineer
Copy link
Contributor

LGTM, I will test it to make sure it works as expected. I will also wait for others to comment and then commit.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 77 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 mvninstall 641 trunk passed
+1 compile 378 trunk passed
+1 checkstyle 72 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 934 branch has no errors when building and testing our client artifacts.
+1 javadoc 182 trunk passed
0 spotbugs 477 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 711 trunk passed
_ Patch Compile Tests _
+1 mvninstall 602 the patch passed
+1 compile 407 the patch passed
+1 javac 407 the patch passed
+1 checkstyle 74 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 730 patch has no errors when building and testing our client artifacts.
+1 javadoc 199 the patch passed
+1 findbugs 755 the patch passed
_ Other Tests _
+1 unit 355 hadoop-hdds in the patch passed.
-1 unit 2792 hadoop-ozone in the patch failed.
+1 asflicense 52 The patch does not generate ASF License warnings.
9119
Reason Tests
Failed junit tests hadoop.ozone.container.server.TestSecureContainerServer
hadoop.ozone.client.rpc.TestWatchForCommit
hadoop.ozone.client.rpc.Test2WayCommitInRatis
hadoop.ozone.client.rpc.TestOzoneRpcClientForAclAuditLog
Subsystem Report/Notes
Docker Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/2/artifact/out/Dockerfile
GITHUB PR #1349
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 9d517b259197 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 4379370
Default Java 1.8.0_222
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/2/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/2/testReport/
Max. process+thread count 5234 (vs. ulimit of 5500)
modules C: hadoop-hdds/container-service U: hadoop-hdds/container-service
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1349/2/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@bshashikant
Copy link
Contributor

bshashikant commented Aug 27, 2019

Thanks @adoroszlai for working on this. The locking logic seems good to me.
But in Ozone world, a chunk once written is immutable , and hence we might not need the lock at all
while reading a chunk.

@anuengineer @elek @mukul1987 what do you think?

There is also a race condition which exists in the system where a chunk might get deleted by BlockDeleting service in Datanode while readChunk is happening which is where i think we need to have synchronization on a container level (not precisely at the chunk level) for closed containers but this problem is beyond the scope of this jira.

@anuengineer
Copy link
Contributor

bq. But in Ozone world, a chunk once written is immutable , and hence we might not need the lock at all while reading a chunk.
In the cblock work, we added a flag that can be used to overwrite the block. so this current patch is safer.

@anuengineer
Copy link
Contributor

@adoroszlai I have committed this to the trunk. Thanks for the contribution. @bshashikant Thanks for the comments.

@adoroszlai adoroszlai deleted the HDDS-2026 branch August 28, 2019 04:07
@adoroszlai
Copy link
Contributor Author

Thanks @anuengineer and @bshashikant for reviews, and @anuengineer for committing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants