HBASE-26323 introduce a SnapshotProcedure #3716

Closed
wants to merge 5 commits into from

frostruan
Contributor

No description provided.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 prototool 0m 0s prototool was not available.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 57s master passed
+1 💚 compile 5m 34s master passed
+1 💚 checkstyle 1m 50s master passed
+1 💚 spotbugs 7m 3s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 39s the patch passed
+1 💚 compile 5m 35s the patch passed
+1 💚 cc 5m 35s the patch passed
+1 💚 javac 5m 35s the patch passed
-0 ⚠️ checkstyle 0m 31s hbase-client: The patch generated 1 new + 186 unchanged - 0 fixed = 187 total (was 186)
-0 ⚠️ checkstyle 1m 7s hbase-server: The patch generated 23 new + 157 unchanged - 1 fixed = 180 total (was 158)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 18m 32s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 hbaseprotoc 2m 4s the patch passed
-1 ❌ spotbugs 2m 20s hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
68m 3s
Reason Tests
FindBugs module:hbase-server
Inconsistent synchronization of org.apache.hadoop.hbase.master.procedure.ServerRemoteProcedure.dispatched; locked 60% of time. Unsynchronized access at SnapshotVerifyProcedure.java:[line 97]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3716
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile cc hbaseprotoc prototool
uname Linux 4321f64144be 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f65b769
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-client.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
spotbugs https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-general-check/output/new-spotbugs-hbase-server.html
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 27s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 3m 55s master passed
+1 💚 compile 2m 15s master passed
+1 💚 shadedjars 8m 19s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 15s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 3m 45s the patch passed
+1 💚 compile 2m 13s the patch passed
+1 💚 javac 2m 13s the patch passed
+1 💚 shadedjars 8m 13s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 13s the patch passed
_ Other Tests _
+1 💚 unit 0m 47s hbase-protocol-shaded in the patch passed.
+1 💚 unit 1m 18s hbase-client in the patch passed.
-1 ❌ unit 149m 34s hbase-server in the patch failed.
186m 38s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3716
Optional Tests javac javadoc unit shadedjars compile
uname Linux 295faa7231f2 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f65b769
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/testReport/
Max. process+thread count 4712 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 58s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for branch
+1 💚 mvninstall 5m 2s master passed
+1 💚 compile 2m 52s master passed
+1 💚 shadedjars 9m 13s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 21s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 4m 51s the patch passed
+1 💚 compile 2m 53s the patch passed
+1 💚 javac 2m 53s the patch passed
+1 💚 shadedjars 9m 11s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 19s the patch passed
_ Other Tests _
+1 💚 unit 1m 3s hbase-protocol-shaded in the patch passed.
+1 💚 unit 1m 43s hbase-client in the patch passed.
-1 ❌ unit 206m 44s hbase-server in the patch failed.
250m 0s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3716
Optional Tests javac javadoc unit shadedjars compile
uname Linux 41f7283ebf47 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f65b769
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/testReport/
Max. process+thread count 2919 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@Apache9 left a comment

This is a good start.

Overall the approach is good. I think you must have learned a lot about proc-v2 and am-v2, @frostruan. This is not easy work. I really appreciate your effort.

I left some comments; let's work together to get this done. There are some high-level questions:

  1. How do we prevent a concurrent split/merge when the SnapshotProcedure only takes a shared lock on the table?
  2. Why do we let the client specify whether to use zk coordination?
  3. Is the RemoteProcedureRequest not enough for the snapshot request? It is designed for general usage.

@@ -277,6 +284,7 @@ message ExecuteProceduresRequest {
repeated OpenRegionRequest open_region = 1;
repeated CloseRegionRequest close_region = 2;
repeated RemoteProcedureRequest proc = 3;
repeated SnapshotRegionRequest snapshot_region = 4;
Contributor

It can not be implemented as a RemoteProcedure?

Contributor Author

OK. Do you think we could introduce a snapshotTable method in the region server for the master to snapshot online regions?

Contributor

You can look at the current implementation of some remote procedures. They all use the same executeProcedures method to send requests from the master to the region server. The RemoteProcedureRequest has a class name field, which is used to create the target class via reflection. You can implement the snapshot logic in the created class.
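
The reflection-based dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the HBase code: `RemoteTask`, `SnapshotRegionTask`, and `execute` are hypothetical names, and the request is reduced to a class name plus an opaque byte payload.

```java
import java.nio.charset.StandardCharsets;

public class ReflectiveDispatchSketch {

  /** Hypothetical stand-in for the region-server-side callable; not the HBase interface. */
  public interface RemoteTask {
    void init(byte[] parameter);

    String call();
  }

  /** Hypothetical task; a real one would carry out the region snapshot. */
  public static class SnapshotRegionTask implements RemoteTask {
    private String regionName;

    @Override
    public void init(byte[] parameter) {
      // The payload is assumed to be a UTF-8 region name for this sketch.
      regionName = new String(parameter, StandardCharsets.UTF_8);
    }

    @Override
    public String call() {
      return "snapshot:" + regionName;
    }
  }

  /** Instantiates the class named in the request via its no-arg constructor and runs it. */
  public static String execute(String className, byte[] parameter) {
    try {
      RemoteTask task = Class.forName(className)
          .asSubclass(RemoteTask.class)
          .getDeclaredConstructor()
          .newInstance();
      task.init(parameter);
      return task.call();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot create task " + className, e);
    }
  }
}
```

With this shape, adding a new kind of remote work only requires a new class on the receiving side; the request message itself stays unchanged.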

Contributor Author

Thanks for your reply, @Apache9.

I think we have three options to implement the snapshot logic here.
Option a: use a RegionRemoteProcedure to implement region-related operations, like open, close, snapshot and flush. That was my choice in the first commit.
Option b: use a ServerRemoteProcedure to snapshot regions (or just a single region) on a remote server, like log splitting or claiming a replication queue. That is my choice for snapshot verification. I think this is what you recommend.
Option c: introduce a new method snapshotRegion in Admin.proto for RegionServerAdmin, and give it admin priority just like flush region and log roll. For the master, we just throw an unsupported exception.

Sorry, I misunderstood what you meant by "It can not be implemented as a RemoteProcedure?". I thought you meant "we shouldn't use a remote procedure to implement this", so I changed from option a to option c.

Of course we can implement snapshot with ServerRemoteProcedure. Actually, I think it makes snapshotting regions easier because we don't need to modify RSProcedureDispatcher to introduce a new kind of region operation. I will try to fix this as soon as possible. Thank you.

// but we may need to downgrade it to shared lock for some reasons:
// a. exclusive lock has a negative effect on assigning region. See HBASE-21480 for details.
// b. we want to support taking multiple different snapshots on same table on the same time.
if (env.getProcedureScheduler().waitTableSharedLock(this, getTableName())) {
Contributor

Is this enough? The SplitTableRegionProcedure and MergeTableRegionsProcedure also take a shared lock on the table, so taking a shared lock cannot prevent a concurrent split/merge.
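
As an analogy only, the point about shared locks can be reproduced with `java.util.concurrent.locks.ReentrantReadWriteLock`. This is not the HBase procedure scheduler; the class and method names below are made up for the sketch. One thread holds the shared (read) lock, and a second thread then tries both lock modes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedLockSketch {

  /**
   * Holds the shared (read) lock on the calling thread, then reports whether a
   * second thread can take the shared lock and whether it can take the
   * exclusive (write) lock. Returns {sharedAcquired, exclusiveAcquired}.
   */
  public static boolean[] probeWhileSharedLockHeld() {
    ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    ExecutorService other = Executors.newSingleThreadExecutor();
    tableLock.readLock().lock(); // the "snapshot" side takes the shared lock
    try {
      return other.submit(() -> {
        // The "split/merge" side: a second shared holder is admitted.
        boolean shared = tableLock.readLock().tryLock();
        if (shared) {
          tableLock.readLock().unlock();
        }
        // An exclusive request is refused while another thread holds the shared lock.
        boolean exclusive = tableLock.writeLock().tryLock();
        if (exclusive) {
          tableLock.writeLock().unlock();
        }
        return new boolean[] { shared, exclusive };
      }).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      tableLock.readLock().unlock();
      other.shutdown();
    }
  }
}
```

The second shared request succeeds and the exclusive one does not, which is why two procedures that both take only the shared table lock need some additional coordination.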

Contributor Author

Thanks for reviewing, @Apache9. Yes, a shared table lock is not enough, so there is some extra work in SnapshotManager. Before a SplitTableRegionProcedure/MergeTableRegionsProcedure runs, it checks whether the table is in snapshot. If the table is in snapshot, the procedure stops. In SnapshotManager, we have a map whose key is the SnapshotDescription and whose value is the SnapshotProcedure id. The map is rebuilt when the master restarts. Would you mind taking a look at those?
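
A rough sketch of the bookkeeping described in this reply, under stated assumptions: `SnapshotKey` is a hypothetical stand-in for SnapshotDescription, and the method names are invented for illustration. It covers only the direction described here (split/merge checking for a running snapshot).

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotRegistrySketch {

  /** Hypothetical key standing in for SnapshotDescription. */
  public static final class SnapshotKey {
    final String snapshotName;
    final String tableName;

    SnapshotKey(String snapshotName, String tableName) {
      this.snapshotName = snapshotName;
      this.tableName = tableName;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SnapshotKey)) {
        return false;
      }
      SnapshotKey other = (SnapshotKey) o;
      return snapshotName.equals(other.snapshotName) && tableName.equals(other.tableName);
    }

    @Override
    public int hashCode() {
      return Objects.hash(snapshotName, tableName);
    }
  }

  // Snapshot description -> id of the procedure that is taking it.
  private final Map<SnapshotKey, Long> snapshotToProcId = new ConcurrentHashMap<>();

  public void register(String snapshotName, String tableName, long procId) {
    snapshotToProcId.put(new SnapshotKey(snapshotName, tableName), procId);
  }

  public void unregister(String snapshotName, String tableName) {
    snapshotToProcId.remove(new SnapshotKey(snapshotName, tableName));
  }

  /** True if any registered snapshot targets the given table. */
  public boolean isTableInSnapshot(String tableName) {
    return snapshotToProcId.keySet().stream().anyMatch(k -> k.tableName.equals(tableName));
  }

  /** The check a split/merge would run before starting, in this sketch. */
  public boolean canStartSplitOrMerge(String tableName) {
    return !isTableInSnapshot(tableName);
  }
}
```

Because the map lives in memory, it has to be repopulated from the persisted procedures after a master restart, as the reply notes.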

Contributor

But what if a merge or a split is already ongoing when we want to start the snapshot here?

throw new UnsupportedOperationException("unhandled state=" + state);
}
} catch (Exception e) {
if (e instanceof CorruptedSnapshotException) {
Contributor

Better to add this catch to the specific state? We will jump to the beginning of the procedure; I do not think we can do this in later states.

Contributor Author

Yes, a new specific state would be better. I will try to fix it. Thanks for your advice, @Apache9.

Contributor Author

When the snapshot is corrupted, I just mark the state of the procedure as FAILED and roll back the procedure. Do you think we could make the procedure retryable, like SplitWALProcedure or SyncReplicationReplayWALProcedure? @Apache9

new MasterSnapshotVerifier(env.getMasterServices(), snapshot, workingDirFS);

if (numRegions >= verifyThreshold) {
verifier.verifySnapshot(false);
Contributor

We do not need to create the verifier for this case?

Contributor Author

Sorry, the code here is a little confusing. Let me try to explain it.

There are four parts to snapshot verification:
a. verify the snapshot descriptor.
b. verify the table descriptor.
c. verify the region count and region info.
d. verify the store files.

For a large table snapshot, most of the time is spent verifying region info and store files. The SnapshotVerifyProcedure was designed to verify only region info and store files, so here I create a master verifier to verify the snapshot descriptor, table descriptor and region count.

Sure, we can let SnapshotVerifyProcedure do this work as well. I will fix it. Thanks for your advice, @Apache9.
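
The division of work described in this reply can be sketched as a small planning function. This is illustrative only: the `Check` enum and both methods are hypothetical, and the behaviour below the threshold (the master running every check itself) is an assumption, since the quoted diff shows only the `numRegions >= verifyThreshold` branch.

```java
import java.util.EnumSet;

public class SnapshotVerifyPlanSketch {

  /** The verification parts listed above, with region count and region info split apart. */
  public enum Check {
    SNAPSHOT_DESCRIPTOR, TABLE_DESCRIPTOR, REGION_COUNT, REGION_INFO, STORE_FILES
  }

  /** Checks run by the master-side verifier in this sketch. */
  public static EnumSet<Check> masterChecks(int numRegions, int verifyThreshold) {
    if (numRegions >= verifyThreshold) {
      // Large table: the master keeps only the cheap checks.
      return EnumSet.of(Check.SNAPSHOT_DESCRIPTOR, Check.TABLE_DESCRIPTOR, Check.REGION_COUNT);
    }
    // Assumption: for a small table the master verifies everything itself.
    return EnumSet.allOf(Check.class);
  }

  /** Checks left for the distributed verify procedures: whatever the master does not cover. */
  public static EnumSet<Check> remoteChecks(int numRegions, int verifyThreshold) {
    return EnumSet.complementOf(masterChecks(numRegions, verifyThreshold));
  }
}
```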

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 prototool 0m 1s prototool was not available.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 4m 23s master passed
+1 💚 compile 7m 9s master passed
+1 💚 checkstyle 2m 47s master passed
+1 💚 spotbugs 8m 52s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 4m 5s the patch passed
+1 💚 compile 7m 14s the patch passed
+1 💚 cc 7m 14s the patch passed
+1 💚 javac 7m 14s the patch passed
-0 ⚠️ checkstyle 1m 12s hbase-server: The patch generated 8 new + 160 unchanged - 1 fixed = 168 total (was 161)
-0 ⚠️ rubocop 0m 13s The patch generated 10 new + 394 unchanged - 0 fixed = 404 total (was 394)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 32s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 hbaseprotoc 2m 59s the patch passed
-1 ❌ spotbugs 2m 28s hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
83m 30s
Reason Tests
FindBugs module:hbase-server
Inconsistent synchronization of org.apache.hadoop.hbase.master.procedure.ServerRemoteProcedure.dispatched; locked 60% of time. Unsynchronized access at SnapshotVerifyProcedure.java:[line 103]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3716
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile cc hbaseprotoc prototool rubocop
uname Linux b7fb50e599fa 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / b94188f
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
rubocop https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-general-check/output/diff-patch-rubocop.txt
spotbugs https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-general-check/output/new-spotbugs-hbase-server.html
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2 rubocop=0.80.0
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 27s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for branch
+1 💚 mvninstall 3m 36s master passed
+1 💚 compile 3m 9s master passed
+1 💚 shadedjars 8m 22s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 1s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 3m 39s the patch passed
+1 💚 compile 3m 8s the patch passed
+1 💚 javac 3m 8s the patch passed
+1 💚 shadedjars 8m 14s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 1s the patch passed
_ Other Tests _
+1 💚 unit 0m 46s hbase-protocol-shaded in the patch passed.
+1 💚 unit 1m 19s hbase-client in the patch passed.
-1 ❌ unit 149m 39s hbase-server in the patch failed.
+1 💚 unit 6m 39s hbase-thrift in the patch passed.
+1 💚 unit 7m 20s hbase-shell in the patch passed.
204m 15s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3716
Optional Tests javac javadoc unit shadedjars compile
uname Linux f3e469a50f0d 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / b94188f
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/testReport/
Max. process+thread count 5055 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 17s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for branch
+1 💚 mvninstall 5m 21s master passed
+1 💚 compile 4m 16s master passed
+1 💚 shadedjars 9m 11s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 34s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 5m 1s the patch passed
+1 💚 compile 4m 14s the patch passed
+1 💚 javac 4m 14s the patch passed
+1 💚 shadedjars 9m 12s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 34s the patch passed
_ Other Tests _
+1 💚 unit 1m 3s hbase-protocol-shaded in the patch passed.
+1 💚 unit 1m 41s hbase-client in the patch passed.
-1 ❌ unit 205m 37s hbase-server in the patch failed.
+1 💚 unit 8m 27s hbase-thrift in the patch passed.
+1 💚 unit 7m 2s hbase-shell in the patch passed.
270m 57s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3716
Optional Tests javac javadoc unit shadedjars compile
uname Linux b6b1bf107d68 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / b94188f
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/testReport/
Max. process+thread count 3538 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 3s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 prototool 0m 0s prototool was not available.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for branch
+1 💚 mvninstall 4m 21s master passed
+1 💚 compile 7m 12s master passed
+1 💚 checkstyle 2m 47s master passed
+1 💚 spotbugs 8m 57s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 4m 15s the patch passed
+1 💚 compile 7m 19s the patch passed
+1 💚 cc 7m 19s the patch passed
+1 💚 javac 7m 19s the patch passed
+1 💚 checkstyle 0m 9s The patch passed checkstyle in hbase-protocol-shaded
+1 💚 checkstyle 0m 32s The patch passed checkstyle in hbase-client
+1 💚 checkstyle 1m 12s hbase-server: The patch generated 0 new + 123 unchanged - 2 fixed = 123 total (was 125)
+1 💚 checkstyle 0m 47s The patch passed checkstyle in hbase-thrift
+1 💚 checkstyle 0m 10s The patch passed checkstyle in hbase-shell
-0 ⚠️ rubocop 0m 14s The patch generated 10 new + 394 unchanged - 0 fixed = 404 total (was 394)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 21m 30s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 hbaseprotoc 3m 0s the patch passed
-1 ❌ spotbugs 2m 25s hbase-server generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
_ Other Tests _
+1 💚 asflicense 0m 54s The patch does not generate ASF License warnings.
84m 55s
Reason Tests
FindBugs module:hbase-server
Inconsistent synchronization of org.apache.hadoop.hbase.master.procedure.ServerRemoteProcedure.dispatched; locked 60% of time. Unsynchronized access at SnapshotVerifyProcedure.java:[line 104]
Useless object stored in variable serverNames of method org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process() At TakeSnapshotHandler.java:[line 205]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3716
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile cc hbaseprotoc prototool rubocop
uname Linux 9c3d4fa70583 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ff11f11
Default Java AdoptOpenJDK-1.8.0_282-b08
rubocop https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-general-check/output/diff-patch-rubocop.txt
spotbugs https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-general-check/output/new-spotbugs-hbase-server.html
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2 rubocop=0.80.0
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 4m 5s master passed
+1 💚 compile 3m 15s master passed
+1 💚 shadedjars 8m 20s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 5s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 3m 52s the patch passed
+1 💚 compile 3m 12s the patch passed
+1 💚 javac 3m 12s the patch passed
+1 💚 shadedjars 8m 18s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 39s hbase-server generated 2 new + 21 unchanged - 0 fixed = 23 total (was 21)
_ Other Tests _
+1 💚 unit 0m 46s hbase-protocol-shaded in the patch passed.
+1 💚 unit 1m 21s hbase-client in the patch passed.
-1 ❌ unit 149m 45s hbase-server in the patch failed.
+1 💚 unit 6m 39s hbase-thrift in the patch passed.
+1 💚 unit 7m 17s hbase-shell in the patch passed.
205m 3s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3716
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6f0fee034687 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ff11f11
Default Java AdoptOpenJDK-1.8.0_282-b08
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-jdk8-hadoop3-check/output/diff-javadoc-javadoc-hbase-server.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/testReport/
Max. process+thread count 3611 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 2s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 30s Maven dependency ordering for branch
+1 💚 mvninstall 5m 7s master passed
+1 💚 compile 3m 57s master passed
+1 💚 shadedjars 9m 10s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 34s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 5m 1s the patch passed
+1 💚 compile 3m 58s the patch passed
+1 💚 javac 3m 58s the patch passed
+1 💚 shadedjars 9m 11s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 43s hbase-server generated 2 new + 86 unchanged - 0 fixed = 88 total (was 86)
_ Other Tests _
+1 💚 unit 1m 5s hbase-protocol-shaded in the patch passed.
+1 💚 unit 1m 44s hbase-client in the patch passed.
+1 💚 unit 204m 43s hbase-server in the patch passed.
+1 💚 unit 8m 29s hbase-thrift in the patch passed.
+1 💚 unit 7m 1s hbase-shell in the patch passed.
268m 55s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3716
Optional Tests javac javadoc unit shadedjars compile
uname Linux 78d4a7da458f 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ff11f11
Default Java AdoptOpenJDK-11.0.10+9
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/artifact/yetus-jdk11-hadoop3-check/output/diff-javadoc-javadoc-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/testReport/
Max. process+thread count 2497 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3716/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@Apache9 left a comment

Haven't finished reviewing the whole patch. It is so big...

The overall architecture is very good. But there may still be points that I haven't fully understood.

Need to discuss more.

// but we may need to downgrade it to shared lock for some reasons:
// a. exclusive lock has a negative effect on assigning region. See HBASE-21480 for details.
// b. we want to support taking multiple different snapshots on same table on the same time.
if (env.getProcedureScheduler().waitTableSharedLock(this, getTableName())) {
Contributor

But what if a merge or a split is already on going when we want to start snapshot here?
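
The concern above comes down to what a shared lock does and does not exclude. As a rough illustration only, the sketch below uses `java.util.concurrent.locks.ReentrantReadWriteLock` as a stand-in for a table lock; it is not the HBase procedure scheduler API, and the class and method names are hypothetical. It shows that a second shared holder is admitted while the first still holds the lock, whereas an exclusive request is refused. If a split or merge does not need the exclusive table lock (an assumption here), the shared lock alone would not keep it out.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration only: a ReentrantReadWriteLock stands in for a table lock.
// Nothing here is the HBase procedure scheduler API.
class SharedLockSketch {

  /** Probes the lock from a second thread while this thread holds the shared lock. */
  private static boolean probeWhileSharedHeld(boolean probeExclusive) {
    ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    tableLock.readLock().lock(); // first holder, e.g. a snapshot taking the shared lock
    boolean[] admitted = new boolean[1];
    Thread second = new Thread(() -> {
      if (probeExclusive) {
        admitted[0] = tableLock.writeLock().tryLock();
        if (admitted[0]) {
          tableLock.writeLock().unlock();
        }
      } else {
        admitted[0] = tableLock.readLock().tryLock();
        if (admitted[0]) {
          tableLock.readLock().unlock();
        }
      }
    });
    second.start();
    try {
      second.join(); // join() makes the write to admitted[0] visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      tableLock.readLock().unlock();
    }
    return admitted[0];
  }

  /** True if another shared holder can enter while the shared lock is held. */
  static boolean secondSharedHolderAdmitted() {
    return probeWhileSharedHeld(false);
  }

  /** True if an exclusive holder can enter while the shared lock is held. */
  static boolean exclusiveHolderAdmitted() {
    return probeWhileSharedHeld(true);
  }
}
```

Under that assumption, the snapshot and split/merge paths would need an explicit check for each other rather than relying on the lock.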

status.markComplete("Snapshot " + snapshot.getName() + " completed");
}

private void snapshotOfflineRegions(MasterProcedureEnv env) throws IOException {
Contributor

OK, so this is for snapshotting split parents. Better to just name it snapshotSplitParentRegions; "offline" is a bit confusing here.

Contributor Author

I really missed this case. Thanks for reminding me. The Split/Merge procedures and the SnapshotProcedure should each check whether the other is executing before starting. Currently we only check for a running SnapshotProcedure before running a Split/Merge procedure. I will try to fix this.

@Override
protected void serializeStateData(ProcedureStateSerializer serializer) throws IOException {
super.serializeStateData(serializer);
serializer.serialize(SnapshotProcedureStateData
Contributor

We do not serialize the snapshotManifest here? Then how do we restore it during recovery?

@Override
protected void afterReplay(MasterProcedureEnv env) {
try {
prepareSnapshot(env);
Contributor

OK, so the trick is here: we call prepareSnapshot to restore the snapshot manifest.
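
The pattern described here (persist only a small amount of state, and rebuild derived state after replay) can be sketched as follows. This is a minimal illustration with hypothetical stand-in names; none of these are HBase types, and the "manifest" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a procedure; not an HBase class.
class ReplaySketch {
  // Persisted state: small, and enough to rebuild everything else.
  private final String snapshotName;
  // Derived state: never serialized, rebuilt by prepare().
  private List<String> manifest;

  ReplaySketch(String snapshotName) {
    this.snapshotName = snapshotName;
  }

  /** Only the snapshot name is written out. */
  String serialize() {
    return snapshotName;
  }

  /** Restores the persisted state; the manifest is still null afterwards. */
  static ReplaySketch deserialize(String data) {
    return new ReplaySketch(data);
  }

  /** Builds the derived state deterministically from the persisted state. */
  void prepare() {
    manifest = new ArrayList<>();
    manifest.add("manifest-for-" + snapshotName);
  }

  /** Called once after the persisted state has been reloaded. */
  void afterReplay() {
    prepare();
  }

  List<String> manifest() {
    return manifest;
  }
}
```

The approach only works if `prepare()` is deterministic and safe to call again after a restart.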

try {
prepareSnapshot(env);
} catch (IOException e) {
LOG.error("Failed replaying {}, mark procedure as failed", this, e);
Contributor

OK, so the rollback here is very simple: just delete the snapshot directory. So we are safe to mark the procedure as failed at any time; there is no point of no return (PONR). Good.
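
A rollback of this kind can be sketched as a recursive delete that is also safe to repeat. The example below is an assumption-laden illustration: it uses `java.nio.file` on the local filesystem rather than the Hadoop `FileSystem` API the patch would use, and `RollbackSketch` and `rollback` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Local-filesystem illustration only; the real patch works against HDFS.
class RollbackSketch {

  /** Deletes the working directory and everything under it; a no-op if it is already gone. */
  static void rollback(Path workingDir) {
    if (!Files.exists(workingDir)) {
      return; // already rolled back, so repeating the call is harmless
    }
    try (Stream<Path> paths = Files.walk(workingDir)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(path -> {
        try {
          Files.deleteIfExists(path);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the only side effect to undo is the directory itself, the rollback can run from any state without extra bookkeeping.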

dispatched = false;
}

RegionStates regionStates = env.getAssignmentManager().getRegionStates();
Contributor

What if the table is disabled? Will we not use SnapshotProcedure for disabled tables?

Contributor Author

If the table is disabled, we jump directly into the state SNAPSHOT_SNAPSHOT_OFFLINE_REGIONS. However, I didn't make the disabled-table snapshot distributed; the work is just done on the master side. It may be slow, but it will not hurt availability since the table is disabled.
What do you think?

@Override
protected void complete(MasterProcedureEnv env, Throwable error) {
if (error != null) {
Throwable realError = error.getCause();
Contributor

Why do we need this trick here? Do we always wrap the actual exception?

Contributor Author

Yes, on the master side we wrap this in a RemoteProcedureException.
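
The unwrapping discussed here can be sketched in isolation. In the example below, `WrappedRemoteException` is a hypothetical stand-in for the wrapper exception, not the HBase class, and the fallback to the original error when there is no cause is a defensive choice that the quoted patch line does not make.

```java
// Illustration only; the wrapper type is a hypothetical stand-in.
class UnwrapSketch {

  /** Stand-in for an exception that wraps the error reported by a remote side. */
  static class WrappedRemoteException extends Exception {
    WrappedRemoteException(Throwable cause) {
      super(cause);
    }
  }

  /** Returns the wrapped cause if there is one, otherwise the error itself. */
  static Throwable realError(Throwable error) {
    Throwable cause = error.getCause();
    return cause != null ? cause : error;
  }
}
```

Falling back to the error itself avoids a null `realError` if an unwrapped exception ever reaches this path.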

}

private Optional<ServerName> newTargetServer(MasterProcedureEnv env) {
List<ServerName> onlineServers =
Contributor

We have implemented a ReservoirSample class in hbase-common. You can use this class to do random selection without copying all the online servers out.
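
For readers unfamiliar with the technique, reservoir sampling picks a uniformly random element in a single pass over an iterator, so no copy of the collection is needed. The sketch below is a standalone illustration for a sample size of one; it does not use or reproduce the hbase-common `ReservoirSample` class, and `ReservoirPickSketch` and `pickOne` are hypothetical names.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.Random;

// Standalone illustration of reservoir sampling with a reservoir of size 1.
class ReservoirPickSketch {

  /** Picks one element uniformly at random in a single pass, without copying the input. */
  static <T> Optional<T> pickOne(Iterator<T> items, Random random) {
    T chosen = null;
    int seen = 0;
    while (items.hasNext()) {
      T item = items.next();
      seen++;
      // Keep the new item with probability 1/seen; the first item is always kept.
      if (random.nextInt(seen) == 0) {
        chosen = item;
      }
    }
    return Optional.ofNullable(chosen);
  }
}
```

Each element ends up chosen with probability 1/n, where n is the number of elements seen, and an empty input yields an empty `Optional`.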

Contributor Author

Ok. I will fix it. Thank you.

@frostruan
Contributor Author

Thank you very much for taking the time to review this PR. I am very happy to see someone still paying attention to this. @Apache9

Recently I realized that some of my previous ideas are problematic. For example:

  1. The execution of state SNAPSHOT_CONSOLIDATE_SNAPSHOT in SnapshotProcedure is not idempotent.
  2. If the snapshot is corrupted, the SnapshotVerifyProcedure will get the parent SnapshotProcedure and mark the parent procedure as failed. This may not be ProcedureV2 style (I am not sure; maybe we can implement this in a better way).

I have to say that this work is more complicated than I expected. I should have posted a design document first and split the work into sub-tasks. This PR was submitted a little too early. I will post a design doc as soon as possible. Thanks again. @Apache9

@Apache9
Contributor

Apache9 commented Nov 27, 2021

It will be good to have a design doc first~

@frostruan
Contributor Author

frostruan commented Nov 28, 2021

hi @Apache9

here is a simple doc, would you mind taking a look ?

https://docs.google.com/document/d/1Il_PB1SenXGr1-mmCIWEogxEMeGZe2fpuN3bMbjqiGI/edit

@Apache9
Contributor

Apache9 commented Dec 5, 2021

> hi @Apache9
>
> here is a simple doc, would you mind taking a look ?
>
> https://docs.google.com/document/d/1Il_PB1SenXGr1-mmCIWEogxEMeGZe2fpuN3bMbjqiGI/edit

Suggest you send an email to the dev list to let more people review the design doc~

@frostruan
Contributor Author

> > hi @Apache9
> > here is a simple doc, would you mind taking a look ?
> > https://docs.google.com/document/d/1Il_PB1SenXGr1-mmCIWEogxEMeGZe2fpuN3bMbjqiGI/edit
>
> Suggest you send an email to the dev list to let more people review the design doc~

ok. Thank you very much.

@frostruan frostruan closed this Dec 5, 2021
@frostruan frostruan deleted the HBASE-26323 branch December 5, 2021 14:57
@frostruan
Contributor Author

> > hi @Apache9
> > here is a simple doc, would you mind taking a look ?
> > https://docs.google.com/document/d/1Il_PB1SenXGr1-mmCIWEogxEMeGZe2fpuN3bMbjqiGI/edit
>
> Suggest you send an email to the dev list to let more people review the design doc~

hi @Apache9
I followed your suggestion and sent an email to dev@hbase.apache.org from both my QQ mailbox and my Gmail mailbox, but it seems that something went wrong and the emails were not sent successfully. Do you have any suggestions that could help me find out the reason?

Thanks.

@Apache9
Contributor

Apache9 commented Dec 6, 2021

Have you subscribed to the mailing list?

@frostruan
Contributor Author

> Have you subscribed to the mailing list?

Yes. And here is the email I sent:

[image]

@Apache9
Contributor

Apache9 commented Dec 6, 2021

Strange...

@frostruan
Contributor Author

Never mind. I'll try other ways. Thanks a lot for your suggestions. @Apache9

@frostruan
Contributor Author

Is it because only committers have permission to send emails to dev@hbase.apache.org? As a user, should I send emails to user@hbase.apache.org instead?

@Apache9
Contributor

Apache9 commented Dec 7, 2021

No, there is no such limitation. The only possible condition is whether you have subscribed to the mailing list...

Anyway, you could send the content to me first, and then I can help post it to the mailing list...

Thanks

@frostruan
Contributor Author

Thank you very much for your patience, @Apache9. Anyway, here is what I want to say:

Hi all,

As we all know, the current snapshot implementation in HBase has a few limitations, so I want to propose a proc-v2 implementation of snapshot.

Here are some related links.

jira
https://issues.apache.org/jira/browse/HBASE-26323

design doc
https://docs.google.com/document/d/1Il_PB1SenXGr1-mmCIWEogxEMeGZe2fpuN3bMbjqiGI/edit

the initial implementation
#3920

If you are interested, please take a look in your free time. Looking forward to your advice and feedback.

Thanks.

@Apache9
Contributor

Apache9 commented Dec 7, 2021

Email sent, could you please check whether you can see the email?

Thanks.

@frostruan
Contributor Author

Yes, I can. Thanks so much.

3 participants