
HBASE-28584 RS SIGSEGV under heavy replication load #6124


Closed
wants to merge 1 commit into from

apurtell
Contributor

@apurtell apurtell commented Jul 28, 2024

Deep clone the cells that are used to apply replicated mutations on the local cluster. Some of those operations may still be in flight when other in-flight mutations fail to apply. That failure triggers failure handling, which releases the buffer underlying the cellScanner that sources the cells, even though the remaining operations may still be reading from it.

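To make the failure mode concrete, here is a minimal, self-contained sketch of why copying helps. It is not the PR's code: `PooledBuffer`, `SimpleCell`, and `deepClone()` are hypothetical stand-ins for the pooled RPC request buffer and the cells a cellScanner hands out, and zeroing the buffer in `release()` stands in for the pool reusing it for another request. A zero-copy cell keeps pointing into the shared buffer and sees whatever it holds after release; a deep clone owns its bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Sketch only, not HBase code. PooledBuffer and SimpleCell are hypothetical stand-ins
 * for the pooled RPC request buffer and the cells a CellScanner hands out.
 */
public class DeepCloneSketch {

  /** Hypothetical pooled buffer; release() zeroes it to mimic reuse by another request. */
  static final class PooledBuffer {
    final ByteBuffer buf;

    PooledBuffer(byte[] data) {
      this.buf = ByteBuffer.wrap(data.clone());
    }

    void release() {
      for (int i = 0; i < buf.capacity(); i++) {
        buf.put(i, (byte) 0);
      }
    }
  }

  /** Hypothetical cell: an (offset, length) view into a shared buffer, like a zero-copy cell. */
  static final class SimpleCell {
    final ByteBuffer backing;
    final int offset;
    final int length;

    SimpleCell(ByteBuffer backing, int offset, int length) {
      this.backing = backing;
      this.offset = offset;
      this.length = length;
    }

    /** Reads the bytes this cell currently points at. */
    byte[] value() {
      byte[] out = new byte[length];
      for (int i = 0; i < length; i++) {
        out[i] = backing.get(offset + i);
      }
      return out;
    }

    /** Deep clone: copies the referenced bytes into a private on-heap buffer. */
    SimpleCell deepClone() {
      return new SimpleCell(ByteBuffer.wrap(value()), 0, length);
    }
  }

  static String read(SimpleCell cell) {
    return new String(cell.value(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    PooledBuffer pooled = new PooledBuffer("row1row2".getBytes(StandardCharsets.UTF_8));
    SimpleCell shallow = new SimpleCell(pooled.buf, 4, 4);
    SimpleCell cloned = shallow.deepClone(); // copy taken before the mutation is dispatched

    pooled.release(); // failure handling releases the request buffer early

    System.out.println(read(cloned)); // row2
    System.out.println(Arrays.toString(shallow.value())); // [0, 0, 0, 0]
  }
}
```

In a real RegionServer the pooled buffer may be off-heap, where reading it after release can crash the process rather than merely return wrong bytes; that is consistent with, though not proven by, the SIGSEGV in the title.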
@apurtell
Contributor Author

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 41s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s |  | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s |  | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s |  | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s |  | Patch does not have any anti-patterns. |
|  |  |  |  | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 17s |  | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 22s |  | master passed |
| +1 💚 | compile | 4m 30s |  | master passed |
| +1 💚 | checkstyle | 1m 8s |  | master passed |
| +1 💚 | spotbugs | 2m 42s |  | master passed |
| +1 💚 | spotless | 0m 59s |  | branch has no errors when running spotless:check. |
|  |  |  |  | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s |  | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 36s |  | the patch passed |
| +1 💚 | compile | 4m 16s |  | the patch passed |
| +1 💚 | javac | 4m 16s |  | the patch passed |
| +1 💚 | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 4s |  | the patch passed |
| +1 💚 | spotbugs | 2m 46s |  | the patch passed |
| +1 💚 | hadoopcheck | 12m 56s |  | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 45s |  | patch has no errors when running spotless:check. |
|  |  |  |  | _ Other Tests _ |
| +1 💚 | asflicense | 0m 21s |  | The patch does not generate ASF License warnings. |
|  |  | 48m 41s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6124 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 30055f172d95 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / e0a3162 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 84 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

Contributor

@Apache9 Apache9 left a comment

Let's get this in to fix the crash issue first.

IIRC, we have a similar problem when async WAL is enabled: the RPC call can finish before we actually finish writing the WAL, which causes us to write out corrupt WAL entries.

Let me check how we deal with the problem there. I guess the same trick can be used here too.

Thanks @apurtell for the analysis.

@Apache9
Contributor

Apache9 commented Jul 28, 2024

OK, the ServerCall class has a retainByWAL method, which counts the extra references to the ServerCall, mainly from its CellScanners.

I think we can simply rename the method to retain, meaning we want to keep the call around for other uses even after the RPC call is done, and then use it here as well.

Anyway, since the current PR has been well tested in production, I think we can apply it first and open another issue for the optimization.

Thanks.
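The alternative described above keeps the call, and with it the buffer behind its CellScanners, alive for as long as asynchronous work still needs it, instead of copying cells. The following is a minimal sketch of that reference-counting idea under simple assumptions: `RefCountedCall`, `retain()`, and `release()` are hypothetical names modeled on the suggested rename, not HBase's `ServerCall` API, and this is not the fix that later landed in #6263.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch only. RefCountedCall is hypothetical and is not HBase's ServerCall; retain() and
 * release() are assumed names modeled on the rename suggested in the review.
 */
public class RefCountSketch {

  static final class RefCountedCall {
    // Starts at 1: the reference held by the RPC handler itself.
    private final AtomicInteger refCnt = new AtomicInteger(1);
    private volatile boolean bufferReleased = false;

    /** Takes an extra reference; fails if the call's buffer is already gone. */
    void retain() {
      int prev;
      do {
        prev = refCnt.get();
        if (prev <= 0) {
          throw new IllegalStateException("call already released");
        }
      } while (!refCnt.compareAndSet(prev, prev + 1));
    }

    /** Drops a reference; the last holder frees the buffer. */
    void release() {
      int now = refCnt.decrementAndGet();
      if (now == 0) {
        bufferReleased = true; // stand-in for returning the buffer to the pool
      } else if (now < 0) {
        throw new IllegalStateException("released more times than retained");
      }
    }

    boolean isBufferReleased() {
      return bufferReleased;
    }
  }

  public static void main(String[] args) {
    RefCountedCall call = new RefCountedCall();
    CompletableFuture<Void> mutation = new CompletableFuture<>();

    call.retain(); // an in-flight mutation still reads cells from the call's buffer
    mutation.whenComplete((ignored, error) -> call.release()); // drop it on success or failure

    call.release(); // the handler finishes early, e.g. because another mutation failed
    System.out.println(call.isBufferReleased()); // false: the mutation still holds a reference

    mutation.complete(null); // runs the whenComplete action, which releases the last reference
    System.out.println(call.isBufferReleased()); // true
  }
}
```

The trade-off is the one raised later in this thread: copying allocates for every replicated mutation, while reference counting avoids the copy but requires every asynchronous path to pair each `retain()` with exactly one `release()`.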

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 4m 11s |  | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s |  | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
|  |  |  |  | _ Prechecks _ |
|  |  |  |  | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 9s |  | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 33s |  | master passed |
| +1 💚 | compile | 1m 43s |  | master passed |
| +1 💚 | javadoc | 1m 15s |  | master passed |
| +1 💚 | shadedjars | 6m 12s |  | branch has no errors when building our shaded downstream artifacts. |
|  |  |  |  | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 13s |  | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 0s |  | the patch passed |
| +1 💚 | compile | 1m 21s |  | the patch passed |
| +1 💚 | javac | 1m 21s |  | the patch passed |
| +1 💚 | javadoc | 0m 47s |  | the patch passed |
| +1 💚 | shadedjars | 5m 20s |  | patch has no errors when building our shaded downstream artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 💚 | unit | 2m 29s |  | hbase-common in the patch passed. |
| +1 💚 | unit | 229m 24s |  | hbase-server in the patch passed. |
|  |  | 265m 42s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6124 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 265f2a29681d 5.4.0-177-generic #197-Ubuntu SMP Thu Mar 28 22:45:47 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / e0a3162 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/testReport/ |
| Max. process+thread count | 5172 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

Contributor

@virajjasani virajjasani left a comment

+1

Contributor

@virajjasani virajjasani left a comment

One minor nit though

Comment on lines +2265 to +2267
* Deep clones the given cell if the cell supports deep cloning
* @param cell the cell to be cloned
* @return the cloned cell
Contributor

nit: javadoc to include @throws
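For illustration, a javadoc that addresses this nit might look like the sketch below. The PR's actual method body and the exception it throws are not shown in this thread, so `DeepCloneable`, `ByteCell`, `deepClone(Object)`, and the choice of `NullPointerException` and `IllegalArgumentException` are all hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;

/** Sketch only; none of these types are HBase APIs. */
public class CloneHelperSketch {

  /** Hypothetical capability interface for cells that can copy their own backing data. */
  interface DeepCloneable<T> {
    T deepClone();
  }

  /** Hypothetical cell that owns a value array. */
  static final class ByteCell implements DeepCloneable<ByteCell> {
    final byte[] value;

    ByteCell(byte[] value) {
      this.value = value;
    }

    @Override
    public ByteCell deepClone() {
      return new ByteCell(Arrays.copyOf(value, value.length));
    }
  }

  /**
   * Deep clones the given cell if the cell supports deep cloning.
   * @param cell the cell to be cloned
   * @return the cloned cell
   * @throws NullPointerException if {@code cell} is null
   * @throws IllegalArgumentException if the cell does not support deep cloning
   */
  static Object deepClone(Object cell) {
    Objects.requireNonNull(cell, "cell");
    if (!(cell instanceof DeepCloneable)) {
      throw new IllegalArgumentException(
          "cell type does not support deep cloning: " + cell.getClass().getName());
    }
    return ((DeepCloneable<?>) cell).deepClone();
  }

  public static void main(String[] args) {
    ByteCell original = new ByteCell(new byte[] {1, 2, 3});
    ByteCell copy = (ByteCell) deepClone(original);
    System.out.println(copy.value != original.value); // true: independent array
    System.out.println(Arrays.equals(copy.value, original.value)); // true: same contents
  }
}
```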

@Apache9
Contributor

Apache9 commented Sep 4, 2024

Any updates here?

Thanks.

@Apache9
Contributor

Apache9 commented Sep 17, 2024

I think we should fix this in newer releases.

If you are all OK with it, I could try implementing the reference-counting approach to solve the problem.

@apurtell @virajjasani Thoughts?

Thanks.

@apurtell
Contributor Author

I thought refcounting would be complex, but I am not opposed to it as an alternative solution. If and when we have it, we could remove the copying.

@apurtell
Contributor Author

We would not need this change if #6263 solves the problem instead.

@apurtell
Contributor Author

Fixed by #6263

@apurtell apurtell closed this Sep 20, 2024
@apurtell apurtell deleted the HBASE-28584 branch September 20, 2024 00:13
Labels
None yet
Projects
None yet
6 participants