-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LMDB corruption recovery -- follow-up #3880
LMDB corruption recovery -- follow-up #3880
Conversation
This pull request introduces 10 alerts and fixes 25 when merging 3aebce9d951c23b54a9e52239a61dae44a8a9546 into 4ebcf91 - view on LGTM.com new alerts:
fixed alerts:
|
a4eb259
to
3f1a347
Compare
This pull request introduces 12 alerts and fixes 25 when merging 3f1a347e7c043ff64f010cb490c3f335c422045e into e9b38c1 - view on LGTM.com new alerts:
fixed alerts:
|
3f1a347
to
5f1b7b7
Compare
5f1b7b7
to
6a9bf01
Compare
b7c1c67
to
300930a
Compare
This pull request introduces 11 alerts and fixes 1 when merging 300930a65cdf649862fd8434c907ee474e6ba982 into 7b2938d - view on LGTM.com new alerts:
fixed alerts:
|
137b4a5
to
e370fa5
Compare
This pull request introduces 11 alerts and fixes 1 when merging e370fa51f9ef7c16314cfb6c622effe8f3535496 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
e370fa5
to
31f0b45
Compare
C doesn't support exceptions so in the LMDB corruption handling callback we have no way to backtrack to the code that run the LMDB operation that hit the corruption and tell it that things went wrong. Our only chance is to exit the process. To avoid our exit handlers from potentially messing up the freshly-repaired LMDB file, we have to either use _exit() or make sure the LMDB file is not touched by the exit handlers. This change implements the latter. We also use special exit codes to signal that LMDB corruption was detected and either repaired or failed to be reparied. Ticket: CFE-3127 Changelog: None
We need to make sure the file is repaired for future use.
31f0b45
to
b5ee6a3
Compare
This pull request introduces 11 alerts and fixes 1 when merging b5ee6a332f818d784eb1c8834e1f7c384a375586 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
b5ee6a3
to
e36cf72
Compare
This pull request introduces 11 alerts and fixes 1 when merging e36cf72ce436b86825318bbf98a39c43c0ad0fe7 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good! (Please make sure it passes jenkins before merging)
e36cf72
to
2e12ad9
Compare
This pull request introduces 11 alerts and fixes 1 when merging 2e12ad980a547d10324d4e2ccca762e5ec467fc3 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
2e12ad9
to
9e3c789
Compare
This pull request introduces 11 alerts and fixes 1 when merging 9e3c7899fa39388114310beff99a6b3171aa8a23 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
SIGBUS means unaligned memory access or an attempt to access area outside of mmap(). In most cases for us this means an LMDB corruption. Thus if a process gets SIGBUS, it touches a flag file that triggers a DB repair attempt in the next agent or cf-execd run. Ticket: CFE-3127 Changelog: Received SIGBUS now triggers a repair of local DBs
They are more reusable there. Moved there in NorthernTechHQ/libntech@996300c.
repair_lmdb_file() is the function that repairs the given LMDB file and thus it should be the same function recording the repair timestamp. However, some complexities are out of its scope. Let's make it either take a file descriptor for the repair timestamp file, assuming that file locking and such stuff is taken care of, or do the simple repair timestamp recording on its own. Also treat its return code in a more appropriate way -- only a real failure should be considered failure. If the data recovery/replication fails, but the LMDB file is moved out of the way (removed), it should be considered as successful repair.
File locking in libntech is now a bit more abstract and versatile.
9e3c789
to
3817090
Compare
This pull request introduces 10 alerts and fixes 1 when merging 3817090 into 36379af - view on LGTM.com new alerts:
fixed alerts:
|
@cf-bottom run this throught he new shiny jenkins, please! Incl. exotics. |
Alright, I triggered a build: (with exotics) Jenkins: https://ci.cfengine.com/job/pr-pipeline/4006/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-4006/ |
@cf-bottom One more jenkins, with exotics. |
Sure, I triggered a build: (with exotics) Jenkins: https://ci.cfengine.com/job/pr-pipeline/4017/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-4017/ |
Follows #3873
Merge together:
#3880
https://github.com/cfengine/enterprise/pull/556
NorthernTechHQ/libntech#27