WPB-19318: Ensure high-availability of the postgres cluster #807
base: master
…sive docs
- Consolidate PostgreSQL configuration into single unified template
- Fix split-brain detection script (correct 'rouge' to 'rogue' typo)
- Add detailed HA features documentation with failover validation
- Include monitoring & event system documentation
- Add node_id and priority configuration parameters
- Add official repmgr and PostgreSQL documentation references
- Improve deployment commands and monitoring checks
- Enhance split-brain protection with advanced features

- Remove duplicate HA features list from Key Concepts section
- Remove duplicate monitoring system section from Configuration Options
- Fix incorrect numbering in monitoring commands (5 → 8)
- Consolidate monitoring information into single comprehensive section

- PostgreSQL cluster runs independently, not integrated with endpoint-manager
- Explain postgres-endpoint-manager as separate component that monitors cluster externally
- Emphasize independent operation of cluster vs endpoint management
dumping status of services and logs
repmgr brings back postgresql service if it is found stopped
@sghosh23 we should leave a note in the PostgreSQL documentation about maintenance of the postgresql service: repmgr must be stopped first, otherwise the state of the postgresql service can change during the maintenance.
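A minimal sketch of the maintenance order suggested above, written as a dry-run helper that only prints the commands unless told otherwise. The unit name `repmgrd` is an assumption (the thread only says "repmgr"); `postgresql` is the service name used in this thread. The function names are hypothetical.

```python
import shlex
import subprocess

# Assumption: the repmgr daemon runs as a systemd unit called "repmgrd".
REPMGR_UNIT = "repmgrd"
# Service name as used in this thread.
POSTGRES_UNIT = "postgresql"


def maintenance_commands(repmgr_unit=REPMGR_UNIT, postgres_unit=POSTGRES_UNIT):
    """Return systemctl commands in order: repmgr is stopped first so it
    cannot bring the postgresql service back during maintenance."""
    return [
        ["systemctl", "stop", repmgr_unit],
        ["systemctl", "stop", postgres_unit],
        # ... the actual maintenance work happens between these steps ...
        ["systemctl", "start", postgres_unit],
        ["systemctl", "start", repmgr_unit],
    ]


def run(commands, dry_run=True):
    """Print each command (dry run) or execute it and fail on error."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(maintenance_commands())  # dry run: prints the four commands
```

Keeping the default as a dry run makes the order reviewable before anything touches a cluster node.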
Can we please add documentation on how to reactivate a postgresql service that was masked by the detect-rogue-primary.timer? Also, let's mention the expected application downtime of about 4.5 minutes when a failover happens.
We already tested this part, so I will add it to the doc.
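Until that documentation exists, here is a hedged sketch of what reactivating a masked service could look like with plain systemd commands. It only prints the commands. Whether a node masked by detect-rogue-primary.timer can simply be started again, or must first be rejoined to the cluster as a standby, is not stated in this thread, so treat the final step as an assumption. The function name is hypothetical.

```python
import shlex


def unmask_commands(unit="postgresql"):
    """Return systemctl commands to inspect, unmask, and start a unit.

    Assumption: starting the service directly is safe; a former primary
    may need to be rejoined to the cluster before it is started.
    """
    return [
        ["systemctl", "is-enabled", unit],  # reports "masked" while the unit is masked
        ["systemctl", "unmask", unit],
        ["systemctl", "start", unit],
    ]


if __name__ == "__main__":
    # Print only; nothing is executed on the host.
    for cmd in unmask_commands():
        print(shlex.join(cmd))
```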
Based on my testing (logged on the ticket), it looks good to me.
4a7d57b to 3bef7d0
Please check out the system design doc for more info: https://wearezeta.atlassian.net/wiki/spaces/CUSSOPS/pages/2088108112/PostgreSQL+High+Availability+System+Design