This repository demonstrates Redis with Sentinel for high availability, Redis Exporter for monitoring, and a simple Pub/Sub system. The setup includes Redis master and slave instances, Sentinel for failover management, HAProxy for load balancing, Prometheus and Grafana for monitoring, and small publisher and subscriber applications.
The architecture of this demo includes the following components:
- Redis Master: Primary Redis instance.
- Redis Slaves: Two Redis slave instances replicating from the master.
- Redis Sentinel: Three Sentinel instances monitoring the Redis master and slaves.
- HAProxy: Load balancer to distribute traffic among Redis instances.
- Redis Exporter: Exporter to gather Redis metrics.
- Prometheus: Monitoring system to collect metrics from Redis Exporter.
- Grafana: Dashboard for visualizing the metrics collected by Prometheus.
- Publisher: A simple Go application to publish messages.
- Subscriber: A simple Go application to subscribe to messages.
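The publisher and subscriber are intentionally minimal. As a rough sketch of what they might look like with the go-redis client (the address, channel name, and use of github.com/redis/go-redis/v9 are assumptions; the actual applications in this repository may differ):

```go
// Illustrative publisher/subscriber sketch; not the repository's exact code.
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9" // assumed client library
)

func main() {
	ctx := context.Background()
	// Assumed address: in this setup, clients would typically connect through HAProxy.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Subscriber: listen on a hypothetical "demo" channel.
	sub := rdb.Subscribe(ctx, "demo")
	defer sub.Close()

	go func() {
		for msg := range sub.Channel() {
			fmt.Printf("received %q on %s\n", msg.Payload, msg.Channel)
		}
	}()

	// Publisher: send a few messages.
	for i := 0; i < 3; i++ {
		if err := rdb.Publish(ctx, "demo", fmt.Sprintf("message %d", i)).Err(); err != nil {
			panic(err)
		}
		time.Sleep(time.Second)
	}
}
```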
To start the services, run:

```sh
make start
```

This command will build and start all the services defined in the docker-compose.yaml file.

To stop the services, run:

```sh
make stop
```

To stop and remove all containers, networks, and volumes, run:

```sh
make clean
```

To insert mock data into Redis, run:

```sh
make redis-mock-data
```

To benchmark the Redis setup, run:

```sh
make redis-benchmark
```
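For reference, inserting mock data amounts to writing a batch of keys. A hypothetical Go equivalent of the mock-data step (key names, count, address, and the go-redis client are assumptions, not the repository's actual implementation):

```go
// Hypothetical equivalent of `make redis-mock-data`: write a batch of mock keys.
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9" // assumed client library
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // assumed address

	// Insert a small batch of mock keys; the real Make target may differ.
	for i := 0; i < 1000; i++ {
		key := fmt.Sprintf("mock:key:%d", i)
		if err := rdb.Set(ctx, key, fmt.Sprintf("value-%d", i), 0).Err(); err != nil {
			panic(err)
		}
	}
	fmt.Println("inserted 1000 mock keys")
}
```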
Prometheus and Grafana are used for monitoring the Redis instances.
- Prometheus is accessible at http://localhost:9090
- Grafana is accessible at http://localhost:3000 (default login: admin / admin)
What happens when the Redis master fails?
When the Redis master fails, the Sentinels, which continuously monitor all Redis instances, detect the failure. Once the master is confirmed down, they start a failover process to elect a new master from among the slaves.
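Applications do not have to track the failover themselves if they use a Sentinel-aware client. A sketch with go-redis (the master name mymaster and the Sentinel addresses are assumptions about this setup):

```go
// Sentinel-aware client sketch: the client asks the Sentinels for the current
// master and reconnects automatically after a failover.
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9" // assumed client library
)

func main() {
	ctx := context.Background()

	rdb := redis.NewFailoverClient(&redis.FailoverOptions{
		MasterName:    "mymaster", // assumed master name
		SentinelAddrs: []string{"localhost:26379", "localhost:26380", "localhost:26381"}, // assumed ports
	})

	// Writes keep working against whichever node is currently the master.
	if err := rdb.Set(ctx, "ha:test", "ok", 0).Err(); err != nil {
		panic(err)
	}
	val, err := rdb.Get(ctx, "ha:test").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("value:", val)
}
```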
How does Sentinel elect a new master?
- Failure Detection: Sentinels continuously monitor the Redis master. When a Sentinel cannot communicate with the master for a specified time (SENTINEL_DOWN_AFTER), it marks the master as down.
- Quorum: The failure is confirmed if a majority of Sentinels (based on SENTINEL_QUORUM) agree that the master is down.
- Failover: Sentinels then select one of the slaves to promote to master. The candidate is chosen by Sentinel's election process, which considers factors such as slave priority and replication offset (the most up-to-date slave is preferred).
- Configuration Update: Sentinels update their configurations to point to the new master, and the remaining slaves are reconfigured to replicate from the new master.
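You can observe this election from the outside by asking any Sentinel for the current master address (the SENTINEL GET-MASTER-ADDR-BY-NAME command). A minimal sketch; the Sentinel address and master name are assumptions:

```go
// Ask a Sentinel for the current master's address; run it before and after
// stopping the master container to see the answer change.
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9" // assumed client library
)

func main() {
	ctx := context.Background()
	sentinel := redis.NewSentinelClient(&redis.Options{Addr: "localhost:26379"}) // assumed port

	addr, err := sentinel.GetMasterAddrByName(ctx, "mymaster").Result() // assumed master name
	if err != nil {
		panic(err)
	}
	fmt.Printf("current master: %s:%s\n", addr[0], addr[1])
}
```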
How does the system handle a network partition?
If a network partition occurs, the Sentinels might split into two groups that cannot reach each other, with each group believing the instances on the other side are down. This can lead to both groups trying to elect a new master. To avoid split-brain scenarios:
- Sentinels require a majority quorum to elect a new master.
- HAProxy will route traffic to the correct master based on the current Sentinel configurations.
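A common way for a load balancer to verify which backend is the real master is a health check that runs INFO replication and accepts only the node reporting role:master. The sketch below mimics that check in Go (node addresses are assumptions; the actual HAProxy check in this repository may be configured differently):

```go
// Role-check sketch: report which of the assumed Redis nodes currently
// claims role:master in INFO replication, the kind of check a load balancer
// can use to avoid sending writes to a stale master.
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/redis/go-redis/v9" // assumed client library
)

func main() {
	ctx := context.Background()
	// Assumed node addresses for this demo.
	nodes := []string{"localhost:6379", "localhost:6380", "localhost:6381"}

	for _, addr := range nodes {
		rdb := redis.NewClient(&redis.Options{Addr: addr})
		info, err := rdb.Info(ctx, "replication").Result()
		if err != nil {
			fmt.Printf("%s: unreachable (%v)\n", addr, err)
			continue
		}
		if strings.Contains(info, "role:master") {
			fmt.Printf("%s: master\n", addr)
		} else {
			fmt.Printf("%s: slave\n", addr)
		}
		_ = rdb.Close()
	}
}
```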
What happens if there is a Sentinel quorum failure?
If there are not enough Sentinels to form a quorum (e.g., if two out of three Sentinels fail), failover cannot proceed:
- The system is effectively read-only while the master is down: no new master can be elected, so writes fail, but the remaining slaves can still serve reads.
- Monitoring and alerting should be in place to notify administrators to manually intervene and restore Sentinel quorum.
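Sentinel provides a SENTINEL CKQUORUM command for exactly this check: it reports whether the currently reachable Sentinels are enough to authorize a failover. A sketch using go-redis's SentinelClient (the address and master name are assumptions):

```go
// Quorum-check sketch: ask a Sentinel whether enough Sentinels are reachable
// to authorize a failover (SENTINEL CKQUORUM).
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9" // assumed client library
)

func main() {
	ctx := context.Background()
	sentinel := redis.NewSentinelClient(&redis.Options{Addr: "localhost:26379"}) // assumed port

	reply, err := sentinel.CkQuorum(ctx, "mymaster").Result() // assumed master name
	if err != nil {
		// With too few Sentinels reachable, Sentinel answers with an error
		// describing the missing quorum.
		fmt.Println("quorum NOT available:", err)
		return
	}
	fmt.Println("quorum OK:", reply)
}
```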
What happens when a Sentinel instance is restarted?
When a Sentinel instance restarts, it performs the following steps:
- Reconnection: The restarted Sentinel reconnects to the Redis instances and other Sentinel nodes.
- Synchronization: It synchronizes its state with other Sentinel nodes to get updated information about the current master and slaves.
- Monitoring Resumption: The restarted Sentinel resumes its monitoring duties, participating in quorum and failover processes as needed.
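Sentinel announces these state changes on its own Pub/Sub channels, for example +switch-master when a failover completes and +sdown / -sdown as instances go down and come back. A sketch that tails those events from one Sentinel (the address is an assumption):

```go
// Sentinel event sketch: subscribe to all Sentinel event channels and print
// them, which makes failovers and down/up markers easy to watch.
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9" // assumed client library
)

func main() {
	ctx := context.Background()
	sentinel := redis.NewSentinelClient(&redis.Options{Addr: "localhost:26379"}) // assumed port

	pubsub := sentinel.PSubscribe(ctx, "*") // all Sentinel event channels
	defer pubsub.Close()

	for msg := range pubsub.Channel() {
		fmt.Printf("[%s] %s\n", msg.Channel, msg.Payload)
	}
}
```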
What happens when a Redis slave fails?
When a Redis slave fails:
- Failure Detection: Sentinels detect the failure and mark the slave as down.
- Master Unaffected: The Redis master continues to operate normally, and the remaining slave(s) continue replicating from the master.
- Reconfiguration: Once the failed slave is back online, it reconnects to the master and resynchronizes its data.
What happens when HAProxy fails?
When HAProxy fails:
- Service Disruption: Clients lose the ability to connect to Redis through the HAProxy load balancer.
- Manual Intervention: An administrator needs to restart the HAProxy service to restore load balancing.
- High Availability: In a production environment, running multiple HAProxy instances with failover mechanisms can mitigate this risk.
What happens when the original Redis master recovers after a failover?
When the original Redis master recovers after a failover:
- Rejoin as Slave: The recovered instance does not automatically become the master again. Sentinel reconfigures it as a slave of the current master.
- Data Synchronization: The recovered instance synchronizes its data with the new master to ensure consistency.