Merge pull request delimitrou#223 from przmk0/redis_cluster_socialnet
(feat) Extend support for Redis Cluster - include post-install health check and dedicated svc config flag
cdelimitrou authored Aug 9, 2022
2 parents 6319b6e + 1e878e3 commit b4bf5f3
Showing 13 changed files with 159 additions and 41 deletions.
12 changes: 8 additions & 4 deletions socialNetwork/config/service-config.json
@@ -42,14 +42,16 @@
"addr": "user-timeline-redis",
"timeout_ms": 10000,
"port": 6379,
"connections": 512
"connections": 512,
"use_cluster": 0
},
"social-graph-redis": {
"keepalive_ms": 10000,
"addr": "social-graph-redis",
"timeout_ms": 10000,
"port": 6379,
"connections": 512
"connections": 512,
"use_cluster": 0
},
"post-storage-service": {
"keepalive_ms": 10000,
@@ -63,7 +65,8 @@
"addr": "compose-post-redis",
"timeout_ms": 10000,
"port": 6379,
"connections": 512
"connections": 512,
"use_cluster": 0
},
"user-timeline-mongodb": {
"keepalive_ms": 10000,
@@ -114,7 +117,8 @@
"addr": "home-timeline-redis",
"timeout_ms": 10000,
"port": 6379,
"connections": 512
"connections": 512,
"use_cluster": 0
},
"compose-post-service": {
"keepalive_ms": 10000,
53 changes: 51 additions & 2 deletions socialNetwork/helm-chart/README.md
@@ -308,7 +308,7 @@ Detailed mcrouter overview can be found [here](https://engineering.fb.com/2014/0
### mcrouter configuration:
Mcrouter configuration is stored in dedicated `ConfigMap` - default installation generates broken config file - MemcacheD server list is not matching actual MemcacheD pods. To overcome this issue dedicated `post-install` hook was developed to fix setup and also give an option to customize mcrouter behaviour.

Implementation of the hook and mcrouter config template can be found in `templates/mcrouter` folder.
Implementation of the hook and mcrouter config template can be found in `templates/hooks/mcrouter` folder.

Mcrouter config template - default configuration enables replication for single MemcacheD pool:
```json
@@ -377,9 +377,58 @@ The are two ways to deploy redis:

1. **Default setup** (standalone) - single replica of redis is deployed for each service (similar to mongoDB). Scaling is not possible.

2. **Sharded version** - services instead of connecting to standalone redis instances, they connect to a redis cluster with one master node and N slaves. The data is sharded across multiple nodes. Currently reading and writing datta is handled by the master. To improve the avaialability by writing only via master and reading via replicas, changes need to be applied to the source code.
2. **Clustered version** - multi-master setup that improves read/write performance. It requires a PV provisioner to be present in the cluster. More details and documentation can be found in the [Redis Cluster helm chart documentation](https://github.com/bitnami/charts/tree/master/bitnami/redis-cluster)

![Redis Cluster deployment](https://raw.githubusercontent.com/bitnami/charts/master/bitnami/redis-cluster/img/redis-cluster-topology.png "Redis Cluster deployment")

Helm install output:
```bash
helm install dsb socialnetwork --timeout 10m0s
Pod redis-cluster-readiness-hook pending
Pod redis-cluster-readiness-hook pending
Pod redis-cluster-readiness-hook pending
Pod redis-cluster-readiness-hook running
Pod redis-cluster-readiness-hook succeeded
NAME: dsb
LAST DEPLOYED: Tue Aug 9 11:03:20 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
```

After the helm chart is installed, a dedicated post-install hook starts, observes the Redis Cluster deployment, and waits until it is fully configured:

```bash
2022-08-09 09:06:03,599 [ERROR] Could not connect to Redis Cluster. Sleeping for 5 seconds
Traceback (most recent call last):
File "/tmp/redis_readiness_check.py", line 16, in <module>
redis_cluster_connection = RedisCluster(host=redis_cluster_uri)
File "/usr/local/lib/python3.10/site-packages/redis/cluster.py", line 560, in __init__
self.nodes_manager = NodesManager(
File "/usr/local/lib/python3.10/site-packages/redis/cluster.py", line 1267, in __init__
self.initialize()
File "/usr/local/lib/python3.10/site-packages/redis/cluster.py", line 1558, in initialize
raise RedisClusterException(
redis.exceptions.RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node.
2022-08-09 09:06:08,659 [INFO] cluster 10.250.228.147:6379 -> cluster state: ok
2022-08-09 09:06:08,660 [INFO] cluster 10.250.228.137:6379 -> cluster state: ok
2022-08-09 09:06:08,660 [INFO] cluster 10.250.230.194:6379 -> cluster state: ok
2022-08-09 09:06:08,660 [INFO] cluster 10.250.230.234:6379 -> cluster state: ok
2022-08-09 09:06:08,660 [INFO] cluster 10.250.230.191:6379 -> cluster state: ok
2022-08-09 09:06:08,660 [INFO] cluster 10.250.224.88:6379 -> cluster state: ok
2022-08-09 09:06:13,665 [INFO] Redis Cluster is configured and ready to use!
```

Implementation of the hook can be found in `templates/hooks/redis-cluster` folder.
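
The readiness decision made by the hook can be illustrated in isolation. The following is a minimal sketch, not the hook's actual code: `all_nodes_ok` is a hypothetical helper, and the sample dictionary only assumes a node-to-status mapping of the kind the readiness script iterates over (the addresses are taken from the log above, the `fail` state is made up for illustration).

```python
from typing import Mapping, Optional


def all_nodes_ok(cluster_info: Mapping[str, Mapping[str, Optional[str]]]) -> bool:
    """Return True only if every node reports cluster_state == "ok".

    A node with a missing or non-"ok" state marks the cluster as not ready.
    An empty mapping yields True, just as a loop over zero nodes would.
    """
    return all(status.get("cluster_state") == "ok" for status in cluster_info.values())


# Hypothetical snapshot: one node is healthy, one is still converging.
sample = {
    "10.250.228.147:6379": {"cluster_state": "ok"},
    "10.250.228.137:6379": {"cluster_state": "fail"},
}

if __name__ == "__main__":
    print(all_nodes_ok(sample))
```

In the hook this check would be repeated with a pause between attempts until it passes for all nodes.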

### Usage:
```bash
helm install RELEASE_NAME HELM_CHART_REPO_PATH --set global.redis.cluster.enabled=true,global.redis.standalone.enabled=false --timeout 10m0s
```

### Persistence:
After the helm chart is uninstalled, the PVCs created by the Redis Cluster installation are not removed. They can be removed manually by running the following shell command:
```bash
for p in $(kubectl get pvc -o name -l app.kubernetes.io/name=redis-cluster); do kubectl delete $p; done
```
8 changes: 4 additions & 4 deletions socialNetwork/helm-chart/socialnetwork/Chart.lock
@@ -86,8 +86,8 @@ dependencies:
- name: mcrouter
repository: https://evryfs.github.io/helm-charts/
version: 0.3.0
- name: redis
- name: redis-cluster
repository: https://charts.bitnami.com/bitnami
version: 16.12.2
digest: sha256:49bf1c70e8d275ca63fa8c986620c0da6364c5fc9037c65a3f800e80d25b7188
generated: "2022-06-30T14:03:51.123982+02:00"
version: 7.6.3
digest: sha256:0322995d5d8682baac63a4af0d76b03179ead7989ff131bcd08409148a25ec37
generated: "2022-07-22T16:14:32.803085+02:00"
4 changes: 2 additions & 2 deletions socialNetwork/helm-chart/socialnetwork/Chart.yaml
@@ -81,7 +81,7 @@ dependencies:
version: 0.3.0
condition: global.memcached.cluster.enabled
repository: https://evryfs.github.io/helm-charts/
- name: redis
version: 16.12.2
- name: redis-cluster
version: 7.6.3
condition: global.redis.cluster.enabled
repository: https://charts.bitnami.com/bitnami
Binary file not shown.
Binary file not shown.
@@ -6,9 +6,8 @@
{{ .Release.Name }}-mcrouter
{{- end }}

# this needs to be extended with redis-replicas to spread reads across all instances, requires code changes on the service level
{{- define "redis-cluster.connection" }}
{{ .Release.Name }}-redis-master
{{ .Release.Name }}-redis-cluster
{{- end}}

{{- define "socialnetwork.templates.other.service-config.json" }}
@@ -33,7 +32,8 @@
"port": 6379,
"connections": 512,
"timeout_ms": 10000,
"keepalive_ms": 10000
"keepalive_ms": 10000,
"use_cluster": {{ ternary 1 0 .Values.global.redis.cluster.enabled}}
},
"write-home-timeline-service": {
"addr": "write-home-timeline-service",
@@ -55,7 +55,8 @@
"port": 6379,
"connections": 512,
"timeout_ms": 10000,
"keepalive_ms": 10000
"keepalive_ms": 10000,
"use_cluster": {{ ternary 1 0 .Values.global.redis.cluster.enabled}}
},
"compose-post-service": {
"addr": "compose-post-service",
@@ -69,7 +70,8 @@
"port": 6379,
"connections": 512,
"timeout_ms": 10000,
"keepalive_ms": 10000
"keepalive_ms": 10000,
"use_cluster": {{ ternary 1 0 .Values.global.redis.cluster.enabled}}
},
"user-timeline-service": {
"addr": "user-timeline-service",
@@ -90,7 +92,8 @@
"port": 6379,
"connections": 512,
"timeout_ms": 10000,
"keepalive_ms": 10000
"keepalive_ms": 10000,
"use_cluster": {{ ternary 1 0 .Values.global.redis.cluster.enabled}}
},
"post-storage-service": {
"addr": "post-storage-service",
@@ -112,7 +115,7 @@
"connections": 512,
"timeout_ms": 10000,
"keepalive_ms": 10000,
"binary_protocol": {{ ternary 0 1 .Values.global.memcached.cluster.enabled}}
"binary_protocol": {{ ternary 1 0 .Values.global.memcached.cluster.enabled}}
},
"unique-id-service": {
"addr": "unique-id-service",
@@ -0,0 +1,43 @@
{{- if .Values.global.redis.cluster.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-readiness-script
data:
  redis_readiness_check.py: |
    import logging
    from time import sleep
    from typing import Optional

    from redis import RedisCluster
    from redis.exceptions import RedisClusterException

    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s", handlers=[logging.StreamHandler()])
    log = logging.getLogger()

    redis_cluster_uri: str = "{{ .Release.Name }}-redis-cluster"
    redis_cluster_is_ready: bool = False
    redis_cluster_connection: Optional[RedisCluster] = None

    while True:
        try:
            redis_cluster_connection = RedisCluster(host=redis_cluster_uri)
            redis_cluster_connection.ping()
            break
        except Exception as ex:
            log.exception(f"Could not connect to Redis Cluster. Sleeping for 5 seconds")
            sleep(5)

    assert isinstance(redis_cluster_connection, RedisCluster)

    while not redis_cluster_is_ready:
        redis_cluster_is_ready = True
        cluster_info = redis_cluster_connection.cluster_info(target_nodes=RedisCluster.ALL_NODES)
        for node, status in cluster_info.items():
            cluster_state = status.get("cluster_state")
            log.info(f"cluster {node} -> cluster state: {cluster_state}")
            if not cluster_state or cluster_state != "ok":
                redis_cluster_is_ready = False
        sleep(5)

    log.info("Redis Cluster is configured and ready to use!")
{{- end }}
@@ -0,0 +1,23 @@
{{- if .Values.global.redis.cluster.enabled }}
apiVersion: v1
kind: Pod
metadata:
  name: redis-cluster-readiness-hook
  annotations:
    "helm.sh/hook": "post-install"
spec:
  containers:
    - name: post-install-container
      image: python
      imagePullPolicy: Always
      command: ['sh', '-c', 'python -m pip install redis && python /tmp/redis_readiness_check.py']
      volumeMounts:
        - name: redis-readiness-script
          mountPath: /tmp
  volumes:
    - name: redis-readiness-script
      configMap:
        name: redis-readiness-script
  restartPolicy: Never
  terminationGracePeriodSeconds: 0
{{- end }}
22 changes: 7 additions & 15 deletions socialNetwork/helm-chart/socialnetwork/values.yaml
@@ -51,20 +51,12 @@ mcrouter:
replicaCount: 3
mcrouterCommandParams.port: *memcached-cluster-port

redis:
auth:
enabled: false
architecture: replication
replica:
replicaCount: 5
persistence:
enabled: false
master:
persistence:
redis-cluster:
usePassword: false
redis:
# default readiness / liveness probes are causing issues, raising timeouts/delays might help
# cluster health check will be handled by post-install hook
readinessProbe:
enabled: false
startupProbe:
initialDelaySeconds: 60
timeoutSeconds: 30
livenessProbe:
initialDelaySeconds: 60
timeoutSeconds: 30
enabled: false
6 changes: 3 additions & 3 deletions socialNetwork/src/HomeTimelineService/HomeTimelineService.cpp
@@ -58,7 +58,7 @@ int main(int argc, char *argv[]) {
}

int port = config_json["home-timeline-service"]["port"];

int redis_cluster_config_flag = config_json["home-timeline-redis"]["use_cluster"];
int post_storage_port = config_json["post-storage-service"]["port"];
std::string post_storage_addr = config_json["post-storage-service"]["addr"];
int post_storage_conns = config_json["post-storage-service"]["connections"];
@@ -86,7 +86,7 @@ int main(int argc, char *argv[]) {
std::shared_ptr<TServerSocket> server_socket =
get_server_socket(config_json, "0.0.0.0", port);

if (redis_cluster_flag) {
if (redis_cluster_flag || redis_cluster_config_flag) {
RedisCluster redis_cluster_client_pool =
init_redis_cluster_client_pool(config_json, "home-timeline");
TThreadedServer server(
@@ -97,7 +97,7 @@ int main(int argc, char *argv[]) {
server_socket, std::make_shared<TFramedTransportFactory>(),
std::make_shared<TBinaryProtocolFactory>());

LOG(info) << "Starting the home-timeline-service server...";
LOG(info) << "Starting the home-timeline-service server with Redis Cluster support...";
server.serve();
} else {
Redis redis_client_pool =
6 changes: 4 additions & 2 deletions socialNetwork/src/SocialGraphService/SocialGraphService.cpp
@@ -67,6 +67,8 @@ int main(int argc, char *argv[]) {
int user_timeout = config_json["user-service"]["timeout_ms"];
int user_keepalive = config_json["user-service"]["keepalive_ms"];

int redis_cluster_config_flag = config_json["social-graph-redis"]["use_cluster"];

mongoc_client_pool_t *mongodb_client_pool =
init_mongodb_client_pool(config_json, "social-graph", mongodb_conns);

@@ -96,7 +98,7 @@ int main(int argc, char *argv[]) {
std::shared_ptr<TServerSocket> server_socket =
get_server_socket(config_json, "0.0.0.0", port);

if (redis_cluster_flag) {
if (redis_cluster_flag || redis_cluster_config_flag) {
RedisCluster redis_cluster_client_pool =
init_redis_cluster_client_pool(config_json, "social-graph");
TThreadedServer server(
@@ -106,7 +108,7 @@ int main(int argc, char *argv[]) {
&user_client_pool)),
server_socket, std::make_shared<TFramedTransportFactory>(),
std::make_shared<TBinaryProtocolFactory>());
LOG(info) << "Starting the social-graph-service server with Resis cluster...";
LOG(info) << "Starting the social-graph-service server with Resis Cluster support...";
server.serve();
} else {
Redis redis_client_pool =
6 changes: 4 additions & 2 deletions socialNetwork/src/UserTimelineService/UserTimelineService.cpp
@@ -71,6 +71,8 @@ int main(int argc, char *argv[]) {
int mongodb_conns = config_json["user-timeline-mongodb"]["connections"];
int mongodb_timeout = config_json["user-timeline-mongodb"]["timeout_ms"];

int redis_cluster_config_flag = config_json["user-timeline-redis"]["use_cluster"];

auto mongodb_client_pool =
init_mongodb_client_pool(config_json, "user-timeline", mongodb_conns);

@@ -100,7 +102,7 @@ int main(int argc, char *argv[]) {
std::shared_ptr<TServerSocket> server_socket =
get_server_socket(config_json, "0.0.0.0", port);

if (redis_cluster_flag) {
if (redis_cluster_flag || redis_cluster_config_flag) {
RedisCluster redis_client_pool =
init_redis_cluster_client_pool(config_json, "user-timeline");
TThreadedServer server(std::make_shared<UserTimelineServiceProcessor>(
@@ -110,7 +112,7 @@ int main(int argc, char *argv[]) {
server_socket,
std::make_shared<TFramedTransportFactory>(),
std::make_shared<TBinaryProtocolFactory>());
LOG(info) << "Starting the user-timeline-service server...";
LOG(info) << "Starting the user-timeline-service server with Redis Cluster support...";
server.serve();
} else {
Redis redis_client_pool =
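
The way the three services pick between the standalone and the cluster client can be sketched outside C++. This is a minimal illustration under stated assumptions, not the services' code: `use_redis_cluster` is a hypothetical helper, `cli_flag` stands in for the existing `redis_cluster_flag`, and the fallback to `0` when `use_cluster` is absent is an assumption of this sketch (the C++ code reads the key directly).

```python
import json


def use_redis_cluster(config: dict, redis_section: str, cli_flag: bool = False) -> bool:
    """Mirror `redis_cluster_flag || redis_cluster_config_flag`.

    Cluster mode is selected if either the command-line flag is set or the
    service's redis section in the config has a non-zero `use_cluster` value.
    """
    # Assumption of this sketch: a missing key is treated as 0 (standalone).
    config_flag = int(config[redis_section].get("use_cluster", 0))
    return cli_flag or bool(config_flag)


# Hypothetical config excerpt shaped like service-config.json.
config = json.loads(
    '{"home-timeline-redis": {"addr": "home-timeline-redis", "port": 6379, "use_cluster": 1},'
    ' "social-graph-redis": {"addr": "social-graph-redis", "port": 6379, "use_cluster": 0}}'
)

if __name__ == "__main__":
    for section in config:
        mode = "cluster" if use_redis_cluster(config, section) else "standalone"
        print(f"{section}: {mode}")
```

Because the two flags are combined with a logical OR, setting `use_cluster` to `0` in the config does not disable cluster mode when the command-line flag is present.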