
Question: replication redundancy #121

Closed
ridingrails opened this issue Jan 11, 2017 · 12 comments

@ridingrails

Great tool! I have a question about redundancy. When you set a follower for a master, what happens if the master host terminates? Is there any type of leader election among followers currently? Or do all the followers need to be programmed to follow a new master?

ridingrails changed the title from "Replication redundancy" to "Question: replication redundancy" on Jan 11, 2017
@tidwall (Owner) commented Jan 11, 2017

Hi @ridingrails,

In short, it would need to be programmed.

Replication in Tile38 is very simple. A follower is just a readonly mirror of the leader. When new write commands are sent to a leader, the leader will forward the commands to each follower. The followers can serve readonly commands like GET and SEARCH, but cannot handle write commands like SET and DEL.
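
For illustration, that read/write split looks roughly like this across two tile38-cli sessions (the ports, key, and ID here are just examples, and the responses are abbreviated):

127.0.0.1:9851> SET fleet truck1 POINT 33.462 -112.268
{"ok":true,"elapsed":"..."}
127.0.0.1:9852> GET fleet truck1 POINT
{"ok":true,"point":{"lat":33.462,"lon":-112.268},"elapsed":"..."}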

When a leader terminates with active followers, the followers simply wait until the leader returns. There are no elections, and it's not a true cluster in the way Raft provides.

To make it more redundant, there would need to be a sentinel-like service that watches the leader and knows about the followers. Then, when the leader fails, the service would switch a follower to the leader state by issuing a FOLLOW no one command.
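
For illustration only, the failover such a service performs could be sketched as two tile38-cli commands (the hosts and ports here are hypothetical):

127.0.0.1:9852> FOLLOW no one
127.0.0.1:9853> FOLLOW 127.0.0.1 9852

The first command promotes one follower to leader; the second repoints any remaining follower at it.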

I hope this helps answer your question.

@ridingrails (Author)

Yes it does, thanks Josh!

@octete commented Aug 4, 2017

Hi @tidwall

Regarding this, I have noticed that when the master dies, the followers don't respond at all. Is this by design?
i.e.

127.0.0.1:9852> NEARBY fleet4 POINT 33.462 -112.268 3000
{"ok":true,"objects":[],"count":0,"cursor":0,"elapsed":"83.569µs"}
127.0.0.1:9852> NEARBY fleet4 POINT 33.462 -112.268 3000
(error) catching up to leader
127.0.0.1:9852> NEARBY fleet4 POINT 33.462 -112.268 3000
(error) catching up to leader
127.0.0.1:9852> NEARBY fleet4 POINT 33.462 -112.268 3000
(error) catching up to leader
127.0.0.1:9852> NEARBY fleet4 POINT 33.462 -112.268 3000
(error) catching up to leader

This is what happens when the master dies.
I was thinking that it would be desirable to have the followers reply to queries whilst the master is offline.

@tidwall (Owner) commented Aug 4, 2017

Hi @octete,

I agree and I just pushed an update to the master branch which changes the behavior. Now a follower only needs to catch up with the leader one time to begin accepting reads.

This should address the case where a leader goes down: the follower will continue responding to reads.

If the leader dies and the follower server is restarted, the catching up to leader error will occur until the follower resyncs with the leader, or until the follower stops following via a FOLLOW no one command.
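
As a sketch, the recovery for that restarted-follower case looks like this (reusing the session above; responses abbreviated):

127.0.0.1:9852> NEARBY fleet4 POINT 33.462 -112.268 3000
(error) catching up to leader
127.0.0.1:9852> FOLLOW no one
{"ok":true,"elapsed":"..."}
127.0.0.1:9852> NEARBY fleet4 POINT 33.462 -112.268 3000
{"ok":true,"objects":[],"count":0,"cursor":0,"elapsed":"..."}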

Let me know if this fixes the issue or if you have further questions.

@octete commented Aug 4, 2017

Hi @tidwall

That's enough for now! Thanks for the prompt response. 😁

@jbfarez commented Jun 14, 2018

Hi @tidwall, sorry for "reopening" this case, but I was just wondering if there is a way to "transform/promote" a follower to a leader?
If not, do you plan to design a high-availability mechanism for Tile38?

@tidwall (Owner) commented Jun 14, 2018

Hi @jbfarez,

It's possible to promote the follower to leader by sending the follower a FOLLOW no one command, and then to demote the old leader by sending it FOLLOW host port, pointing at the new leader.

Right now this is a manual step (or perhaps one handled by a custom-made script). I have plans to add automated failover in the future.
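
A minimal sketch of that manual swap, assuming the old leader was on port 9851 and the follower on port 9852:

127.0.0.1:9852> FOLLOW no one
127.0.0.1:9851> FOLLOW 127.0.0.1 9852

(The second command applies once the old leader is reachable again and should now act as a follower.)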

@mudit3774

@tidwall we are planning to use Tile38 in production, and we would like to check whether there is a proposal around this so that we can implement it.

Do we plan to use etcd or any other similar coordinator to achieve this? Do we plan to give some sort of "cluster-mode" and a simple cluster formation strategy?

Or do we plan to monitor the leader from the follower, so that on failover the follower assumes leadership and sends a FOLLOW host port command to the failed leader? In that case we would have to change the DNS to point at the new node, so Tile38 should either publish an event or allow a callback registry.

The issue with a separate health check monitor would probably be monitoring and HA of the monitor itself.

Thoughts?

@tidwall (Owner) commented Jul 2, 2019

@mudit3774 There's no official built-in support for HA.

I've heard that some people have had success using Redis Sentinel. It's possible to run the redis-server with the --sentinel flag and configure it to point to the Tile38 Leader instead of a Redis Master.
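
As a minimal sketch (the master name and quorum value here are placeholders; a fuller walkthrough appears later in this thread):

# sentinel.conf: monitor the Tile38 leader instead of a Redis master
sentinel monitor tile38-leader 127.0.0.1 9851 2

$ redis-server sentinel.conf --sentinel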

@RashadAnsari

> @mudit3774 There's no official built-in support for HA.
>
> I've heard that some people have had success using Redis Sentinel. It's possible to run the redis-server with the --sentinel flag and configure it to point to the Tile38 Leader instead of a Redis Master.

Hi @tidwall

I created a docker image for Tile38 HA using Redis Sentinel.

https://github.com/RashadAnsari/tile38-ha

@Mukund2900 commented Apr 26, 2023

@tidwall @iwpnd
For anyone who wants a step-by-step guide, here it is (for now the setup is local; change the host and IP based on your needs):

  • spin up a Tile38 server -> tile38-server -d data1 (this will start on port 9851)
  • spin up another Tile38 server -> tile38-server -p 9010 -d data2 (this will start on port 9010)
  • open tile38-cli on port 9010 and enter -> FOLLOW 127.0.0.1 9851
  • in another terminal, start Redis Sentinel with redis-sentinel sentinel.conf
    Here is the sentinel.conf:
sentinel monitor mymaster 127.0.0.1 9851 1
sentinel down-after-milliseconds mymaster 300
sentinel failover-timeout mymaster 1800
protected-mode no

Change the variables based on your needs; this configuration directly makes the slave the master once the master goes down.
Also, as I have only 2 instances (1 slave and 1 master/leader), the quorum value after monitor mymaster is 1.
Now, in your client implementation, just change the way you connect to Tile38. For example, in Java (using Lettuce), connect as follows:

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.RedisURI;

    // Build a Sentinel-aware client; Lettuce asks Sentinel for the current leader address.
    private RedisClient createRedisClient(String sentinelHost, int sentinelPort, String password, String masterId) {
        RedisURI redisURI = RedisURI.Builder.sentinel(sentinelHost, sentinelPort, masterId, password).withDatabase(0).build();
        return RedisClient.create(redisURI);
    }
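
For instance, a hypothetical caller could look like this ("mymaster" must match the name in sentinel.conf, 26379 is the default Sentinel port, and the password is a placeholder for your own setup):

    import io.lettuce.core.api.StatefulRedisConnection;

    RedisClient client = createRedisClient("127.0.0.1", 26379, "yourpassword", "mymaster");
    StatefulRedisConnection<String, String> conn = client.connect();
    conn.sync().ping(); // answered by whichever node Sentinel currently reports as leader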

You can also check the Tile38 Java client for a client-side implementation. It automatically discovers the new leader at any point and waits for reconnection.

@Kilowhisky (Contributor) commented May 10, 2023

FYI for those in C# land: StackExchange.Redis does not work with Sentinel (in Tile38), as it appears to try to fetch replication information from the Tile38 server using the ROLE command.

I will update this post once I find a workaround...

EDIT:
Looks like the ROLE command is a large part of how Sentinel clients work: https://redis.io/docs/reference/sentinel-clients/
EDIT 2:
Filed a bug: #686
