[feature] High Availability #144

gregkeys · 2018-08-08T21:48:44Z

We are currently using crossbar in production but would like to switch to something that offers high availability. Crossbar has HA however it is prohibitively expensive at around $14,000 per node

We run multi-tenanted clusters where each namespace represents a customer website, within each namespace is a deployment of crossbar.

All of our services are capable of scaling up and down however at 65,000 active connections Im expecting that we will begin to see problems with services and connected users as they begin to loose connectivity.

if the router crashes and restarts there is no failover strategy everything just goes into a retry cycle until crossbar comes back online

Nexus looks like a good option with its ability to drop sessions from the que, this could prevent the restarts but still leaves to question how do we scale up to millions of active connections, we'd like to scale up our wamp router so that it can spread the load accross multiple instances of the deployment.

We'd also like to have instances of nexus running on multiple nodes in the event that we loose a node we dont loose connectivity

gammazero · 2018-08-09T09:36:30Z

I have been considering what it would take to provide HA for a cluster of nodes running WAMP routers. Here is a very rough design concept that tries to address what you are asking for. I would be interested in hearing feedback.
https://github.com/gammazero/nexus/wiki/High-Availability

martin31821 · 2018-08-11T08:38:37Z

We also thought about HA within WAMP, but came up with a slightly different approach.

Our approach was to minimize the inter-router-communication required to maintain good performance and reliability.

We split our services into smaller "application features", and wanted to use one "backend router" per "application feature". The backend router is where the backend services connect to.
Then we can go on and freely scale the users by adding load-balancing routers infront of the different backend routers.
This approach has the following advantages:

We minimize the communication between routers
We can perform nearly-zero-downtime updates
When a router restarts, not the complete application fails for all users, either
- the application fails for some users (in case of a frontend router restart)
- a part of the application fails for all users (in case of a backend router restart)

However, I agree with @gammazero that some kind of federation service would certainly be required, regardless of the chosen implementation strategy.

Additionally, I think nexus will need to open up some internal APIs, since the federation service would require deep interaction with the router itself.

/cc @johannwagner @fin-ger

haizaar · 2018-08-15T07:54:16Z

@gregkeys can you please provide some info/links about Crossbar HA mode?

haizaar · 2018-08-16T01:46:57Z

BTW, who ever is in search for multi-node router for scale/HA purposes, Wiola seems to fit the bill:

gregkeys · 2018-08-16T05:03:31Z

@gammazero Im still considering you're proposal, at first glance it looks like a solid plan of action

@martin31821 are you building custom routers with nexus to accomplish this? if so where is the strong interconnect taking place is that built into the frontend routers?

@haizaar here is the link where crossbar mentions access to HA mode https://crossbario.com/products/enterprise-support/

martin31821 · 2018-08-16T15:26:48Z

@gregkeys

are you building custom routers with nexus to accomplish this? if so where is the strong interconnect taking place is that built into the frontend routers?

At the moment, we have autobahnkreuz , but it has no high-availability included yet.
My idea was to either include multiple local clients connecting to the backend routers (take a look at the autobahnkreuz source code) and use it to register/publish/subscribe or to have this functionality as seperated binaries and implement the connection there.

mjentsch · 2019-08-21T14:15:32Z

@gammazero how would you prevent subscription/registration loops using federation agents connecting as normal clients to other routers? There seems to be a problem that clientA connected to routerA subscribes to a topic. The federation agentA (on routerA) detects that and subscribes to that topic on routerB it is connected to. On routerB is federation agentB running who detects this subscription if we do not differentiate between clients and agents and subscribes to the topic on routerA is is connected to. clientA can unsubscribe from the topic but since agentB still is subscribed to the topic, agentA will not unsubscribe from it on routerB.
Having HA would be awesome... but all my drawings end up with unsatisfying results.

uttaravadina · 2020-05-07T06:13:01Z

@gammazero has there been any work in HA, Load Balancing and Horizontal Scaling?
I would like to contribute.

martin31821 · 2020-05-07T07:09:42Z

@u774r4v4d1n there has been some research, primarily regarding state synchronization in a distributed scenario (for availability, not for scaling).

cc'ing @fin-ger, his project https://github.com/fin-ger/building-a-distributed-wamp-router gives a nice overview of the results.

gammazero mentioned this issue Nov 3, 2020

Is any work being done to create a HA wamp client #218

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature] High Availability #144

[feature] High Availability #144

gregkeys commented Aug 8, 2018 •

edited

Loading

gammazero commented Aug 9, 2018

martin31821 commented Aug 11, 2018 •

edited

Loading

haizaar commented Aug 15, 2018

haizaar commented Aug 16, 2018

gregkeys commented Aug 16, 2018

martin31821 commented Aug 16, 2018

mjentsch commented Aug 21, 2019

uttaravadina commented May 7, 2020

martin31821 commented May 7, 2020

[feature] High Availability #144

[feature] High Availability #144

Comments

gregkeys commented Aug 8, 2018 • edited Loading

gammazero commented Aug 9, 2018

martin31821 commented Aug 11, 2018 • edited Loading

haizaar commented Aug 15, 2018

haizaar commented Aug 16, 2018

gregkeys commented Aug 16, 2018

martin31821 commented Aug 16, 2018

mjentsch commented Aug 21, 2019

uttaravadina commented May 7, 2020

martin31821 commented May 7, 2020

gregkeys commented Aug 8, 2018 •

edited

Loading

martin31821 commented Aug 11, 2018 •

edited

Loading