Proxies & Config API #582

Merged (18 commits into master, Mar 24, 2014)

Conversation

@benbjohnson (Contributor)

Overview

This pull request adds the ability for nodes to join as proxies when the cluster size limit is reached.

Changes

  • Peer & Proxy Management added to Registry (see the sketch after this list).
  • Mode added to PeerServer. This can be either PeerMode or ProxyMode.
  • JoinCommand sets overflow nodes as proxies.
  • RemoveCommand removes proxies in addition to peers.
  • Leader responses return X-Leader-Peer-URL & X-Leader-Client-URL headers.
  • New Error (402): Proxy Internal Error
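
As a rough illustration of the peer/proxy split in the registry, here is a minimal sketch; the type layout and method names are assumptions, not the actual etcd code:

    package registry

    import "sync"

    // Registry tracks full peers and proxies separately. This is a minimal
    // sketch of the idea only; the real field and method names may differ.
    type Registry struct {
        mu      sync.Mutex
        peers   map[string]string // node name -> peer URL
        proxies map[string]string // node name -> proxy client URL
    }

    func NewRegistry() *Registry {
        return &Registry{
            peers:   make(map[string]string),
            proxies: make(map[string]string),
        }
    }

    // RegisterPeer records a node as a full peer; a promoted proxy moves here.
    func (r *Registry) RegisterPeer(name, url string) {
        r.mu.Lock()
        defer r.mu.Unlock()
        delete(r.proxies, name)
        r.peers[name] = url
    }

    // RegisterProxy records a node as a proxy; a demoted peer moves here.
    func (r *Registry) RegisterProxy(name, url string) {
        r.mu.Lock()
        defer r.mu.Unlock()
        delete(r.peers, name)
        r.proxies[name] = url
    }

    // PeerCount is what the join/monitor logic compares against the active size.
    func (r *Registry) PeerCount() int {
        r.mu.Lock()
        defer r.mu.Unlock()
        return len(r.peers)
    }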

Questions

How should we update the proxy URLs?

I would like to minimize the internal state that proxies have to hold and make them as dumb as possible. Currently a proxy retrieves the proxy URL via the initial join command. It can also update its proxy URLs if the leader changes while the original leader is still available. However, it doesn't handle a leader failure.

My thought is to simply have a new leader continually ping the proxies after leader change until all proxies have been notified.
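
A minimal sketch of that ping-until-acknowledged loop, where the notify callback is a hypothetical stand-in rather than anything in this PR:

    package main

    import "time"

    // notifyProxies keeps retrying until every proxy has acknowledged the new
    // leader. The notify callback stands in for whatever call would push the
    // X-Leader-Peer-URL / X-Leader-Client-URL values to a proxy (hypothetical).
    func notifyProxies(proxyURLs []string, notify func(proxyURL string) error) {
        pending := append([]string(nil), proxyURLs...)
        for len(pending) > 0 {
            var remaining []string
            for _, u := range pending {
                if err := notify(u); err != nil {
                    remaining = append(remaining, u) // not reached yet; retry next pass
                }
            }
            pending = remaining
            if len(pending) > 0 {
                time.Sleep(time.Second)
            }
        }
    }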

The other option would be to maintain a cluster configuration within the proxies but that means that we're maintaining more state.

/cc @philips @xiangli-cmu

@benbjohnson (Contributor Author)

I added the following peer endpoints:

GET /config
PUT /config

The endpoints receive and send a JSON-serialized ClusterConfig:

// Maintain a total of 9 peers, automatically promote after 30 minutes (1,800 seconds).
{"activeSize":9, "promoteDelay":1800}

I also added auto-promotion and auto-demotion when the configuration changes. Proxies are promoted when "peer count" is less than "active size". It does this by calling the POST /promote peer endpoint on the proxy. The proxy will reattempt to join the cluster and will start its Raft server.

Peers are demoted when the "peer count" is greater than "active size". It does this by issuing a DemoteCommand which converts the peer to a proxy.

The promotion/demotion is run out of a monitoring goroutine that checks the active size and peer count. It's a quick check against the registry so there's little overhead. I still need to add peer activity monitoring but that'll hook in easily with the monitoring goroutine.
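
The check could look roughly like this; a sketch only, where the Cluster interface and its methods are hypothetical stand-ins for the real registry/peer-server calls:

    package main

    import "time"

    // Cluster is a hypothetical stand-in for the calls the real monitoring
    // goroutine would use; none of these names come from the PR itself.
    type Cluster interface {
        ActiveSize() int        // ClusterConfig.ActiveSize
        PeerCount() int         // peers currently in the cluster
        ProxyCount() int        // proxies available for promotion
        PromoteOneProxy() error // POST /promote on some proxy
        DemoteOnePeer() error   // issue a DemoteCommand against some peer
    }

    // monitorActiveSize nudges the peer count toward the configured active
    // size, one promotion or demotion per tick.
    func monitorActiveSize(c Cluster, stop <-chan struct{}) {
        for {
            select {
            case <-stop:
                return
            case <-time.After(5 * time.Second):
            }
            switch {
            case c.PeerCount() < c.ActiveSize() && c.ProxyCount() > 0:
                c.PromoteOneProxy()
            case c.PeerCount() > c.ActiveSize():
                c.DemoteOnePeer()
            }
        }
    }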

@philips @xiangli-cmu Can you guys give me a code review on this? Also, I don't see where the RemoveCommand is being flushed to the removed peer (in order to cause the peer to shutdown). Am I missing something?

@benbjohnson (Contributor Author)

@philips @xiangli-cmu I added a separate goroutine to monitor for activity on the peers and to demote dead peers. There's another goroutine that monitors the cluster size and will promote/demote to move the cluster size to match the ClusterConfig.ActiveSize.

Two goroutines may sound like too much but they're really two separate problems. This setup allows us to manage the cluster configuration settings separately from the dead nodes and it works pretty nicely.
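
Roughly, the activity monitor could be shaped like this (hypothetical names; the silence threshold is an assumption, not necessarily tied to promoteDelay):

    package main

    import "time"

    // PeerActivity is a hypothetical view of when each peer was last heard
    // from; it is not the actual registry API.
    type PeerActivity interface {
        LastActivity() map[string]time.Time // peer name -> last response/heartbeat
        Demote(name string) error           // issue a DemoteCommand for that peer
    }

    // monitorPeerActivity demotes peers that have been silent for too long,
    // freeing a slot that the size monitor can refill with a proxy.
    func monitorPeerActivity(p PeerActivity, maxSilence time.Duration, stop <-chan struct{}) {
        for {
            select {
            case <-stop:
                return
            case <-time.After(5 * time.Second):
            }
            for name, last := range p.LastActivity() {
                if time.Since(last) > maxSilence {
                    p.Demote(name) // dead peer; demote it to a proxy
                }
            }
        }
    }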

The proxy work and configuration api should all be there. Can you guys review this pull request and give me feedback?

Also, on a side note, it turns out that if you double kill a process on Mac OS X then you get the blue screen of death (except it's gray on the Mac). I filed a bug against Go 1.2.

@xiang90 (Contributor) commented Mar 2, 2014

@benbjohnson The existing commits deal with proxy management, and the actual proxy functionality has not been implemented yet, right?
I am reviewing this pull request. Sorry for the delay.

@xiang90 (Contributor) commented Mar 2, 2014

@benbjohnson I roughly looked through the code. One quick question.
I see we are doing HTTP redirection rather than a real proxy.
I want a real proxy, where the data goes through the local etcd proxy, for the following reasons (see the sketch after this list):

  1. we can do local caching in the future.
  2. we can easily limit the request rate per host.
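
For illustration, a transparent pass-through proxy along those lines could be built on net/http/httputil; a sketch only, not what this PR implements, and the addresses are made up:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Hypothetical leader address; a real proxy would keep this updated
        // from the join response or the X-Leader-Client-URL header.
        leader, err := url.Parse("http://10.0.1.10:4001")
        if err != nil {
            log.Fatal(err)
        }
        rp := httputil.NewSingleHostReverseProxy(leader)

        // Because every request and response passes through this handler,
        // per-host rate limiting or response caching could be layered here.
        log.Fatal(http.ListenAndServe("127.0.0.1:4001", rp))
    }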

@benbjohnson (Contributor Author)

@xiangli-cmu Do you want the local Server/PeerServer to io.Copy() every request? Or do you actually want it to process the requests so it can extract the response values for caching? Can you give me some more implementation details? I want to make sure I'm clear on what you're looking for.

@xiang90 (Contributor) commented Mar 3, 2014

@benbjohnson I want the proxy to be able to understand the request and reply.

In the future we can do this, for example:
local application1 watches key /foo
local application2 watches key /foo
There is only one outgoing HTTP connection from our local etcd proxy to the real etcd cluster.

local application1 gets key /foo
local application2 gets key /foo within some threshold of the first
We just return the first result to application2.
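
A rough sketch of that read coalescing; the types and threshold here are hypothetical, nothing below is from the PR:

    package main

    import (
        "sync"
        "time"
    )

    // coalescer returns a cached result when a second GET for the same key
    // arrives within the threshold, so only one request reaches the cluster.
    type coalescer struct {
        mu        sync.Mutex
        threshold time.Duration
        last      map[string]result
        fetch     func(key string) (string, error) // real GET against the cluster
    }

    type result struct {
        val string
        at  time.Time
    }

    func (c *coalescer) Get(key string) (string, error) {
        c.mu.Lock()
        if r, ok := c.last[key]; ok && time.Since(r.at) < c.threshold {
            c.mu.Unlock()
            return r.val, nil // application2 gets application1's result
        }
        c.mu.Unlock()

        val, err := c.fetch(key)
        if err != nil {
            return "", err
        }
        c.mu.Lock()
        c.last[key] = result{val: val, at: time.Now()}
        c.mu.Unlock()
        return val, nil
    }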

But I think for this pull request, we can just focus on the management part and just do the redirection.

@benbjohnson (Contributor Author)

@xiangli-cmu I made the requested fixes. Let me know if there are any other changes or if I can merge the PR.

@@ -60,7 +60,6 @@ type Config struct {
 	KeyFile string `toml:"key_file" env:"ETCD_KEY_FILE"`
 	Peers []string `toml:"peers" env:"ETCD_PEERS"`
 	PeersFile string `toml:"peers_file" env:"ETCD_PEERS_FILE"`
-	MaxClusterSize int `toml:"max_cluster_size" env:"ETCD_MAX_CLUSTER_SIZE"`
Contributor:

What is the plan here? Just flag and ignore?

@benbjohnson (Contributor Author):

Do you want it as a deprecation warning or an actual CLI error? It seems important enough that users should be notified to remove their -max-cluster-size.
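
Either way, a minimal sketch of the warning option could look like this; hypothetical code, and whether it should instead be a hard CLI error is the open question above:

    package main

    import "log"

    // Config mirrors only the field relevant to this thread (hypothetical cut-down type).
    type Config struct {
        MaxClusterSize int
    }

    // warnDeprecated sketches the "flag and ignore" option: keep parsing
    // max_cluster_size but tell users to drop it in favor of the config API.
    func warnDeprecated(c *Config) {
        if c.MaxClusterSize != 0 {
            log.Printf("warning: max_cluster_size is deprecated and ignored; set activeSize via the config API instead")
        }
    }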

@philips (Contributor) commented Mar 4, 2014

@benbjohnson Should /config live in the keyspace so we can do CAS on it?

@xiang90 (Contributor) commented Mar 4, 2014

@benbjohnson I hope we can have a design doc for our management workflow
similar to this one: https://github.com/coreos/etcd/blob/master/Documentation/design/discovery.md

@xiang90 (Contributor) commented Mar 4, 2014

"
Can you guys give me a code review on this? Also, I don't see where the RemoveCommand is being flushed to the removed peer (in order to cause the peer to shutdown). Am I missing something?
"

So in raft, we do not remove the peer immediately from peersMap; instead we flush that peer with the latest commitIndex (which includes the remove command). After this flush, the peer will apply the remove command, and the leader can remove it from peersMap.

https://github.com/goraft/raft/blob/master/server.go#L1073
Here we set stopHeartBeat to true to indicate that we need to do a flush then stop.

I understand that stopHeartbeat with a bool argument is confusing. I am refactoring the heartbeat process in raft now, and hopefully this will go away after that refactoring.
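
In pseudocode-ish Go, the flow reads roughly like this; a paraphrase of the behavior described above, not goraft's actual code or API:

    package main

    // removedPeer is a hypothetical view of the leader's handle on the peer
    // being removed.
    type removedPeer interface {
        LastAckedIndex() uint64 // highest log index the peer has confirmed
        SendHeartbeat()         // replicates entries up to the leader's commit index
        Stop()                  // stop heartbeating and drop the peer from peersMap
    }

    // flushThenStop keeps heartbeating until the peer has seen the commit
    // index that contains the RemoveCommand, then stops; the peer applies
    // the command and shuts itself down.
    func flushThenStop(p removedPeer, removeCommitIndex uint64) {
        for p.LastAckedIndex() < removeCommitIndex {
            p.SendHeartbeat()
        }
        p.Stop()
    }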

@xiang90 (Contributor) commented Mar 18, 2014

@philips I have sent my bug fix to ben's proxy branch.

@philips (Contributor) commented Mar 18, 2014

@benbjohnson @xiangli-cmu Thanks.

@benbjohnson (Contributor Author)

@philips See my comments below.

"Get rid of the binary muxing and use our raft side upgrade mechanism for this change. This will be a good test that we got this designed correctly. We should fix all of the commands returning a byte slice in Apply to use something more reasonable like JSON."

I added two new endpoints:

PUT /v2/admin/machines/:name
DELETE /v2/admin/machines/:name

These endpoints use a new JoinCommandV2 and RemoveCommandV2. The previous versioning work we did was on the client-server side, not on the peer-server side. The old /join and /remove endpoints still exist and work with JoinCommandV1 and RemoveCommandV1, so you can run old and new versions in the same cluster as long as you're using the old API. You'll need to upgrade all the nodes to use the /v2/admin/machines* API (which supports proxies).

The new commands accept and return JSON. I also took the liberty to change the EtcdURL and RaftURL parameters to ClientURL and PeerURL, respectively, since it's a new version.
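
To illustrate, adding a machine through the new endpoint might look like this; a sketch only, where the host, machine name, and JSON field casing are assumptions based on the ClientURL/PeerURL naming above:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // machine mirrors the ClientURL/PeerURL naming; the exact JSON field
    // names used by the real command are an assumption here.
    type machine struct {
        ClientURL string `json:"clientURL"`
        PeerURL   string `json:"peerURL"`
    }

    func main() {
        // Hypothetical peer address and machine name.
        m := machine{ClientURL: "http://10.0.1.20:4001", PeerURL: "http://10.0.1.20:7001"}
        payload, _ := json.Marshal(m)
        req, err := http.NewRequest("PUT", "http://127.0.0.1:7001/v2/admin/machines/node3", bytes.NewReader(payload))
        if err != nil {
            panic(err)
        }
        req.Header.Set("Content-Type", "application/json")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }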

"Adding a version on the config interface and namespacing it all since an etcd admin tool will be messing around with it. I think something like /v2/admin/config and /v2/admin/machines. Move /join and /remove to the /v2/admin/machines interface and piggyback on the raft side upgrade used for the Command changes above."

I moved all the config and machine endpoints under /v2/admin/*:

GET /v2/admin/config
PUT /v2/admin/config
GET /v2/admin/machines
GET /v2/admin/machines/:name

"Writing a design document roughly outlining the current state."

Here's the design document:

https://github.com/benbjohnson/etcd/blob/7d4fda550dbb58eaf12836b9c8fbe834782e5029/Documentation/design/proxies.md


@xiangli-cmu I added a pull request for the Raft change: goraft/raft#199

If this pull request looks good to everybody then I'll remove the Raft changes, bump the Raft version, and merge with master.

@benbjohnson (Contributor Author)

@philips @xiangli-cmu Can one of you guys give me a "lgtm" or additional feedback on this final commit?

@xiang90 (Contributor) commented Mar 21, 2014

@philips Do you want to test and review the recent changes? Or I will go ahead and do it.

@philips (Contributor) commented Mar 21, 2014

@benbjohnson @xiangli-cmu I will look first thing in the morning. I couldn't quickly find how the JoinCommandV2 stashes the mode.

@benbjohnson (Contributor Author)

@philips Thanks! Here are the mode changes and accesses from the last commit:

@philips (Contributor) commented Mar 21, 2014

@benbjohnson Besides the nits and questions, lgtm.

@philips (Contributor) commented Mar 21, 2014

How is the upgrade path with the new JoinV2? Do we need to do something to ensure a mixed etcd 0.4/0.3 cluster doesn't crash here?

@benbjohnson (Contributor Author)

@philips A mixed cluster will work as long as the user is using v0.3 of the client library. That will still point at /join. Upgrading to v0.4 of the library will use /v2/admin/machines which will use JoinV2. That'll cause the old servers to panic since that command doesn't exist for them. That only happens if users are adding new boxes to the cluster though.

@philips (Contributor) commented Mar 22, 2014

@benbjohnson I don't understand what a 0.3 client library means in this context.

@benbjohnson (Contributor Author)

@philips Sorry, the 0.3 client would be the go-etcd client as it exists now; it joins against the /join endpoint. The future go-etcd client (0.4) will join against /v2/admin/machines. If you use the future client to join a node, it'll instantiate a JoinCommandV2, which works on 0.4 nodes but panics on 0.3 nodes since that type doesn't exist there.

The command factory is currently at the store-level but implementing it at the PeerServer will require some extra work. I'm going to merge this in and do that in a separate PR.
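
A version-gated command factory for joins could be shaped roughly like this; the types and registered names here are hypothetical, and the real change is deferred to the follow-up PR mentioned above:

    package main

    // JoinCommandV1 and JoinCommandV2 stand in for the two command versions
    // discussed above; the selection logic sketches the follow-up
    // command-factory work, it is not code from this PR.
    type JoinCommand interface {
        CommandName() string
    }

    type JoinCommandV1 struct{ Name, RaftURL, EtcdURL string }
    type JoinCommandV2 struct{ Name, PeerURL, ClientURL string }

    func (c *JoinCommandV1) CommandName() string { return "join-v1" } // placeholder name
    func (c *JoinCommandV2) CommandName() string { return "join-v2" } // placeholder name

    // newJoinCommand picks the command version that every member of the
    // cluster understands, so a mixed 0.3/0.4 cluster never sees an unknown
    // command type.
    func newJoinCommand(clusterVersion int, name, peerURL, clientURL string) JoinCommand {
        if clusterVersion < 2 {
            return &JoinCommandV1{Name: name, RaftURL: peerURL, EtcdURL: clientURL}
        }
        return &JoinCommandV2{Name: name, PeerURL: peerURL, ClientURL: clientURL}
    }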

@philips (Contributor) commented Mar 22, 2014

@benbjohnson I still don't understand what go-etcd has to do with machine joins.

@philips (Contributor) commented Mar 22, 2014

@benbjohnson I was thinking through the versioning problem some more. Do you think it makes sense to leave the current /join method as-is and then have a /register (or something similar) that is used by proxies, observers, etc.? Then we can use the /version endpoint to decide whether a node should take a /join rejection as a signal to /register or to just die instead.

@benbjohnson (Contributor Author)

@philips I'll move the join/remove under the command factory after I merge this. Then the /v2/admin/machines will use JoinCommandV1 until the whole cluster can use JoinCommandV2.

@philips (Contributor) commented Mar 22, 2014

Great! Sounds perfect.

Move from coreos/raft to goraft/raft and update to latest.
Conflicts:
	config/config.go
	server/peer_server.go
	server/transporter.go
	tests/server_utils.go
benbjohnson added a commit that referenced this pull request Mar 24, 2014
@benbjohnson benbjohnson merged commit 802aaf5 into etcd-io:master Mar 24, 2014
@benbjohnson benbjohnson deleted the proxy branch March 24, 2014 22:30
@benbjohnson (Contributor Author)

Everything is merged and working as well as master. There's still a bug that's causing TestTLSMultiNodeKillAllAndRecovery to run indefinitely, but that's a problem in master:

https://drone.io/github.com/coreos/etcd/99
