Proxies & Config API #582

Merged (18 commits into master, Mar 24, 2014)

Conversation

@benbjohnson (Contributor)

Overview

This pull request adds the ability for nodes to join as proxies when the cluster size limit is reached.

Changes

  • Peer & Proxy Management added to Registry (see the sketch after this list).
  • Mode added to PeerServer. This can be either PeerMode or ProxyMode.
  • JoinCommand sets overflow nodes as proxies.
  • RemoveCommand removes proxies in addition to peers.
  • Leader responses return X-Leader-Peer-URL & X-Leader-Client-URL headers.
  • New Error (402): Proxy Internal Error
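
As a rough illustration of the peer/proxy split in the registry, here is a minimal sketch; the type layout and method names are assumptions, not the actual etcd code:

    package registry

    import "sync"

    // Registry tracks full peers and proxies separately. This is a minimal
    // sketch of the idea only; the real field and method names may differ.
    type Registry struct {
        mu      sync.Mutex
        peers   map[string]string // node name -> peer URL
        proxies map[string]string // node name -> proxy client URL
    }

    func NewRegistry() *Registry {
        return &Registry{
            peers:   make(map[string]string),
            proxies: make(map[string]string),
        }
    }

    // RegisterPeer records a node as a full peer; a promoted proxy moves here.
    func (r *Registry) RegisterPeer(name, url string) {
        r.mu.Lock()
        defer r.mu.Unlock()
        delete(r.proxies, name)
        r.peers[name] = url
    }

    // RegisterProxy records a node as a proxy; a demoted peer moves here.
    func (r *Registry) RegisterProxy(name, url string) {
        r.mu.Lock()
        defer r.mu.Unlock()
        delete(r.peers, name)
        r.proxies[name] = url
    }

    // PeerCount is what the join/monitor logic compares against the active size.
    func (r *Registry) PeerCount() int {
        r.mu.Lock()
        defer r.mu.Unlock()
        return len(r.peers)
    }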

Questions

How should we update the proxy URLs?

I would like to minimize the internal state that proxies have to hold and make them as dumb as possible. Currently a proxy retrieves the proxy URL via the initial join command. It can also update its proxy URLs if the leader changes while the original leader is still available. However, it doesn't handle a leader failure.

My thought is to simply have a new leader continually ping the proxies after leader change until all proxies have been notified.
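
A minimal sketch of that ping-until-acknowledged loop, where the notify callback is a hypothetical stand-in rather than anything in this PR:

    package main

    import "time"

    // notifyProxies keeps retrying until every proxy has acknowledged the new
    // leader. The notify callback stands in for whatever call would push the
    // X-Leader-Peer-URL / X-Leader-Client-URL values to a proxy (hypothetical).
    func notifyProxies(proxyURLs []string, notify func(proxyURL string) error) {
        pending := append([]string(nil), proxyURLs...)
        for len(pending) > 0 {
            var remaining []string
            for _, u := range pending {
                if err := notify(u); err != nil {
                    remaining = append(remaining, u) // not reached yet; retry next pass
                }
            }
            pending = remaining
            if len(pending) > 0 {
                time.Sleep(time.Second)
            }
        }
    }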

The other option would be to maintain a cluster configuration within the proxies but that means that we're maintaining more state.

/cc @philips @xiangli-cmu

@benbjohnson (Contributor Author)

I added the following peer endpoints:

GET /config
PUT /config

The endpoints receive and send a JSON-serialized ClusterConfig:

// Maintain a total of 9 peers, automatically promote after 30 minutes (1,800 seconds).
{"activeSize":9, "promoteDelay":1800}

I also added auto-promotion and auto-demotion when the configuration changes. Proxies are promoted when "peer count" is less than "active size". It does this by calling the POST /promote peer endpoint on the proxy. The proxy will reattempt to join the cluster and will start its Raft server.

Peers are demoted when the "peer count" is greater than "active size". It does this by issuing a DemoteCommand which converts the peer to a proxy.

The promotion/demotion is run out of a monitoring goroutine that checks the active size and peer count. It's a quick check against the registry so there's little overhead. I still need to add peer activity monitoring but that'll hook in easily with the monitoring goroutine.
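
The check could look roughly like this; a sketch only, where the Cluster interface and its methods are hypothetical stand-ins for the real registry/peer-server calls:

    package main

    import "time"

    // Cluster is a hypothetical stand-in for the calls the real monitoring
    // goroutine would use; none of these names come from the PR itself.
    type Cluster interface {
        ActiveSize() int        // ClusterConfig.ActiveSize
        PeerCount() int         // peers currently in the cluster
        ProxyCount() int        // proxies available for promotion
        PromoteOneProxy() error // POST /promote on some proxy
        DemoteOnePeer() error   // issue a DemoteCommand against some peer
    }

    // monitorActiveSize nudges the peer count toward the configured active
    // size, one promotion or demotion per tick.
    func monitorActiveSize(c Cluster, stop <-chan struct{}) {
        for {
            select {
            case <-stop:
                return
            case <-time.After(5 * time.Second):
            }
            switch {
            case c.PeerCount() < c.ActiveSize() && c.ProxyCount() > 0:
                c.PromoteOneProxy()
            case c.PeerCount() > c.ActiveSize():
                c.DemoteOnePeer()
            }
        }
    }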

@philips @xiangli-cmu Can you guys give me a code review on this? Also, I don't see where the RemoveCommand is being flushed to the removed peer (in order to cause the peer to shutdown). Am I missing something?

@benbjohnson (Contributor Author)

@philips @xiangli-cmu I added a separate goroutine to monitor for activity on the peers and to demote dead peers. There's another goroutine that monitors the cluster size and will promote/demote to move the cluster size to match the ClusterConfig.ActiveSize.

Two goroutines may sound like too much but they're really two separate problems. This setup allows us to manage the cluster configuration settings separately from the dead nodes and it works pretty nicely.
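
Roughly, the activity monitor could be shaped like this (hypothetical names; the silence threshold is an assumption, not necessarily tied to promoteDelay):

    package main

    import "time"

    // PeerActivity is a hypothetical view of when each peer was last heard
    // from; it is not the actual registry API.
    type PeerActivity interface {
        LastActivity() map[string]time.Time // peer name -> last response/heartbeat
        Demote(name string) error           // issue a DemoteCommand for that peer
    }

    // monitorPeerActivity demotes peers that have been silent for too long,
    // freeing a slot that the size monitor can refill with a proxy.
    func monitorPeerActivity(p PeerActivity, maxSilence time.Duration, stop <-chan struct{}) {
        for {
            select {
            case <-stop:
                return
            case <-time.After(5 * time.Second):
            }
            for name, last := range p.LastActivity() {
                if time.Since(last) > maxSilence {
                    p.Demote(name) // dead peer; demote it to a proxy
                }
            }
        }
    }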

The proxy work and configuration api should all be there. Can you guys review this pull request and give me feedback?

Also, on a side note, it turns out that if you double kill a process on Mac OS X then you get the blue screen of death (except it's gray on the Mac). I filed a bug against Go 1.2.

@xiang90 (Contributor) commented Mar 2, 2014

@benbjohnson The existing commits deal with proxy management, and the actual proxy functionality has not been implemented yet, right?
I am reviewing this pull request. Sorry for the delay.

@xiang90 (Contributor) commented Mar 2, 2014

@benbjohnson I roughly looked through the code. One quick question.
I see we are doing HTTP redirection rather than a real proxy.
I want a real proxy, where the data goes through the local etcd proxy, for the following reasons (see the sketch after this list):

  1. we can do local caching in the future.
  2. we can easily limit the request rate per host.
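
For illustration, a transparent pass-through proxy along those lines could be built on net/http/httputil; a sketch only, not what this PR implements, and the addresses are made up:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Hypothetical leader address; a real proxy would keep this updated
        // from the join response or the X-Leader-Client-URL header.
        leader, err := url.Parse("http://10.0.1.10:4001")
        if err != nil {
            log.Fatal(err)
        }
        rp := httputil.NewSingleHostReverseProxy(leader)

        // Because every request and response passes through this handler,
        // per-host rate limiting or response caching could be layered here.
        log.Fatal(http.ListenAndServe("127.0.0.1:4001", rp))
    }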

@benbjohnson (Contributor Author)

@xiangli-cmu Do you want the local Server/PeerServer to io.Copy() every request? Or do you actually want it to process the requests so it can extract the response values for caching? Can you give me some more implementation details? I want to make sure I'm clear on what you're looking for.

@xiang90 (Contributor) commented Mar 3, 2014

@benbjohnson I want the proxy to be able to understand the request and reply.

In the future we can do this, for example:
local application1 watches key /foo
local application2 watches key /foo
There is only one outgoing HTTP connection from our local etcd proxy to the real etcd cluster.

local application1 gets key /foo
local application2 gets key /foo within some threshold of the first
We just return the first result to application2.
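
A rough sketch of that read coalescing; the types and threshold here are hypothetical, nothing below is from the PR:

    package main

    import (
        "sync"
        "time"
    )

    // coalescer returns a cached result when a second GET for the same key
    // arrives within the threshold, so only one request reaches the cluster.
    type coalescer struct {
        mu        sync.Mutex
        threshold time.Duration
        last      map[string]result
        fetch     func(key string) (string, error) // real GET against the cluster
    }

    type result struct {
        val string
        at  time.Time
    }

    func (c *coalescer) Get(key string) (string, error) {
        c.mu.Lock()
        if r, ok := c.last[key]; ok && time.Since(r.at) < c.threshold {
            c.mu.Unlock()
            return r.val, nil // application2 gets application1's result
        }
        c.mu.Unlock()

        val, err := c.fetch(key)
        if err != nil {
            return "", err
        }
        c.mu.Lock()
        c.last[key] = result{val: val, at: time.Now()}
        c.mu.Unlock()
        return val, nil
    }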

But I think for this pull request, we can just focus on the management part and just do the redirection.

@benbjohnson (Contributor Author)

@xiangli-cmu I made the requested fixes. Let me know if there are any other changes or if I can merge the PR.

@@ -60,7 +60,6 @@ type Config struct {
 	KeyFile string `toml:"key_file" env:"ETCD_KEY_FILE"`
 	Peers []string `toml:"peers" env:"ETCD_PEERS"`
 	PeersFile string `toml:"peers_file" env:"ETCD_PEERS_FILE"`
-	MaxClusterSize int `toml:"max_cluster_size" env:"ETCD_MAX_CLUSTER_SIZE"`
Contributor:

What is the plan here? Just flag and ignore?

@benbjohnson (Contributor Author):

Do you want it as a deprecation warning or an actual CLI error? It seems important enough that users should be notified to remove their -max-cluster-size.
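
Either way, a minimal sketch of the warning option could look like this; hypothetical code, and whether it should instead be a hard CLI error is the open question above:

    package main

    import "log"

    // Config mirrors only the field relevant to this thread (hypothetical cut-down type).
    type Config struct {
        MaxClusterSize int
    }

    // warnDeprecated sketches the "flag and ignore" option: keep parsing
    // max_cluster_size but tell users to drop it in favor of the config API.
    func warnDeprecated(c *Config) {
        if c.MaxClusterSize != 0 {
            log.Printf("warning: max_cluster_size is deprecated and ignored; set activeSize via the config API instead")
        }
    }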

@philips (Contributor) commented Mar 4, 2014

@benbjohnson Should /config live in the keyspace so we can do CAS on it?

@xiang90 (Contributor) commented Mar 4, 2014

@benbjohnson I hope we can have a design doc for our management workflow
similar to this one: https://github.com/coreos/etcd/blob/master/Documentation/design/discovery.md

@xiang90 (Contributor) commented Mar 4, 2014

"
Can you guys give me a code review on this? Also, I don't see where the RemoveCommand is being flushed to the removed peer (in order to cause the peer to shutdown). Am I missing something?
"

So in raft, we do not remove the peer immediately from peersMap; instead we flush that peer with the latest commitIndex (which includes the remove command). After this flush, the peer will apply the remove command, and the leader can remove it from peersMap.

https://github.com/goraft/raft/blob/master/server.go#L1073
Here we set stopHeartBeat to true to indicate that we need to do a flush then stop.

I understand that stopHeartbeat with a bool argument is confusing. I am refactoring the heartbeat process in raft now, and hopefully this will go away after that refactoring.
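
In pseudocode-ish Go, the flow reads roughly like this; a paraphrase of the behavior described above, not goraft's actual code or API:

    package main

    // removedPeer is a hypothetical view of the leader's handle on the peer
    // being removed.
    type removedPeer interface {
        LastAckedIndex() uint64 // highest log index the peer has confirmed
        SendHeartbeat()         // replicates entries up to the leader's commit index
        Stop()                  // stop heartbeating and drop the peer from peersMap
    }

    // flushThenStop keeps heartbeating until the peer has seen the commit
    // index that contains the RemoveCommand, then stops; the peer applies
    // the command and shuts itself down.
    func flushThenStop(p removedPeer, removeCommitIndex uint64) {
        for p.LastAckedIndex() < removeCommitIndex {
            p.SendHeartbeat()
        }
        p.Stop()
    }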

@xiang90 (Contributor) commented Mar 18, 2014

@philips I have sent my bug fix to ben's proxy branch.

@philips (Contributor) commented Mar 18, 2014

@benbjohnson @xiangli-cmu Thanks.

@benbjohnson (Contributor Author)

@philips See my comments below.

"Get rid of the binary muxing and use our raft side upgrade mechanism for this change. This will be a good test that we got this designed correctly. We should fix all of the commands returning a byte slice in Apply to use something more reasonable like JSON."

I added two new endpoints:

PUT /v2/admin/machines/:name
DELETE /v2/admin/machines/:name

These endpoints use a new JoinCommandV2 and RemoveCommandV2. The previous versioning work we did was on the client-server side, not on the peer-server side. The old /join and /remove endpoints still exist and work with JoinCommandV1 and RemoveCommandV1, so you can run old and new versions in the same cluster as long as you're using the old API. You'll need to upgrade all the nodes to use the /v2/admin/machines* API (which supports proxies).

The new commands accept and return JSON. I also took the liberty to change the EtcdURL and RaftURL parameters to ClientURL and PeerURL, respectively, since it's a new version.
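
To illustrate, adding a machine through the new endpoint might look like this; a sketch only, where the host, machine name, and JSON field casing are assumptions based on the ClientURL/PeerURL naming above:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // machine mirrors the ClientURL/PeerURL naming; the exact JSON field
    // names used by the real command are an assumption here.
    type machine struct {
        ClientURL string `json:"clientURL"`
        PeerURL   string `json:"peerURL"`
    }

    func main() {
        // Hypothetical peer address and machine name.
        m := machine{ClientURL: "http://10.0.1.20:4001", PeerURL: "http://10.0.1.20:7001"}
        payload, _ := json.Marshal(m)
        req, err := http.NewRequest("PUT", "http://127.0.0.1:7001/v2/admin/machines/node3", bytes.NewReader(payload))
        if err != nil {
            panic(err)
        }
        req.Header.Set("Content-Type", "application/json")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }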

"Adding a version on the config interface and namespacing it all since an etcd admin tool will be messing around with it. I think something like /v2/admin/config and /v2/admin/machines. Move /join and /remove to the /v2/admin/machines interface and piggyback on the raft side upgrade used for the Command changes above."

I moved all the config and machine endpoints under /v2/admin/*:

GET /v2/admin/config
PUT /v2/admin/config
GET /v2/admin/machines
GET /v2/admin/machines/:name

"Writing a design document roughly outlining the current state."

Here's the design document:

https://github.com/benbjohnson/etcd/blob/7d4fda550dbb58eaf12836b9c8fbe834782e5029/Documentation/design/proxies.md


@xiangli-cmu I added a pull request for the Raft change: goraft/raft#199

If this pull request looks good to everybody then I'll remove the Raft changes, bump the Raft version, and merge with master.

@benbjohnson (Contributor Author)

@philips @xiangli-cmu Can one of you guys give me a "lgtm" or additional feedback on this final commit?

@xiang90 (Contributor) commented Mar 21, 2014

@philips Do you want to test and review the recent changes? Or I will go ahead and do it.

@philips (Contributor) commented Mar 21, 2014

@benbjohnson @xiangli-cmu I will look first thing in the morning. I couldn't quickly find how the JoinCommandV2 stashes the mode.

@benbjohnson (Contributor Author)

@philips Thanks! Here are the mode changes and accesses from the last commit:

@philips (Contributor) commented Mar 21, 2014

@benbjohnson Besides the nits and questions, lgtm.

@philips (Contributor) commented Mar 21, 2014

How is the upgrade path with the new JoinV2? Do we need to do something to ensure a mixed etcd 0.4/0.3 cluster doesn't crash here?

@benbjohnson (Contributor Author)

@philips A mixed cluster will work as long as the user is using v0.3 of the client library. That will still point at /join. Upgrading to v0.4 of the library will use /v2/admin/machines which will use JoinV2. That'll cause the old servers to panic since that command doesn't exist for them. That only happens if users are adding new boxes to the cluster though.

@philips (Contributor) commented Mar 22, 2014

@benbjohnson I don't understand what a 0.3 client library means in this context.

@benbjohnson (Contributor Author)

@philips Sorry, the 0.3 client would be the go-etcd client as it exists now; it joins against the /join endpoint. The future go-etcd client (0.4) will join against /v2/admin/machines. If you use the future client to join a node, it'll instantiate a JoinCommandV2, which works on 0.4 nodes but panics on 0.3 nodes since that type doesn't exist there.

The command factory is currently at the store-level but implementing it at the PeerServer will require some extra work. I'm going to merge this in and do that in a separate PR.
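
A version-gated command factory for joins could be shaped roughly like this; the types and registered names here are hypothetical, and the real change is deferred to the follow-up PR mentioned above:

    package main

    // JoinCommandV1 and JoinCommandV2 stand in for the two command versions
    // discussed above; the selection logic sketches the follow-up
    // command-factory work, it is not code from this PR.
    type JoinCommand interface {
        CommandName() string
    }

    type JoinCommandV1 struct{ Name, RaftURL, EtcdURL string }
    type JoinCommandV2 struct{ Name, PeerURL, ClientURL string }

    func (c *JoinCommandV1) CommandName() string { return "join-v1" } // placeholder name
    func (c *JoinCommandV2) CommandName() string { return "join-v2" } // placeholder name

    // newJoinCommand picks the command version that every member of the
    // cluster understands, so a mixed 0.3/0.4 cluster never sees an unknown
    // command type.
    func newJoinCommand(clusterVersion int, name, peerURL, clientURL string) JoinCommand {
        if clusterVersion < 2 {
            return &JoinCommandV1{Name: name, RaftURL: peerURL, EtcdURL: clientURL}
        }
        return &JoinCommandV2{Name: name, PeerURL: peerURL, ClientURL: clientURL}
    }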

@philips (Contributor) commented Mar 22, 2014

@benbjohnson I still don't understand what go-etcd has to do with machine joins.

@philips (Contributor) commented Mar 22, 2014

@benbjohnson I was thinking through the versioning problem some more. Do you think it makes sense to leave the current /join method as-is and then have a /register (or something similar) that is used by proxies, observers, etc.? Then we can use the /version endpoint to decide whether a node should take a /join rejection as a signal to /register or to just die instead.

@benbjohnson (Contributor Author)

@philips I'll move the join/remove under the command factory after I merge this. Then the /v2/admin/machines will use JoinCommandV1 until the whole cluster can use JoinCommandV2.

@philips (Contributor) commented Mar 22, 2014

Great! Sounds perfect.

Move from coreos/raft to goraft/raft and update to latest.
Conflicts:
	config/config.go
	server/peer_server.go
	server/transporter.go
	tests/server_utils.go
benbjohnson added a commit that referenced this pull request Mar 24, 2014
@benbjohnson benbjohnson merged commit 802aaf5 into etcd-io:master Mar 24, 2014
@benbjohnson benbjohnson deleted the proxy branch March 24, 2014 22:30
@benbjohnson (Contributor Author)

Everything is merged and working as well as master. There's still a bug that's causing TestTLSMultiNodeKillAllAndRecovery to run indefinitely, but that's a problem in master:

https://drone.io/github.com/coreos/etcd/99
