Proxies & Config API #582
I added the following peer endpoints:
The endpoint receives and sends a JSON serialized configuration.
I also added auto-promotion and auto-demotion when the configuration changes: proxies are promoted to peers when the peer count is less than the active size, and peers are demoted to proxies when the peer count is greater than the active size. The promotion/demotion is run out of a monitoring goroutine that checks the active size and peer count. It's a quick check against the registry, so there's little overhead. I still need to add peer activity monitoring, but that'll hook in easily with the monitoring goroutine. @philips @xiangli-cmu Can you guys give me a code review on this? Also, I don't see where the
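The monitoring loop described above could look roughly like the sketch below. This is illustrative only: `Registry`, `checkClusterSize`, and `monitorClusterSize` are invented names, not etcd's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// Registry is a stand-in for the cluster registry the monitor checks;
// the real etcd types and method names differ.
type Registry struct {
	ActiveSize int // desired number of voting peers (from the config endpoint)
	PeerCount  int // peers currently in the cluster
}

// checkClusterSize decides what the monitor should do on one tick.
func checkClusterSize(peerCount, activeSize int) string {
	switch {
	case peerCount < activeSize:
		return "promote" // promote a proxy to a full peer
	case peerCount > activeSize:
		return "demote" // demote a peer back to a proxy
	}
	return "" // cluster is at the configured size
}

// monitorClusterSize runs out of its own goroutine; the check against
// the registry is cheap, so a short interval adds little overhead.
func monitorClusterSize(r *Registry, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if action := checkClusterSize(r.PeerCount, r.ActiveSize); action != "" {
				fmt.Println(action)
			}
		}
	}
}

func main() {
	fmt.Println(checkClusterSize(2, 3)) // → promote
}
```

Keeping the decision in a small pure function also makes the promotion/demotion policy trivial to test independently of the goroutine.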
@philips @xiangli-cmu I added a separate goroutine to monitor for activity on the peers and to demote dead peers. There's another goroutine that monitors the cluster size and will promote/demote to move the cluster size to match the configured active size. Two goroutines may sound like too much, but they're really two separate problems: this setup allows us to manage the cluster configuration settings separately from the dead nodes, and it works pretty nicely. The proxy work and configuration API should all be there. Can you guys review this pull request and give me feedback? Also, on a side note, it turns out that if you double kill a process on Mac OS X then you get the blue screen of death (except it's gray on the Mac). I filed a bug against Go 1.2.
@benbjohnson The existing commits deal with proxy management, and the actual proxy functionality has not been implemented yet, right?
@benbjohnson I roughly looked through the code. One quick question.
@xiangli-cmu Do you want the local proxy to understand the requests?
@benbjohnson I want the proxy to be able to understand the request and reply. In the future we can do this, for example: local application1 get key /foo. But I think for this pull request, we can just focus on the management part and just do the redirection.
@xiangli-cmu I made the requested fixes. Let me know if there are any other changes or if I can merge the PR.
@@ -60,7 +60,6 @@ type Config struct {
 	KeyFile        string   `toml:"key_file" env:"ETCD_KEY_FILE"`
 	Peers          []string `toml:"peers" env:"ETCD_PEERS"`
 	PeersFile      string   `toml:"peers_file" env:"ETCD_PEERS_FILE"`
-	MaxClusterSize int      `toml:"max_cluster_size" env:"ETCD_MAX_CLUSTER_SIZE"`
What is the plan here? Just flag and ignore?
Do you want it as a deprecation warning or an actual CLI error? It seems important enough that users should be notified to remove their -max-cluster-size.
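The "deprecation warning" option discussed here might look like the sketch below: the removed flag still parses but is ignored with a notice instead of failing hard. The function name and message wording are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// warnDeprecatedFlag returns a warning message when a removed flag is
// still set, rather than aborting startup with a CLI error.
func warnDeprecatedFlag(name string, set bool) string {
	if !set {
		return ""
	}
	return fmt.Sprintf("WARNING: %s is deprecated and ignored; please remove it from your configuration", name)
}

func main() {
	// Illustrative: pretend the user still passes the removed flag.
	if msg := warnDeprecatedFlag("-max-cluster-size", true); msg != "" {
		fmt.Fprintln(os.Stderr, msg)
	}
}
```

The trade-off is between breaking existing scripts immediately (CLI error) and risking users never noticing the flag no longer does anything (warning).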
@benbjohnson Should /config live in the keyspace so we can do CAS on it?
@benbjohnson I hope we can have a design doc for our management workflow.
So in raft, we do not remove the peer immediately from peersMap; instead we flush that peer with the latest commitIndex (which includes the remove command). After this flush, the peer will apply the remove command, and the leader can remove it from peersMap. https://github.com/goraft/raft/blob/master/server.go#L1073 I understand stopHeartbeat with a bool argument is confusing. I am refactoring the heartbeat process in raft now, and hopefully this will go away after my refactoring.
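The flush-before-remove invariant described above can be illustrated with a toy model. This is not goraft's actual code; `peer`, `matchIndex`, and `canRemove` are invented names for the idea that the leader keeps a peer in its map until that peer has replicated up to the remove command's log index.

```go
package main

import "fmt"

// peer tracks how far a follower has replicated; a stand-in for
// goraft's internal per-peer state.
type peer struct {
	matchIndex uint64 // highest log index known to be replicated on this peer
}

// canRemove reports whether the leader may drop the peer from its
// peersMap: only once the peer has received the log up to and
// including the remove command itself, so the peer can apply it.
func canRemove(p peer, removeCommandIndex uint64) bool {
	return p.matchIndex >= removeCommandIndex
}

func main() {
	p := peer{matchIndex: 41}
	fmt.Println(canRemove(p, 42)) // false: keep flushing entries to the peer
	p.matchIndex = 42
	fmt.Println(canRemove(p, 42)) // true: peer has the remove command, safe to drop
}
```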
@philips I have sent my bug fix to ben's proxy branch.
@benbjohnson @xiangli-cmu Thanks.
@philips See my comments below.
I added two new endpoints:
These endpoints use a new command type; the new commands accept and return JSON. I also took the liberty to change the
I moved all the config and machine endpoints under
Here's the design document: @xiangli-cmu I added a pull request for the Raft change: goraft/raft#199 If this pull request looks good to everybody then I'll remove the Raft changes, bump the Raft version, and merge with master.
@philips @xiangli-cmu Can one of you guys give me a "lgtm" or additional feedback on this final commit?
@philips Do you want to test and review the recent changes? Otherwise I will go ahead and do it.
@benbjohnson @xiangli-cmu I will look first thing in the morning. I couldn't quickly find how the JoinCommandV2 stashes the mode.
@philips Thanks! Here are the mode changes and accesses from the last commit:
@benbjohnson Besides the nits and questions, lgtm.
How is the upgrade path with the new JoinV2? Do we need to do something to ensure a mixed etcd 0.4/0.3 cluster doesn't crash here?
@philips A mixed cluster will work as long as the user is using v0.3 of the client library. That will still point at
@benbjohnson I don't understand what a 0.3 client library means in this context.
@philips Sorry, the 0.3 client would be the go-etcd client as it exists now -- it joins against the
The command factory is currently at the store level, but implementing it at the PeerServer will require some extra work. I'm going to merge this in and do that in a separate PR.
@benbjohnson I still don't understand what go-etcd has to do with machine joins.
@benbjohnson I was thinking through the versioning problem some more: do you think it makes sense to leave the current /join method as-is and then have a /register or something that is used by proxies, observers, etc.? Then we can use the /version endpoint to decide whether a node should take a /join rejection as a signal to /register or to just die instead.
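The proposed flow above could be sketched as a small client-side decision: probe /version first, then treat a /join rejection as "register as a proxy" only when the server is new enough. All names below are invented for illustration; the real endpoint handling is not shown in this thread.

```go
package main

import "fmt"

// nextStep decides what a joining node should do, following the
// proposed flow: on new servers a /join rejection means fall back to
// /register; on old servers the node should just die.
// supportsRegister would come from probing the /version endpoint.
func nextStep(joinRejected, supportsRegister bool) string {
	if !joinRejected {
		return "join" // accepted as a full peer
	}
	if supportsRegister {
		return "register" // join the cluster as a proxy/observer instead
	}
	return "die" // old cluster: no proxy support, bail out
}

func main() {
	fmt.Println(nextStep(true, true))  // → register
	fmt.Println(nextStep(true, false)) // → die
}
```

This keeps the existing /join semantics untouched for 0.3 nodes while giving 0.4 nodes a well-defined fallback.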
@philips I'll move the join/remove under the command factory after I merge this. Then the
Great! Sounds perfect.
Move from coreos/raft to goraft/raft and update to latest.
Conflicts:
	config/config.go
	server/peer_server.go
	server/transporter.go
	tests/server_utils.go
Everything is merged and is working as well as
Overview
This pull request adds the ability for nodes to join as proxies when the cluster size limit is reached.
Changes
Questions
How should we update the proxy URLs?
I would like to minimize the internal state that proxies have to hold and make them as dumb as possible. Currently a proxy retrieves the proxy URL via the initial join command. It can also update the proxy URLs if the leader changes but the original leader is still available. However, it doesn't handle a leader failure.
My thought is to simply have a new leader continually ping the proxies after leader change until all proxies have been notified.
The other option would be to maintain a cluster configuration within the proxies but that means that we're maintaining more state.
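The "leader keeps pinging" option might look like the retry loop below. This is a sketch with invented names and an injected ping function; a real implementation would go over the peer transport and sleep between rounds.

```go
package main

import "fmt"

// notifyProxies pings every proxy with the new leader's URL and keeps
// retrying the ones that fail until all have been notified, returning
// the number of rounds taken. This keeps proxies dumb: they hold no
// cluster state beyond the leader URL.
func notifyProxies(proxies []string, leaderURL string, ping func(proxy, leader string) bool) int {
	rounds := 0
	pending := append([]string(nil), proxies...)
	for len(pending) > 0 {
		rounds++
		var failed []string
		for _, p := range pending {
			if !ping(p, leaderURL) {
				failed = append(failed, p) // try again next round
			}
		}
		pending = failed
		// a real server would sleep between rounds here
	}
	return rounds
}

func main() {
	attempts := map[string]int{}
	// Simulated ping: "proxy2" fails its first attempt, then succeeds.
	ping := func(proxy, leader string) bool {
		attempts[proxy]++
		return proxy != "proxy2" || attempts[proxy] > 1
	}
	fmt.Println(notifyProxies([]string{"proxy1", "proxy2"}, "http://leader:4001", ping)) // → 2
}
```

Compared with replicating the cluster configuration into every proxy, this pushes all retry state onto the leader, which already has to track the proxies anyway.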
/cc @philips @xiangli-cmu