Add support for multiple endpoints and failover by InvalidInterrupt · Pull Request #1596 · kragniz/python-etcd3

InvalidInterrupt · 2021-05-22T22:28:40Z

Built upon the work in #106.

This still needs docs. Ideally, I'll also be able to add tests of failover functionality with multiple running etcd services.

Since etcd is a distributed store, it makes sense for a client library to be able to connect to all the nodes in the cluster and to manage their current status without needing to be reinstantiated. This patch adds the ability for the client to be aware of the status of all nodes in the cluster and, in case of failure, to mark each of them as temporarily failed. Individual requests will continue to raise an exception in case of a failed connection at this point, but it would be easy to allow retrying if that seems like a good idea.

What has been done in practice:

* Added an 'Endpoint' class, a simple FSM to abstract a remote server.
* Refactored the initialization of the library, adding a switch to allow failing over to another server in case of need.
* Refactored self.channel to be a function returning a grpc channel to the first non-failed node, and all the Etcd3Client.*stub properties accordingly.
* Refactored the _handle_error wrapper, made it part of the class, and split it into one version for generators and one for functions that return normally, for code simplicity.
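To make the "simple FSM" idea concrete, here is a minimal sketch of an endpoint that flips between healthy and temporarily failed, plus a helper that picks the first non-failed node. It is not the code from this PR: `EndpointSketch`, `first_usable`, and the injectable `clock` argument are hypothetical names chosen for illustration, and no gRPC channel is created.

```python
import time


class EndpointSketch:
    """Hypothetical stand-in for an endpoint with two states: healthy or failed."""

    def __init__(self, host, port, time_retry=300.0, clock=time.monotonic):
        self.netloc = f"{host}:{port}"
        self.time_retry = time_retry  # seconds before a failed node is retried
        self._clock = clock           # injectable so tests need not sleep
        self._failed_at = None        # None means the node is considered healthy

    def fail(self):
        """Mark the endpoint as temporarily failed, starting the retry timer."""
        self._failed_at = self._clock()

    def is_failed(self):
        """Return True while the retry timer is still running."""
        if self._failed_at is None:
            return False
        if self._clock() - self._failed_at >= self.time_retry:
            self._failed_at = None    # timer expired: the node may be tried again
            return False
        return True


def first_usable(endpoints):
    """Return the first endpoint that is not currently marked as failed."""
    for endpoint in endpoints:
        if not endpoint.is_failed():
            return endpoint
    raise RuntimeError("all endpoints are currently marked as failed")
```

In this sketch a failed node becomes eligible again only after `time_retry` seconds, which mirrors the role the `time_retry` parameter appears to play in the reviewed constructor signature.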


Encrypt · 2021-06-02T17:30:13Z

Hello @InvalidInterrupt!

I'm really looking forward to seeing this pull request merged! Especially since it's been a subject for a long time it seems.

Do you need help on something? I could probably participate.

InvalidInterrupt · 2021-06-02T20:18:07Z

Hi @Encrypt, and thanks for offering your assistance!

I think I'm almost done with the changes to the library itself; the last thing on my to-do list is to make the client constructor raise an error if it's passed a hostname and port along with an explicit list of endpoints. Otherwise, I just need to write some docs and try to add more tests.
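The constructor check described above could look roughly like the following. This is only a sketch under assumptions: `resolve_endpoints` is a hypothetical helper, endpoints are plain `(host, port)` tuples rather than the PR's Endpoint objects, and the `localhost:2379` fallback is assumed for illustration.

```python
def resolve_endpoints(host=None, port=None, endpoints=None):
    """Return a list of (host, port) pairs, rejecting ambiguous arguments."""
    if endpoints is not None:
        # An explicit endpoint list and host/port would contradict each other,
        # so refuse the combination instead of silently ignoring one of them.
        if host is not None or port is not None:
            raise ValueError("pass either host/port or endpoints, not both")
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("endpoints must not be empty")
        return endpoints
    # Single-endpoint form: fall back to assumed defaults for missing values.
    return [(host or "localhost", port or 2379)]
```

Raising early keeps a caller from believing that a `host` argument took effect when the endpoint list actually won.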

If you'd like to help, I think this is close enough to done that an initial code review would be helpful. Since I'm still working on tests, do point out any code you feel absolutely must be covered by a test.

Encrypt · 2021-06-03T20:03:55Z

Hello @InvalidInterrupt!

Thanks for your answer. I'll probably do a code review tomorrow if I have time!


InvalidInterrupt · 2021-06-04T20:22:57Z

Commenting with an idea expressed in a commit message to make sure it is seen:

Perhaps it would be better to move most Etcd3Client behavior into a new, public base named MultiEndpointEtcd3Client or similar, and make Etcd3Client keep the old initializer interface by converting the arguments into an Endpoint before calling super().__init__().
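A rough sketch of that split, to show the shape of the idea rather than the merged code: every class name ending in `Sketch` is hypothetical, and the real classes take more parameters (credentials, timeouts, and so on) that are omitted here.

```python
class EndpointSketch:
    """Hypothetical minimal endpoint: just an address, no gRPC channel."""

    def __init__(self, host, port, secure=False):
        self.netloc = f"{host}:{port}"
        self.secure = secure


class MultiEndpointClientSketch:
    """Base class holding the shared behavior; takes ready-made endpoints."""

    def __init__(self, endpoints, failover=False):
        self.endpoints = list(endpoints)
        if not self.endpoints:
            raise ValueError("at least one endpoint is required")
        self.failover = failover


class SingleEndpointClientSketch(MultiEndpointClientSketch):
    """Subclass keeping the old host/port initializer interface."""

    def __init__(self, host="localhost", port=2379, secure=False):
        # Convert the legacy arguments into one endpoint, then delegate.
        super().__init__([EndpointSketch(host, port, secure)], failover=False)
```

With this layout each constructor handles exactly one calling convention, so neither needs to validate mutually exclusive arguments.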

InvalidInterrupt · 2021-06-19T23:52:27Z

@kragniz Just realized I forgot to tag you

kragniz

@InvalidInterrupt this is looking good, but I think your idea to create a MultiEndpointEtcd3Client would be useful. The constructor is already quite complicated, so I'd rather have one for each case.

kragniz · 2021-06-20T20:01:26Z

etcd3/client.py

+    :type opts: dict, optional
+    """
+
+    def __init__(self, host, port, secure=True, creds=None, time_retry=300.0,


300 seconds seems pretty long as a default, is there a particular reason for this length?

InvalidInterrupt · Jun 20, 2021 (edited)

I just kept what the previous author had used here. Since there's currently no built-in provision for jitter or exponential backoff, setting this to be a bit long may help mitigate any "thundering herds" a little. In the event that the user has set up multi-endpoint failover and there are still healthy nodes available, their application won't remain unavailable for those five minutes.

I'm not arguing for any particular value (I also think it could be a bit shorter), but figured I should present an argument for keeping it long.
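For reference, the jitter and exponential backoff mentioned above are not part of this PR. A minimal sketch of what such a delay calculation might look like follows, assuming a "full jitter" scheme; `retry_delay`, `base`, and `cap` are hypothetical names, and the 300-second cap simply reuses the default under discussion.

```python
import random


def retry_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Return a randomized delay in seconds before retry number `attempt`.

    The ceiling doubles with every attempt (exponential backoff) up to `cap`,
    and the actual delay is drawn uniformly below it (full jitter), so many
    clients that failed at the same moment do not all retry together.
    """
    exponent = min(attempt, 32)  # clamp so the power cannot overflow a float
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(0.0, ceiling)
```

A short fixed retry interval without jitter would make every client reconnect at the same instant, which is the "thundering herd" the long default is meant to soften.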

kragniz · 2021-06-20T20:08:57Z

etcd3/client.py

        for response in snapshot_response:
            file_obj.write(response.blob)

+    # Remove utility functions from class namespace


Let's not delete these - the _ is enough to stop people from depending on them

InvalidInterrupt · Jun 20, 2021 (edited)

My motivation for this change was that becoming a bound method doesn't work well with these functions' signatures (i.e. self will be passed in as payload). Perhaps marking them as staticmethod down here would be a better solution to that problem?

Regardless, I'll remove this logic for now.
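The binding problem can be reproduced in isolation. The classes and the `_describe` helper below are hypothetical and unrelated to the actual utility functions in etcd3/client.py; they only show how a plain function stored in a class namespace receives `self` as its first argument, and how `staticmethod` avoids that.

```python
class PlainHelper:
    # A utility function left in the class body becomes a bound method
    # when it is looked up on an instance.
    def _describe(payload):
        return type(payload).__name__

    def run(self):
        # No argument is passed explicitly, yet the call succeeds:
        # the instance itself is silently supplied as `payload`.
        return self._describe()


class StaticHelper:
    @staticmethod
    def _describe(payload):
        return type(payload).__name__

    def run(self, payload):
        # staticmethod disables binding, so `payload` is what the caller passed.
        return self._describe(payload)
```

Calling `PlainHelper().run()` reports the class name of the instance rather than of any real payload, while `StaticHelper().run(b"x")` reports `bytes` as intended.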


kragniz

This seems good, thanks for your work in getting this finished off

lavagetto and others added 19 commits April 7, 2018 17:08

Merge branch 'master' into reconnections

5ca831b

Merge branch 'reconnections' of https://github.com/lavagetto/python-etcd3

4013d63

Call close on all endpoint's channels at client close

5e48292

This was causing test failures

Fix flake8 issue

b8085a0

Remove unneeded branch

afd0fd7

Make Etcd3Client's endpoints parameter expect an iterable of Endpoint instances

ff0f0e7

Pass through the endpoint parameter in client()

5bf1624

Prevent utility functions from becoming methods of the client class

58979ae

Cache the client stub properties

98dfa8b

Remove TODO comment regarding status of old GRPC channels

73a68f7

Based on https://grpc.github.io/grpc/core/md_doc_connectivity-semantics-and-api.html I believe we should never encounter a channel rendered unusable by a previous error

Make watcher thread end on failure or stream closure

eb2ecd8

Close existing watcher on failover

3be8fbc

Save Endpoint time_retry parameter

a964d09

Close self.watcher when switching endpoints or closing client for reasons other than GRPC errors

f9d01cd

Export Endpoint at package level

5884f0f

Make get_secure_creds a public static method

eb455ae

Fix failover logic

2a707c7

Add co-author and self to authors file

9716466

InvalidInterrupt added 5 commits June 3, 2021 16:09

Add docs for failover functionality

e3ca56b

* Add docstrings for Etcd3Client and Endpoint
* Add Endpoint autodoc to usage.rst

Fix TestClient._disable_auth_in_etcd

b102805

fixup! Prevent users from specifying options that will be ignored along wth endpoints

773ece1

Add initial tests of failover behaviour

a1073f3

InvalidInterrupt marked this pull request as ready for review June 4, 2021 20:18

kragniz reviewed Jun 20, 2021


InvalidInterrupt added 2 commits June 20, 2021 14:53

Revert "Prevent utility functions from becoming methods of the client class"

18bb097

This reverts commit 58979ae.

Split client class

7acec26

* Turn Etcd3Client in MultiEndpointEtcd3Client, and move existing single-endpoint interface into a subclass retaining the old name.
* Revert changes to client()
* Autodoc MultiEndpointEtcd3Client

InvalidInterrupt requested a review from kragniz July 3, 2021 19:07

kragniz approved these changes Jul 6, 2021


kragniz merged commit e78000e into kragniz:master Jul 6, 2021

CyberDem0n mentioned this pull request May 22, 2022

Certificate authentication doesn't work with etcd v3 patroni/patroni#2036

Closed

kragniz merged 26 commits into kragniz:master from InvalidInterrupt:reconnections


4 participants
