Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge linkedin/go-zk changes upstream #122

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

PapaCharlie
Copy link

It may be more convenient to look at these changes sequentially:

* Fix the runtime of TestChildWatch and TestSetWatchers

TestChildWatch was introduced to test what happens when the response is too large for the default
allocated buffer size. It used to create 10k nodes which could take a very significant amount of
time. The same effect can be achieved with far fewer nodes, significantly speeding up the test's
runtime. The test now runs in 5s on my machine instead of sometimes multiple minutes...

TestSetWatchers tested something similar, except that it checks that the outgoing setWatchers packet
is broken up into multiple packets when it's too large. Using a similar trick we can generate
names of specific lengths to test that the behavior is correct. It was also flaky because if your
local ZK deployment is a little slow, deleting all the nodes can take longer than the session
timeout, spuriously failing the test. This has also been fixed, and the test now runs in a little
over 5 seconds as well, instead of failing.

Finally, standardize the ZK server version checking to be a bit more flexible and friendlier towards
future versions of ZooKeeper (note: the original implementation doesn't even work because the env
variable name is incorrect... It `ZK_VERSION`, not `zk_version`)

* Support persistent watches

Implements the new persistent watch types introduced in 3.6, along with some utilities that are
critical when implementing local caches.

* Add metrics receiver and fix persistent watch EventQueue

* Add error to RequestCompleted
…tchers (#2)

Like the comment on Event.Zxid mentions, watch events now include the zxid as of
ZK 3.9.0. Additionally, relaying pings to persistent watchers allows them to
keep track of whether they are behind on consuming updates from ZK, particularly
since the zxid is relayed.
This field can be used to track the end-to-end time in between receiving the
event and actually processing it. Since events end up in a queue, it's hard to
tell how far behind the consumer on the EventQueue is, and this timestamp can be
used to track that.
This data structure has many other applications, especially when dealing with
ZK.
* Upgrade HostProvider

This change fixes the behavior of DNSHostProvider where it does not refresh its cached IP addresses that it resolves once on startup for the configured ZK servers. This new behavior more closely matches the Java client's behavior by randomly selecting an address after resolving the host. It slightly changes the semantics of `HostProvider` with an off-by-one, otherwise the `connect` loop could end up in a situation where it attempts to connect to a stale address. This is fixed by moving the backoff to _before_ getting the address, rather than _after_.

* Bump linter version to support generics

* Fix linter and integration test actions

* Add docs
* Improve logging and error handling

This overhauls the logging and error responses from the *Conn. Namely, it
changes the logging backend to `log/slog` which makes it easier to manage the
library's logging centrally (however the old functionally to provide a custom
logger is retained). It also captures the reason why a specific connection to a
ZK server was closed in the error that is propagated to all callers. This can be
checked with `errors.Is`.

This also adds an exponential backoff feature to the client, which will make it
back off when reconnecting to a ZK server.

* Remove logger option, it messes with the logging

* Change default behavior of DNSHostProvider

* Split implementation of HostProvider, fix up tests
This test was broken by the new error format returned by the connection
When adding a port to an IPv6 address, it must be wrapped with [...].
Copy link

codecov bot commented Mar 6, 2024

Codecov Report

Attention: Patch coverage is 69.06475% with 129 lines in your changes are missing coverage. Please review.

Project coverage is 75.00%. Comparing base (c5e730e) to head (d59ad1b).

Files Patch % Lines
conn.go 64.94% 78 Missing and 17 partials ⚠️
staticdnshostprovider.go 70.00% 8 Missing and 4 partials ⚠️
structs.go 75.00% 8 Missing and 2 partials ⚠️
unlimited_channel.go 82.60% 4 Missing and 4 partials ⚠️
constants.go 0.00% 3 Missing ⚠️
dnshostprovider.go 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #122      +/-   ##
==========================================
- Coverage   75.94%   75.00%   -0.95%     
==========================================
  Files           7       10       +3     
  Lines        1193     1452     +259     
==========================================
+ Hits          906     1089     +183     
- Misses        196      252      +56     
- Partials       91      111      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -69,7 +103,7 @@ func StartTestCluster(t *testing.T, size int, stdout, stderr io.Writer) (*TestCl

for serverN := 0; serverN < size; serverN++ {
srvPath := filepath.Join(tmpPath, fmt.Sprintf("srv%d", serverN+1))
requireNoError(t, os.Mkdir(srvPath, 0700), "failed to make server path")
requireNoErrorf(t, os.Mkdir(srvPath, 0700), "failed to make server path")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pluck these changes into a separate diff since they do not change any substance to the code. Just trying to remove lines to review as change vs not

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack makes sense! As i do that, any opposition to using testify to clean this up further ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have maintained using only the stdlib as it does have nice properties of not conflicting or dealing with any dependencies. I'm not against it longer term, but for this round of changes lets stick to stdlib.

I want to keep the scope of change tighter as we get to the meat of the changes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chipping away and doing small PRs like errors.Is will also not change behavior and good to clean up this larger PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK makes sense, it's not necessary, just a nicety. I can move some of the errors.Is stuff but I did change the error signatures of some of these methods, and it does actually break if you don't do errors.Is anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants