Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Support SetWatches and watch restoration #305

Merged
merged 1 commit into from
May 31, 2017

Conversation

rgs1
Copy link
Contributor

@rgs1 rgs1 commented Apr 3, 2015

Basic plumbing to support resetting watches upon recovering a session (as done by the Java implementation).

Given this breaks the current behaviour, it should be made optional. I'll circle back with that added in. Comments welcomed!

Fixes #218

Signed-off-by: Raul Gutierrez S rgs@itevenworks.net

Fixes python-zk#218

Signed-off-by: Raul Gutierrez S <rgs@itevenworks.net>
@bbangert bbangert merged commit 6081762 into python-zk:master May 31, 2017
@tonyseek
Copy link
Contributor

tonyseek commented Jun 12, 2017

It seems the connection will be broken if we are using data watcher and child watcher in the same client. The following test could reproduce this:

diff --git a/kazoo/tests/test_client.py b/kazoo/tests/test_client.py
index af8ecf5..e082232 100644
--- a/kazoo/tests/test_client.py
+++ b/kazoo/tests/test_client.py
@@ -965,16 +965,20 @@ class TestClient(KazooTestCase):
 
     def test_set_watches_on_reconnect(self):
         client = self.client
-        watch_event = client.handler.event_object()
+        data_watch_event = client.handler.event_object()
+        child_watch_event = client.handler.event_object()
 
         client.create("/tacos")
 
         # set the watch
-        def w(we):
-            eq_(we.path, "/tacos")
-            watch_event.set()
+        def make_watch(watch_event):
+            def w(we):
+                eq_(we.path, "/tacos")
+                watch_event.set()
+            return w
 
-        client.get_children("/tacos", watch=w)
+        client.get("/tacos", watch=make_watch(data_watch_event))
+        client.get_children("/tacos", watch=make_watch(child_watch_event))
 
         # force a reconnect
         states = []
@@ -986,19 +990,26 @@ class TestClient(KazooTestCase):
                 states.append(state)
                 rc.set()
 
-        client._connection._socket.shutdown(socket.SHUT_RDWR)
+        # client._connection._socket.shutdown(socket.SHUT_RDWR)
+        self.lose_connection(threading.Event)
 
         rc.wait(10)
         eq_(states, [KazooState.CONNECTED])
 
         # watches should still be there
-        self.assertTrue(len(client._child_watchers) == 1)
+        self.assertEqual(len(client._data_watchers), 1)
+        self.assertEqual(len(client._child_watchers), 1)
 
         # ... and they should fire
         client.create("/tacos/hello_", b"", ephemeral=True, sequence=True)
 
-        watch_event.wait(1)
-        self.assertTrue(watch_event.is_set())
+        child_watch_event.wait(1)
+        self.assertTrue(child_watch_event.is_set())
+
+        client.set("/tacos", b"")
+
+        data_watch_event.wait(1)
+        self.assertTrue(data_watch_event.is_set())
 
 
 dummy_dict = {

The traceback is:

Traceback (most recent call last):
  File "/Users/tonyseek/Sites/kazoo/kazoo/protocol/connection.py", line 547, in _connect_attempt
    read_timeout, connect_timeout = self._connect(host, port)
  File "/Users/tonyseek/Sites/kazoo/kazoo/protocol/connection.py", line 675, in _connect
    xid=SET_WATCHES_XID)
  File "/Users/tonyseek/Sites/kazoo/kazoo/protocol/connection.py", line 245, in _invoke
    raise RuntimeError('xids do not match, expected %r '
RuntimeError: ('xids do not match, expected %r received %r', -8, -1)

This may be the root cause which breaks the test of #398.

I am trying to solve this.

@bbangert
Copy link
Member

@tonyseek if you revert this PR, does the issue go away? I think I've seen a reported case where it had a XIDS mismatch also looking for something like this.

@tonyseek
Copy link
Contributor

@bbangert I believe this is a race condition bug which introduced by the SetWatch and _invoke implementation. The received response header is pointed to previous watch events instead of the SetWatch request. I am still tracking on this issue, and it will go away if we revert this PR.

@bbangert
Copy link
Member

Ok, I'm inclined to revert this then, as this wasn't your PR I don't expect you to fix it. Thank you for diving into this to track down how it's blocking the other PR.

bbangert added a commit that referenced this pull request Jun 12, 2017
PR #305 introduced a feature to restore watches on reconnect.
Unfortunately this introduced RuntimeError's under various cases, so
reverting it is necessary.
bbangert added a commit that referenced this pull request Jun 12, 2017
fix(core): revert PR #305 SetWatches which caused RuntimeError
@bbangert
Copy link
Member

@tonyseek I've reverted this PR and merged to master. The TreeCache PR might 'just work' now.

@wenbindu
Copy link

I do not see the change in the connection.py, when can we use the library for the change.

@bbangert
Copy link
Member

@BrandonDu the revert for this fix is in the latest master, it hasn't been released as a new revision yet.

@wenbindu
Copy link

@bbangert , I just test the case, what is worse is that the watch can not work.
I think most of the code u commit works well ,except the raise RuntimeError.

@tonyseek
Copy link
Contributor

tonyseek commented Jun 13, 2017

@BrandonDu The recipes of Kazoo like DataWatch and ChildrenWatch work well without the SetWatch support. Because they have implemented the session watcher correctly. I suggest using a tested recipe like them instead of processing raw watch events by yourself.

The RuntimeError is critical and worse than lack of SetWatch. It is caused by packets which are out of order and it breaks connection during reconnecting.

@rajath26
Copy link

rajath26 commented Jun 3, 2020

Was setWatcher and watch restoration ever brought back in the later versions of kazoo?

@rgs1
Copy link
Contributor Author

rgs1 commented Jan 20, 2023

Ok, sorry for joining the party a few years late but the fix is really pretty easy:

given it's fine to receive either an xid -8 (SetWatches) or xid -1 (WatchEvent) in the response when setting SetWatches (since a watch can fire immediately after being set), we just need to change _invoke() to accept a list of valid xids instead of just one.

Happy to send a PR to unrevert and handle this scenario.

@StephenSorriaux
Copy link
Member

Hello,

Thank you for looking into this. We would be very happy to review a PR for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for SetWatches command, as well as watch restoration on ZK session reconnect
6 participants