
Kazoo heartbeats don't work with eventlet or monkey-patched threads #364

Open
rjaiwal5139 opened this issue Nov 13, 2015 · 8 comments

@rjaiwal5139

More details here: https://bugs.launchpad.net/python-tooz/+bug/1512001

@bbangert
Member

The bug cited includes this tidbit from the server:

 2015-11-03 18:37:37,380 - WARN [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 3633ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide

Is the heartbeat not working, or is the server experiencing issues that affect latency, so that it does not respond to the ping quickly enough?

Can someone verify that kazoo does not send heartbeat pings under eventlet, independently of the rest of the cited bug, where the ZK server appears to be slammed or malfunctioning?

@bbangert
Member

BTW, the error that follows the line I pasted above should be especially concerning for the operation of the Zookeeper cluster:

 2015-11-03 18:37:37,392 - ERROR [CommitProcessor:0:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)

Zookeeper is running into errors while attempting to commit its sync log. Since Zookeeper is spinning its own event loop (NIO), if it's blocked waiting for quorum and trying to write the sync log, clients will not receive ping responses.

@rjaiwal5139
Author

After patching the kazoo connection with #363, I do see regular pings over time, but I get a lot of expired-session messages like these:

(kazoo.client): 2015-11-13 14:33:24,379 WARNING connection _connect_attempt Session has expired
(kazoo.client): 2015-11-13 14:33:24,379 INFO client _session_callback Zookeeper session lost, state: EXPIRED_SESSION
(kazoo.client): 2015-11-13 14:57:52,985 INFO connection _connect Connecting to padawan-ccp-c1-m1-mgmt:2181
(kazoo.client): 2015-11-13 14:57:52,986 DEBUG connection _submit Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=10000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
(kazoo.client): 2015-11-13 14:57:52,989 INFO client _session_callback Zookeeper connection established, state: CONNECTED
(kazoo.client): 2015-11-13 14:57:52,999 DEBUG connection _submit Sending request(xid=1): GetChildren(path=u'/tooz/ceilometer.notification', watcher=<bound method ChildrenWatch._watcher of <kazoo.recipe.watchers.ChildrenWatch object at 0x7fd3e7091cd0>>)
(kazoo.client): 2015-11-13 14:57:53,011 DEBUG connection _read_response Received response(xid=1): []

Querying Zookeeper from the shell returns an empty list:

[zk: localhost:2181(CONNECTED) 9] ls /tooz/ceilometer.notification
[]

@bbangert
Member

When you say you see regular pings, does that mean you see kazoo sending them more frequently, or do you see responses?

If the underlying problem is that Zookeeper still cannot sync properly then sending more pings will not keep the session active since Zookeeper won't process them within the session lifetime.
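To make the point above concrete, here is a toy model (not kazoo or ZooKeeper code) of the longest gap the server can go without processing a ping. It assumes the server handles nothing while it is stalled, that queued pings are processed as soon as the stall ends, and that the session expires when the gap reaches the session timeout. The 10000 ms timeout is the `time_out` from the Connect request in the log above and 3633 ms is the fsync time from the server log; the 12000 ms stall and the function names are hypothetical.

```python
def worst_case_gap_ms(ping_interval_ms, stall_ms):
    """Longest time the server can go without processing a ping.

    Worst case in this toy model: the last ping was handled roughly one
    full interval before the stall began, and the next one is only
    handled once the stall ends.
    """
    return ping_interval_ms + stall_ms


def session_survives(ping_interval_ms, stall_ms, session_timeout_ms):
    """True if the worst-case gap stays under the session timeout."""
    return worst_case_gap_ms(ping_interval_ms, stall_ms) < session_timeout_ms


if __name__ == "__main__":
    timeout_ms = 10000  # time_out from the Connect request in the log
    for stall_ms in (3633, 12000):  # 12000 is a hypothetical longer stall
        for interval_ms in (3000, 1000, 100):
            ok = session_survives(interval_ms, stall_ms, timeout_ms)
            print("stall=%d interval=%d survives=%s" % (stall_ms, interval_ms, ok))
```

In this model, shortening the ping interval only shrinks the first term of the gap. Once a stall alone is as long as the session timeout, no ping interval keeps the session alive.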

@rjaiwal5139
Author

I think kazoo is sending them as defined (at regular intervals), but the response is empty. When the agents are restarted, the response is there: I get 3 UUIDs for the 3 agents, and the zookeeper shell returns the same. Soon after that, though, the agent logs show session expiry, and an empty response is returned on all 3 agents and in the zookeeper shell for ls /tooz/ceilometer.notification, as shown above. Things stop working when the session expires.
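The empty listing after expiry would be consistent with the group members being stored as ephemeral nodes, which ZooKeeper deletes when the owning session expires (that the nodes here are ephemeral is an assumption, not something confirmed in this thread). A minimal sketch of a connection-state listener that records an expiry so the application can re-create its membership after reconnecting is shown below. `SessionTracker` and `should_rejoin` are hypothetical names, and the state strings are assumed to match the values of `kazoo.client.KazooState`.

```python
# String values assumed to match kazoo.client.KazooState; repeated here so
# the sketch runs without kazoo installed.
LOST = "LOST"
SUSPENDED = "SUSPENDED"
CONNECTED = "CONNECTED"


class SessionTracker(object):
    """Hypothetical listener that remembers whether the session expired."""

    def __init__(self):
        self.connected = False
        self.expired = False

    def __call__(self, state):
        # Keep the listener cheap: only record state, do the real work elsewhere.
        self.connected = (state == CONNECTED)
        if state == LOST:
            # Session is gone; any ephemeral nodes it owned are removed.
            self.expired = True

    def should_rejoin(self):
        """Return True once per expiry, and only after reconnecting."""
        if self.expired and self.connected:
            self.expired = False
            return True
        return False
```

With a real client this would be registered via `client.add_listener(tracker)`, and the application's main loop would call `tracker.should_rejoin()` and re-join its group when it returns True.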

@harlowja
Contributor

So we chatted on IRC; rjaiwal5139 is going to do some testing by putting zookeeper on its own hardware (separated from other VMs) and report back on whether this issue still occurs once that change is made.

Useful link for folks:

https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#Single+Machine+Requirements

@rjaiwal5139
Author

Slight correction to my earlier response: I was testing with 3 VM instances all sharing the same bare-metal host, with ZK running on all 3, not just one. However, I was passing just one hostname to Kazoo.

Initial observation on a multi-node test install shows Zookeeper without any of the sync errors on my local setup. Does Kazoo handle failover among ZK hosts when more than one host is specified?

@bbangert
Member

If kazoo's pings to a server time out, it will move to the next server in the list, yes. I believe it will also split out the hosts when a single DNS name resolves to multiple addresses and rotate among those as well.
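For reference, a minimal sketch of passing several servers to kazoo as a comma-separated `hosts` string. The hostnames `zk1`, `zk2`, `zk3` and the helpers `build_hosts` and `make_client` are hypothetical; the kazoo import is kept inside `make_client` so the string-building part runs without kazoo installed.

```python
def build_hosts(hostnames, port=2181):
    """Join hostnames into the comma-separated form KazooClient accepts."""
    return ",".join("{}:{}".format(name, port) for name in hostnames)


def make_client(hostnames, timeout=10.0):
    """Create (but do not start) a client that knows about every server."""
    # Imported here so build_hosts() works without kazoo installed.
    from kazoo.client import KazooClient
    return KazooClient(hosts=build_hosts(hostnames), timeout=timeout)


if __name__ == "__main__":
    # Hypothetical hostnames; substitute the real ensemble members.
    print(build_hosts(["zk1", "zk2", "zk3"]))
```

Calling `make_client(["zk1", "zk2", "zk3"]).start()` would then connect to one of the listed servers, leaving the others available for failover.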
