Unhandled Exception in Connection Loop: RuntimeError: ('xids do not match, expected %r received %r', 28, 27) #254

Closed
diranged opened this issue Oct 23, 2014 · 6 comments

@diranged
Contributor

Two days ago we upgraded our servers from Kazoo 1.3.1 to Kazoo 2.0. Since then we have seen a 10x increase in the number of connection failures, and recovery from those failures is far worse too. It looks like a new exception is being raised that we did not see before.

        "stack_trace": [
            "Traceback (most recent call last):", 
            "  File '/mnt/.venv/lib/python2.7/site-packages/kazoo/protocol/connection.py', line 531, in _connect_attempt", 
            "    response = self._read_socket(read_timeout)", 
            "  File '/mnt/.venv/lib/python2.7/site-packages/kazoo/protocol/connection.py', line 407, in _read_socket", 
            "    return self._read_response(header, buffer, offset)", 
            "  File '/mnt/.venv/lib/python2.7/site-packages/kazoo/protocol/connection.py', line 338, in _read_response", 
            "    'received %r', xid, header.xid)", 
            "RuntimeError: ('xids do not match, expected %r received %r', 28, 27)"

This is being caught by kazoo.client, which then reports the "Unhandled exception in connection loop" error. I'm completely stumped: I can't seem to replicate this in my own dev environment, and it only happens sporadically in production.
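For anyone not familiar with the check behind this message, here is a minimal sketch of how a client can pair replies with in-flight requests by xid. It is not Kazoo's actual code: `PendingRequests`, `match_reply`, and `demo` are hypothetical names, and the reply header layout (xid, zxid, err as big-endian int32/int64/int32) is an assumption made for illustration. It only shows how an out-of-order or repeated reply would trip the check; it says nothing about what causes that in this issue.

```python
import struct
from collections import deque

# Assumed reply header layout: xid (int32), zxid (int64), err (int32), big-endian.
REPLY_HEADER = struct.Struct("!iqi")


class PendingRequests:
    """Hypothetical stand-in for a client's queue of in-flight requests."""

    def __init__(self):
        self._xid = 0
        self._pending = deque()

    def send(self):
        # Every outgoing request gets the next xid and is queued until answered.
        self._xid += 1
        self._pending.append(self._xid)
        return self._xid

    def match_reply(self, raw_header):
        # Replies are expected in request order, so the header's xid must
        # equal the xid of the oldest request still waiting for an answer.
        xid, _zxid, _err = REPLY_HEADER.unpack(raw_header)
        expected = self._pending.popleft()
        if xid != expected:
            raise RuntimeError(
                "xids do not match, expected %r received %r" % (expected, xid)
            )
        return xid


def demo():
    pending = PendingRequests()
    first = pending.send()
    pending.send()
    pending.match_reply(REPLY_HEADER.pack(first, 0, 0))  # in-order reply, accepted
    try:
        # The same reply arrives a second time while xid 2 is expected.
        pending.match_reply(REPLY_HEADER.pack(first, 0, 0))
    except RuntimeError as exc:
        return str(exc)
    return None
```

In this sketch the second call to `match_reply` fails because the reply carries an xid one lower than the request at the head of the queue, which is the same off-by-one shape as the `28, 27` pair in the traceback above.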

@diranged
Contributor Author

I can reproduce this failure in our staging and production environments, where we run ZooKeeper clusters, when the client is disconnected from NodeA and reconnects to NodeB. It seems to happen every time. Still working to reproduce this in a smaller dev environment.

@diranged
Contributor Author

@bbangert I noticed you did quite a bit of code reworking for Kazoo 2 regarding connection handling. Can you comment on this?

@diranged
Contributor Author

diranged commented Nov 6, 2014

We are still seeing this happen occasionally and are downgrading all of our servers to Kazoo 1.3.1 until this is resolved.

@bbangert
Member

bbangert commented Nov 6, 2014

@diranged I don't know offhand if the latest 2 dev has a fix for this. If you can reproduce on staging, maybe you can test it there?

@rgs1
Contributor

rgs1 commented Nov 24, 2014

@diranged curious, are you using authentication (i.e. add_auth calls)? I ask because of 15b7632. Not sure if it's related.

Also, what's running on your servers? I do see this from time to time on 3.5 (i.e. ZK built from trunk a couple of months ago, plus patches).

@jeffwidman
Member

bump @diranged any updates per above?

Is it still reproducible?
