BrokenPipeErrors while using datastore (dev) #1811

Closed
jennolsen84 opened this issue May 20, 2016 · 15 comments
Labels: api: datastore (Issues related to the Datastore API) · priority: p2 (Moderately-important priority; fix may not be included in next release) · type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns)

Comments

@jennolsen84

We've recently started to see more BrokenPipeErrors while using gcloud-python. We started noticing them more after we upgraded to Kubernetes 1.2.4 (from 1.1), though I think that was just a coincidence.

Anyway, I delved deeper and saw that the dev datastore server was sending FIN ACK, and then RST ACK, packets after some inactivity. The datastore client machine sees these packets as well. The next time the client tries to run a query against the datastore, we get a broken pipe error. It would be nice if gcloud-python / httplib2 did some pre-emptive checking to see whether the connection was closed and retried the query itself.

Some code snippets:

This is one way to detect whether the socket on the other side has closed the connection:

    import socket

    BUFFER_SIZE = 4096  # how many bytes to peek at

    def possibly_disconnected(conn):
        """Peek at the socket without blocking; a zero-length read means
        the peer has closed its end of the connection."""
        try:
            data_waiting = conn.recv(BUFFER_SIZE,
                                     socket.MSG_DONTWAIT | socket.MSG_PEEK)
        except BlockingIOError:
            # No data is available, but the socket still seems good.
            return False
        return len(data_waiting) == 0

Here is a stack trace of the exception:

  File "/root/pyenvs/my_env/lib/python3.5/site-packages/gcloud/datastore/client.py", line 250, in get
    deferred=deferred)
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/gcloud/datastore/client.py", line 291, in get_multi
    transaction_id=transaction and transaction.id,
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/gcloud/datastore/client.py", line 123, in _extended_lookup
    transaction_id=transaction_id,
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/gcloud/datastore/connection.py", line 203, in lookup
    _datastore_pb2.LookupResponse)
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/gcloud/datastore/connection.py", line 121, in _rpc
    data=request_pb.SerializeToString())
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/gcloud/datastore/connection.py", line 93, in _request
    method='POST', headers=headers, body=data)
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/oauth2client/client.py", line 621, in new_request
    redirections, connection_type)
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/httplib2/__init__.py", line 1314, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/httplib2/__init__.py", line 1064, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/root/pyenvs/my_env/lib/python3.5/site-packages/httplib2/__init__.py", line 988, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/usr/local/lib/python3.5/http/client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "/usr/local/lib/python3.5/http/client.py", line 1128, in _send_request
    self.endheaders(body)
  File "/usr/local/lib/python3.5/http/client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "/usr/local/lib/python3.5/http/client.py", line 911, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.5/http/client.py", line 885, in send
    self.sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe
@daspecster
Contributor

Thanks for reporting @jennolsen84!

@dhermes, is this related to the issues we were talking about in #1796?

@daspecster daspecster added api: datastore Issues related to the Datastore API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels May 20, 2016
@dhermes
Contributor

dhermes commented May 20, 2016

@jennolsen84 This seems to be #1214. Any chance your application is threaded? @jonparrott and I have been working to move off of httplib2 and onto requests to gain thread-safety. You can check out #1214 for some short-term ways to use a different transport library.

@jennolsen84
Author

@dhermes I think we're seeing this in single-threaded apps as well as multithreaded ones. It's probably not too hard to reproduce: I assume you already have a dev datastore set up, so you just need to connect to it (ideally from a different machine so you can tcpdump), run a query, wait five minutes, and then try another query; you should see the exception.
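
A minimal reproduction sketch along those lines (assuming the 2016-era gcloud-python datastore API; the project name and kind below are placeholders):

    import time

    from gcloud import datastore

    # Placeholders: point this at your dev datastore instance, ideally
    # on a separate machine so you can watch the packets with tcpdump.
    client = datastore.Client(project='my-project')

    list(client.query(kind='TestKind').fetch())  # first query succeeds
    time.sleep(5 * 60)   # wait out the server-side idle timeout
    list(client.query(kind='TestKind').fetch())  # raises BrokenPipeError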

@theacodes
Contributor

This is just part of the nature of long-lived connections. We should retry requests on connection errors. A similar issue was brought up in google-api-python-client.
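
A minimal sketch of what such a retry could look like at the library boundary (the helper name is hypothetical, not part of gcloud-python):

    import time

    def with_retries(call, max_attempts=3, delay=0.5):
        """Hypothetical helper: retry a callable on connection-level errors."""
        for attempt in range(1, max_attempts + 1):
            try:
                return call()
            except (BrokenPipeError, ConnectionResetError):
                if attempt == max_attempts:
                    raise
                time.sleep(delay * attempt)  # simple linear backoff

    # e.g. entity = with_retries(lambda: client.get(key))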

@dhermes
Contributor

dhermes commented May 20, 2016

@jonparrott Is this an issue that requests "fixes"? Or are you saying we should build a retry policy?

@theacodes
Contributor

I actually don't know if urllib3/requests does that automatically.

@dhermes
Contributor

dhermes commented May 23, 2016

@jennolsen84 @jonparrott I'm not sure what the "right" thing to do is for users. So far, we've taken the stance that service errors (e.g. a 500, 502, 503 from Google) are errors the user should be able to see. This one somewhat falls into that category: the connection timed out, and all the time we spent waiting on it was wasted. Maybe we could wrap it and provide a better exception? (We certainly need a retry story, and this would fit into it, along with 429 errors, 404s from eventual-consistency failures, etc.)

@tseaver WDYT?

@jennolsen84
Author

@dhermes Perhaps an autoreconnect flag would be helpful: if the user opts in, the library would automatically handle these cases. That would simplify application code quite a bit.

Another example to study is SQLAlchemy's approach (see the sketch after the link):
http://docs.sqlalchemy.org/en/rel_0_9/core/pooling.html?highlight=reconnect#dealing-with-disconnects

But I have never really felt great about the way SQLAlchemy handles it.
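
For illustration, the opt-in flag proposed above might look something like this (hypothetical API; gcloud-python does not expose it):

    # Hypothetical: `autoreconnect` is not a real gcloud-python parameter.
    client = datastore.Client(project='my-project', autoreconnect=True)

    # With the flag set, the client would transparently re-open the HTTP
    # connection and replay the request after a broken pipe, instead of
    # surfacing BrokenPipeError to the caller.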

@dhermes
Contributor

dhermes commented May 25, 2016

I'm still not sure what the right thing to do is here, since a broken pipe is a few layers below our library. @tseaver WDYT?

@tseaver
Contributor

tseaver commented May 26, 2016

I think wrapping it in an exception defined in gcloud.exceptions would be sensible: that would make working it into our "retry" story easier.
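
A rough sketch of that wrapping (the exception class and helper here are hypothetical; only gcloud.exceptions.GCloudError is assumed to exist):

    from gcloud.exceptions import GCloudError

    class ConnectionLost(GCloudError):
        """Hypothetical wrapper for transport-level connection failures."""

    def wrapped_call(transport_call):
        """Hypothetical helper: translate a raw BrokenPipeError into a
        library-defined exception that a retry layer can key off of."""
        try:
            return transport_call()
        except BrokenPipeError as exc:
            raise ConnectionLost('connection to backend lost: %s' % exc)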

@dhermes
Contributor

dhermes commented May 27, 2016

The problem is "it". Wrapping "it" is ill-defined if people are bringing their own transport layer. In some sense we can catch something raised by the low-level HTTP library, but we really have no guarantee what that layer will do. Also, BrokenPipeError is in __builtins__ in Python 3, but not in Python 2.
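
For what it's worth, a version-agnostic check is possible by keying off errno rather than the exception class (a sketch; EPIPE is the errno behind a broken pipe on both Python 2 and 3):

    import errno
    import socket

    def is_broken_pipe(exc):
        """Return True if exc represents a broken pipe, on Python 2 or 3."""
        # Python 3 raises the BrokenPipeError builtin (an OSError subclass);
        # Python 2 raises socket.error / IOError with errno set to EPIPE.
        return (isinstance(exc, (socket.error, IOError, OSError)) and
                getattr(exc, 'errno', None) == errno.EPIPE)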

@jennolsen84
Author

@dhermes @jonparrott

This SO answer and blog post go over how retries are usually done with requests/urllib3; perhaps they help:

http://stackoverflow.com/questions/21371809/cleanly-setting-max-retries-on-python-requests-get-or-post-method
http://www.coglib.com/~icordasc/blog/2014/12/retries-in-requests.html

I think it would be great to get some of this functionality built into gcloud-python, with an easy way to disable the retry mechanism if desired.

This would let users write less retry logic in their apps, and they could focus on handling errors only when the service is truly unavailable, etc.
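
For reference, the usual requests/urllib3 configuration from those links looks roughly like this (the retry parameters are illustrative):

    import requests
    from requests.adapters import HTTPAdapter
    from requests.packages.urllib3.util.retry import Retry

    session = requests.Session()
    retry = Retry(
        total=3,               # up to 3 retries per request
        backoff_factor=0.5,    # exponential backoff between attempts
        status_forcelist=(500, 502, 503),
    )
    session.mount('http://', HTTPAdapter(max_retries=retry))
    session.mount('https://', HTTPAdapter(max_retries=retry))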

@theacodes
Contributor

This seems like more fuel to the fire of ditching httplib2.


@dhermes
Contributor

dhermes commented Jun 7, 2016

Totally. We should have a pow-wow and get oauth2client.transport created.

@lukesneeringer lukesneeringer added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@lukesneeringer
Contributor

Hello,
One of the challenges of maintaining a large open source project is that sometimes, you can bite off more than you can chew. As the lead maintainer of google-cloud-python, I can definitely say that I have let the issues here pile up.

As part of trying to get things under control (as well as to empower us to provide better customer service in the future), I am declaring a "bankruptcy" of sorts on many of the old issues, especially those likely to have been addressed or made obsolete by more recent updates.

My goal is to close stale issues whose relevance or solution is no longer immediately evident, and which appear to be of lower importance. I believe in good faith that this is one of those issues, but I am scanning quickly and may occasionally be wrong. If this is an issue of high importance, please comment here and we will reconsider. If this is an issue whose solution is trivial, please consider providing a pull request.

Thank you!
