Intermittent Cloud Datastore Error: ServiceUnavailable: 503 #3128
Comments
Is there a chance that this is related: #2896 |
Also the |
Are you getting the error at 30 minute intervals like the author of #2896? |
Interesting that the second log set shows so much more frequent failures than the first one. |
@speedplane if this was asked already somewhere, sorry for the repeat question, but is this script using multiprocessing? My thought is that either it's an upstream issue (in which case I feel like there would be more people chiming in) or there's something in your architecture that is unhappy. |
I saw that discussion. I don't use multiprocessing. I don't think there is threading, but just in case, I make sure all my clients are thread-local using the function below. Perhaps I should have mentioned, but this call is being made across projects: it is being made from an instance within a project named ocr-api, but is trying to grab data from a project named docketupdate. Also, I have a cloud support package, and I can try opening a case if that would help.
I can add my own retry logic and see if it helps, but given how often this is occurring, it would be preferable if the underlying issue could be addressed. I added a third screenshot of timestamps above. |
I'd recommend filing a support ticket. Feel free to cc me at bookman@google.com or share your case #, so I can follow up on it and make sure your issue gets resolved.
On Fri, Mar 10, 2017, 8:56 AM Michael Sander wrote:
import threading

from google.cloud import datastore

# Thread-local cache of datastore clients, keyed by project name.
_datastore_client = threading.local()

def datastore_client(project=None):
    '''Return a thread-unique datastore client for the given project.'''
    if not hasattr(_datastore_client, 'clients'):
        _datastore_client.clients = {}
    if project not in _datastore_client.clients:
        # Create a new datastore client and save it in the cache.
        _datastore_client.clients[project] = datastore.client.Client(project) \
            if project else datastore.client.Client()
    return _datastore_client.clients[project]
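For illustration only (not part of the original comment), the cross-project call described above would then look roughly like this; the kind and ID are made up:
# Runs in the ocr-api project but reads from the docketupdate project;
# each thread caches its own Client per project name.
client = datastore_client('docketupdate')
entity = client.get(client.key('SomeKind', 'some-id'))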
|
@speedplane the project information is helpful. I believe there was an issue a long time ago about cross-project access not being supported well? I can't seem to find it, though. |
Done. See case ID 12283187. |
I stopped receiving these altogether on March 10 at around 3:30PM ET. I did not redeploy or change any settings in my app since then. Did something change on the backend? |
Not sure. Perhaps? |
@speedplane just double checking that you are still not seeing the errors. |
Closing, but feel free to reopen if the status changes. |
Can you please reopen? This is coming back with a vengeance. Same error, but slightly different codepath, see stacktrace below:
|
@speedplane I don't see anything on the status dashboard about an outage. It does seem strange that it's |
@speedplane Let me make sure I understand -- it was gone with the grpc 1.2 upgrade, and now it is back? But with a slightly different error? Is it still intermittent? |
@speedplane Seeing one Aside: @lukesneeringer doesn't GAPIC have a retry feature? |
It does. |
@dhermes @daspecster @lukesneeringer I am seeing this much more often than once a day, I see this roughly once a minute. I have a feeling that when I said the issue was gone, it wasn't, but only that I was not collecting the events properly. What other information would be helpful here? Not sure if this is helpful, but I am installing the SDK files from within my Dockerfile with the following command:
|
+1, I am also seeing this issue using the python datastore client. I went too far forward in my terminal to get the exact trace, but it is a 503 ServiceUnavailable thing. For me, it reliably happens when I am "starting up" the client after being inactive for a bit, and the second try always works. If it's the case that datastore is run on an apps instance, not knowing anything else I would guess it has to do with waking it up / initializing it for use. |
@speedplane @vsoch Is this still a problem? I noticed there has been a change in the datastore client and that retry is in the autogen client. |
Ah, I can't give additional feedback on this - we wound up switching to a different database. |
I still see datastore errors all the time, most recently 503 GOAWAY messages. It occurs only about 0.01% of the time, but still far too much.
|
@vsoch Thank you for taking the time to respond. Perhaps in the future you can give our product a try again; I apologize for any inconvenience this may have caused. @speedplane From 4766, it seems like getting a 503 GOAWAY message from the server side is a normal part of the process. I believe we are building retries into our clients, which should hopefully eliminate those errors. I will keep track of our progress on that. Thank you for your response. |
The |
I'm going to go ahead and close this, as the latest release includes significant retry plumbing. If it's still an issue, feel free to comment and we'll re-open. |
Hi guys, I am using google-cloud-datastore on AWS Lambda (serverless), and sometimes I am getting the error too:
I am using google-cloud-core 0.28.1 and google-cloud-datastore 1.6.0. |
@tgensol Hmm, your error is propagated from a commit call. |
Yeah, it's generally not safe for us to retry a commit operation blindly, since the first attempt may have succeeded server-side even though the client saw an error. If you are certain it's safe to retry, you can do so using our retry helper:
from google.api_core import exceptions
from google.api_core import retry

retry_commit = retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    deadline=60)
...
retry_commit(datastore_client.put)(task) |
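As a hedged aside, reusing the names from the snippet above: reads are idempotent, so the same retry object can also be wrapped around a lookup.
# `retry_commit` and `datastore_client` are defined above; `key` is assumed to exist.
entity = retry_commit(datastore_client.get)(key)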
I've dealt with this same problem. Using the retry handler directly can be difficult because the calls are often buried under middleware code. Instead, I've monkey hacked the default retry handler settings:
|
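The patched settings themselves did not survive in the thread; the following is only a hypothetical sketch of one way to get a similar effect, wrapping datastore.Client.put with a default google.api_core retry rather than touching the generated client's retry config:
from google.api_core import exceptions, retry
from google.cloud import datastore

# Hypothetical monkey-patch: wrap Client.put once at import time so every
# call retries on 503 ServiceUnavailable for up to 60 seconds.
_default_retry = retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    deadline=60)

_original_put = datastore.Client.put

def _put_with_retry(self, entity):
    # Delegate to the original method, but under the retry policy above.
    return _default_retry(_original_put)(self, entity)

datastore.Client.put = _put_with_retry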
well, I certainly can't endorse that - but I'm glad it works for you. |
For what it's worth, I have an AppEngine Flask project, and I used to have a single long-lived client. As with the above, the connection would eventually break and the client would get corrupted, so I've switched to instantiating a new client every time, since I don't need one very often. I would rather have implemented a getter which checks if the client is working and reloads it if it isn't, but there doesn't seem to be a clean way of doing that? |
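A rough sketch of the "check and reload" getter described above could look like the following; the function names are made up, and ServiceUnavailable is used as the stand-in for a broken channel:
from google.api_core import exceptions
from google.cloud import datastore

_client = None

def get_client():
    '''Return a cached client, creating one lazily on first use.'''
    global _client
    if _client is None:
        _client = datastore.Client()
    return _client

def reset_client():
    '''Drop the cached client so the next call builds a fresh one.'''
    global _client
    _client = None

def get_with_recovery(key):
    # If the cached client's channel has gone bad, rebuild it once and retry.
    try:
        return get_client().get(key)
    except exceptions.ServiceUnavailable:
        reset_client()
        return get_client().get(key)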
I am getting intermittent ServiceUnavailable: 503 error messages when doing datastore operations with the google cloud client library. The stack trace is below. I am running Python 2.7 on Ubuntu, within a Docker container on a small Kubernetes cluster. For ease of reading, that last line is expanded:
I am running this within a Docker container orchestrated by Kubernetes. My Dockerfile inherits from ubuntu:trusty. Below is version information: