
Create exceptions (and bases) for RPC-only errors in core #2936

Closed
wants to merge 3 commits

Conversation

@bendemaree

Several RPC error codes that can be raised from a protobuf response map to the same HTTP status code, which creates semantic overlap, since the current strategy is to always map errors to an HTTP status code. An example is DATA_LOSS and INTERNAL: two distinct conditions as reported by the backend, both mapped to HTTP 500 (per the comments in the proto definition), even though they may carry different operational semantics.

My intent in doing this is to address the issue in #2583 which is thus far unresolved and seems to be dead in the water. After establishing these new exception classes, it's a simple matter to raise them when relevant rather than mashing multiple backend error codes onto a single exception as is currently implemented. As noted in #2583 this prevents the library user from appropriately handling some error conditions with retry logic.

Pending this, I'll follow up with a simple PR that raises the RPC-specific exceptions.

Note: I hemmed and hawed about re-using the `code` class attribute on these exceptions, because it is a reach IMO to extend its semantics from "HTTP status code" to "any protocol with numeric status codes". I decided it's alright in this case, primarily because it keeps things simple and there is little danger of an additional 84 proto error codes being gobbled up (at which point you would lose the invariant that a status code < 100 means an RPC status code rather than an HTTP one).
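For concreteness, here is a minimal self-contained sketch of the shape being proposed. `GoogleCloudRPCError` matches the class exercised in the PR's tests below; the stand-in base class and the two subclasses shown are illustrative assumptions, with `code` values taken from `google.rpc.Code`:

```python
class GoogleCloudError(Exception):
    """Stand-in for the library's existing base exception (assumed here)."""

    code = None


class GoogleCloudRPCError(GoogleCloudError):
    """Base for errors that exist only in the google.rpc.Code space."""


class DataLoss(GoogleCloudRPCError):
    """DATA_LOSS: unrecoverable data loss or corruption."""

    code = 15  # google.rpc.Code.DATA_LOSS, not an HTTP status


class Aborted(GoogleCloudRPCError):
    """ABORTED: a concurrency conflict, typically retryable."""

    code = 10  # google.rpc.Code.ABORTED
```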

Ben Demaree added 2 commits January 16, 2017 19:46
```python
def test_rpc_only_errors_not_exported(self):
    from google.cloud.exceptions import GoogleCloudRPCError
    from google.cloud.exceptions import _HTTP_CODE_TO_EXCEPTION

    http_exceptions = _HTTP_CODE_TO_EXCEPTION.values()
```


@daspecster (Contributor)

Thanks @bendemaree!
This looks pretty good to me besides my one comment.

WDYT @dhermes & @tseaver?

@dhermes (Contributor) commented Jan 17, 2017

I am 👎 to this approach. For now, we support both JSON-over-HTTP (or protobuf-over-HTTP in the case of datastore) and protobuf-over-gRPC APIs. As a result, the same calls, e.g. Client.put() can result in very different (backend) exceptions. Our goal is to catch/wrap and raise the same exception for either backend.

So with that goal in mind, I imagined a fix where we just allow more optional metadata to be passed during construction of our current exceptions. For example, UNKNOWN, INTERNAL, and DATA_LOSS all map to 500 (InternalServerError), which loses information. But if we could get more info, the data could be preserved, e.g.

```python
>>> try:
...     do_something()
... except InternalServerError as exc:
...     print(exc.grpc_code)
...
<StatusCode.DATA_LOSS: (15, 'data loss')>
```
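A minimal sketch of what that could look like; the `grpc_code` attribute name is illustrative here, not the library's actual API:

```python
class InternalServerError(Exception):
    """HTTP 500; may correspond to UNKNOWN, INTERNAL, or DATA_LOSS upstream."""

    code = 500

    def __init__(self, message, grpc_code=None):
        super().__init__(message)
        # Preserve the backend's original code (e.g. grpc.StatusCode.DATA_LOSS)
        # so callers can still distinguish the underlying condition.
        self.grpc_code = grpc_code
```

A caller could then branch on `exc.grpc_code` inside a single `except InternalServerError` handler.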

@bendemaree (Author)

> Our goal is to catch/wrap and raise the same exception for either backend.

This goal seems very odd to me. You have two different transports, not two different backends. Is it really true that the same operation causes different errors to be raised, assuming the error is bubbled up completely? One transport's status code semantics (HTTP) constrain you in a way that causes ambiguity, which I understand, but as the client library, you are on the hook to disambiguate that in a way that is idiomatic in the language at hand. An exception that has different meaning based on its state post-instantiation is not Pythonic, no?

@bendemaree (Author) commented Jan 17, 2017

Put another way: using a Python client for a service should not require me, a library user, to be concerned with whether I'm using HTTP or gRPC to talk to the backend, if the two protocols are speaking to the same backend...and I understand them to be the same server on the other side, despite differing serving stacks.

@dhermes (Contributor) commented Jan 17, 2017

> a library user, be concerned with whether I'm using HTTP or gRPC to talk to the backend

You're totally right.

The immediate question that comes to my mind is: how much of the gRPC type information can we get from the equivalent HTTP errors (either as headers or from the payload)? Do you have an example failure in mind? (If we stick to datastore, the response payload on error is literally a protobuf, so it's not really apples-to-apples.)

@bendemaree (Author) commented Jan 17, 2017

I thought even the HTTP responses, despite having an HTTP status code, had the RPC-style error available. In these docs, under "Here's an example of an error response for a JSON request", it shows `"status": "INVALID_ARGUMENT"`, which should correspond to one of the error constants from the protobuf linked earlier...unless I'm misunderstanding your question. 😄

@bendemaree (Author)

@dhermes just following up on this. To answer your question more directly, the docs are actually clearer than the snippet I grabbed:

The response object contains a single field `error` whose value contains the following elements:

| Element | Description |
| --- | --- |
| `code` | An HTTP status code that generically identifies the request failure. |
| `message` | Specific information about the request failure. |
| `status` | The canonical error code (`google.rpc.Code`) for Google APIs. Codes that may be returned by the Cloud Datastore API are listed in Error Codes. |

Essentially, the current implementation only maps exceptions from `code` where it ought to use `status`. This format looks like it carries over to other Cloud* APIs as well.
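A hypothetical sketch of that fix: prefer the RPC `status` field of the error payload when choosing an exception class, and fall back to the HTTP `code` only when `status` carries no extra information. `DataLoss` and `Aborted` are the sketch classes from earlier; `_HTTP_CODE_TO_EXCEPTION` appears in the PR's tests; the function name is an assumption:

```python
_RPC_STATUS_TO_EXCEPTION = {
    "DATA_LOSS": DataLoss,
    "ABORTED": Aborted,
}


def exception_from_payload(error):
    """Build an exception from a parsed JSON error payload."""
    status = error.get("status")
    if status in _RPC_STATUS_TO_EXCEPTION:
        return _RPC_STATUS_TO_EXCEPTION[status](error.get("message", ""))
    # Fall back to the existing HTTP-status-keyed mapping.
    return _HTTP_CODE_TO_EXCEPTION[error["code"]](error.get("message", ""))
```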

@lukesneeringer (Contributor)

Just adding a general comment on this:

> Put another way: using a Python client for a service should not require me, a library user, to be concerned with whether I'm using HTTP or gRPC to talk to the backend, if the two protocols are speaking to the same backend...and I understand them to be the same server on the other side, despite differing serving stacks.

I think this is the major thrust of @dhermes' point, but there are two ways to resolve it:

Here is a thought experiment: say I wrote some Python code to use Datastore, and am doing so in a hosting environment where (for whatever reason) gRPC is unavailable. So I am using the HTTP transport. I get an exception for DATA_LOSS, and I write code in my application to catch the exception and take some appropriate action.

Now, say I migrate that code to a new hosting environment where gRPC is available; installing the same packages successfully installs gRPC. We want that code to continue to work as-is. This is why it matters that the exceptions we raise be the same between transports: precisely so the application developer does not need to be concerned with it and the code stays portable.

That said, it does look like the HTTP errors provide sufficient information to disambiguate, so it is possible to do what you are suggesting with both transports.

@bendemaree (Author)

@lukesneeringer Yes, I fully understand; proper exceptions keep things portable and abstract over backend, transport, environment, etc. The problem is that the exceptions are ambiguous, not that they are environment-specific. My frustration with discussing this issue is that it has nothing to do with whether we're using the APIs over HTTP or gRPC: both because it shouldn't (that is, it is proper for it not to matter) and because the RPC status codes are present in both responses. The transport is a non-issue IMO.

What is an issue is that UNKNOWN, INTERNAL, and DATA_LOSS are all still mashed into InternalServerError, and ALREADY_EXISTS and ABORTED are still mashed into Conflict, an issue we first raised in mid-October. We are forced to remain on gcloud>=0.18,<0.19, where we can properly handle the different error cases. If you are wondering why this is so important (why we don't just retry things that aren't supposed to be retried, say), try opening a ticket about an elevated Cloud Datastore error rate: the first thing we are asked is to verify our retry behavior, and if it's incorrect in any way, we have no case.
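To make the retry point concrete (illustrative only: `client`, `entity`, and `retry_with_backoff` are placeholders, and `Aborted`/`AlreadyExists` are the kind of distinct exceptions sketched earlier):

```python
try:
    client.put(entity)
except Aborted:
    retry_with_backoff()  # transient contention: retrying is correct
except AlreadyExists:
    raise  # a duplicate create: retrying would never succeed
```

With the current mapping, both conditions surface as Conflict, so this distinction cannot be expressed at all.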

We are on the verge of having to maintain a fork that disambiguates the error cases due to dead-ends and non-responsiveness on the multiple issues that are open. Is there any path forward with this code, or should we abandon hope and consider this closed?

@dhermes (Contributor) commented Feb 16, 2017

@bendemaree You converted me to "Team Ben" many weeks ago, just haven't had the cycles to get these changes in. Sorry for the frustration.

@daspecster (Contributor)

@dhermes what all is left to do for this?

@dhermes (Contributor) commented Feb 16, 2017

@daspecster Potentially a lot. The exceptions defined in this PR are a teensy bit too specific to the transport. What we really need is a ground-up rethinking of the exceptions so they can be transport-agnostic, but not so novel that they confuse people used to HTTP status/error codes.
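One shape such a transport-agnostic hierarchy could take, sketched from this thread rather than any committed design (all names and values here are assumptions): every exception carries both an HTTP code and a `google.rpc.Code`, and RPC-only conditions get their own subclasses.

```python
class GoogleAPICallError(Exception):
    """Transport-agnostic base: carries both codes when they apply."""

    code = None              # HTTP status code, when one applies
    grpc_status_code = None  # google.rpc.Code value, when one applies


class InternalServerError(GoogleAPICallError):
    code = 500
    grpc_status_code = 13  # INTERNAL


class DataLoss(GoogleAPICallError):
    code = 500
    grpc_status_code = 15  # DATA_LOSS
```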

@bendemaree (Author)

@dhermes That's alright, thanks for circling back. 😄 I am more than happy to help wherever I can be useful; just say the word.

@lukesneeringer (Contributor)

This is superseded by #3738. Closing for now, but feel free to reopen if there is anything not covered there.
