- Author(s): Roberto J. Rojas
- State: Draft
- Updated: 5/11/2023
Across Dapr, errors are surfaced for different conditions without consistent messages, error details, or standard formats, and with no clear indication of what caused the error or where it originated.
This makes troubleshooting and debugging quite difficult and requires a deep understanding of the parts of Dapr and how those parts interact with each other.
To help with the issues raised above, it would be ideal if a solution could provide:
- Greater detail about the errors that occurred.
- Error details in a structured format.
- Consistency in the error details.
- An indication of where within the Dapr execution (Init, Runtime, Components, SDKs, etc.) the error occurred.
Utilize and follow the gRPC Richer Error Model and the Google API Errors Model in the Design Guide.
The Google API Error Model has the following Protobuf format:
package google.rpc;
// The `Status` type defines a logical error model that is suitable for
// different programming environments, including REST APIs and RPC APIs.
message Status {
// A simple error code that can be easily handled by the client. The
// actual error code is defined by `google.rpc.Code`.
int32 code = 1;
// A developer-facing human-readable error message in English. It should
// both explain the error and offer an actionable resolution to it.
string message = 2;
// Additional error information that the client code can use to handle
// the error, such as retry info or a help link.
repeated google.protobuf.Any details = 3;
}
Here is one of the possible details that can be added to the above error structure. It is defined in the error_details.proto Protobuf file:
message ErrorInfo {
// The reason of the error. This is a constant value that identifies the
// proximate cause of the error. Error reasons are unique within a particular
// domain of errors. This should be at most 63 characters and match a
// regular expression of `[A-Z][A-Z0-9_]+[A-Z0-9]`, which represents
// UPPER_SNAKE_CASE.
string reason = 1;
// The logical grouping to which the "reason" belongs. The error domain
// is typically the registered service name of the tool or product that
// generates the error. Example: "pubsub.googleapis.com". If the error is
// generated by some common infrastructure, the error domain must be a
// globally unique value that identifies the infrastructure. For Google API
// infrastructure, the error domain is "googleapis.com".
string domain = 2;
// Additional structured details about this error.
//
// Keys should match /[a-zA-Z0-9-_]/ and be limited to 64 characters in
// length. When identifying the current value of an exceeded limit, the units
// should be contained in the key, not the value. For example, rather than
// {"instanceLimit": "100/request"}, should be returned as,
// {"instanceLimitPerRequest": "100"}, if the client exceeds the number of
// instances that can be created in a single (batch) request.
map<string, string> metadata = 3;
}
The properties of google.rpc.Status will be populated as follows:
- Code - Protocol-level error code. These could be either gRPC or HTTP error codes. See [gRPC Codes ProtoBuf](https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto).
  Example: "InvalidArgument Code = 3", "Internal Code = 13"
- Message - Error message.
- Details - A set of standard error payloads for error details. The list can be found in Error Details.
  Example: "ErrorInfo", "ResourceInfo"
Below is a partial table of the standard error codes provided by gRPC and how they map to HTTP error codes. The entire list can be found at the following links:
- [Google API Error Handling](https://cloud.google.com/apis/design/errors#handling_errors)
- [gRPC Codes ProtoBuf](https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto)
HTTP | gRPC | Description |
---|---|---|
200 | OK | No error. |
400 | INVALID_ARGUMENT | Client specified an invalid argument. Check error message and error details for more information. |
400 | FAILED_PRECONDITION | Request can not be executed in the current system state, such as deleting a non-empty directory. |
400 | OUT_OF_RANGE | Client specified an invalid range. |
401 | UNAUTHENTICATED | Request not authenticated due to missing, invalid, or expired authorization credentials. |
403 | PERMISSION_DENIED | Client does not have sufficient permission. |
404 | NOT_FOUND | A specified resource is not found. |
409 | ABORTED | Concurrency conflict, such as read-modify-write conflict. |
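
As an illustration, the partial mapping in the table above could be expressed as a small lookup function. This is only a sketch: the function name `httpStatusFromGRPC` is hypothetical, the codes are passed as their gRPC names, and the fallback to 500 for codes not listed in the table is an assumption made here, not part of the proposal.

```go
package main

import (
	"fmt"
	"net/http"
)

// httpStatusFromGRPC (hypothetical helper) maps the gRPC code names from the
// partial table above to HTTP status codes.
func httpStatusFromGRPC(code string) int {
	switch code {
	case "OK":
		return http.StatusOK // 200
	case "INVALID_ARGUMENT", "FAILED_PRECONDITION", "OUT_OF_RANGE":
		return http.StatusBadRequest // 400
	case "UNAUTHENTICATED":
		return http.StatusUnauthorized // 401
	case "PERMISSION_DENIED":
		return http.StatusForbidden // 403
	case "NOT_FOUND":
		return http.StatusNotFound // 404
	case "ABORTED":
		return http.StatusConflict // 409
	default:
		// Assumption for this sketch: codes outside the partial table map to 500.
		return http.StatusInternalServerError
	}
}

func main() {
	for _, c := range []string{"OK", "INVALID_ARGUMENT", "ABORTED"} {
		fmt.Printf("%s -> %d\n", c, httpStatusFromGRPC(c))
	}
}
```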
The properties of type.googleapis.com/google.rpc.ErrorInfo will be populated as follows:
- Reason - A combination of a prefix from the tables below plus the error condition code.
  Example: "DAPR_STATE_" + "ETAG_MISMATCH"
- Domain - The value dapr.io.
- Metadata - A key/value map of data relevant to the error condition.
**Note:**
The metadata property retriable is required and must hold a boolean-like value ("true", "false", "True", "False", "TRUE", "FALSE", "1", "0").
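
A client could read the retriable metadata value roughly as sketched below. The helper name `isRetriable` is hypothetical. The sketch relies on Go's `strconv.ParseBool`, which accepts every value listed in the note and, additionally, "t", "T", "f" and "F"; a stricter check would be needed if those extra spellings must be rejected.

```go
package main

import (
	"fmt"
	"strconv"
)

// isRetriable (hypothetical helper) reads the "retriable" key from the
// ErrorInfo metadata and converts it to a bool. A missing key or an
// unparsable value is reported as an error.
func isRetriable(metadata map[string]string) (bool, error) {
	raw, ok := metadata["retriable"]
	if !ok {
		return false, fmt.Errorf("metadata is missing the required %q key", "retriable")
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid retriable value %q: %w", raw, err)
	}
	return v, nil
}

func main() {
	md := map[string]string{"key": "myapp||name", "retriable": "True"}
	retriable, err := isRetriable(md)
	fmt.Println(retriable, err)
}
```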
The properties of type.googleapis.com/google.rpc.ResourceInfo will be populated as follows:
- ResourceType - The building block type with version.
  Example: "state.redis/v1"
- ResourceName - The component name.
  Example: "my-component-name"
- Owner - The owner of the component.
- Description - Resource description or error details.
The following tables show the proposed error code prefixes used in the reason of the google.rpc.ErrorInfo for various Dapr building blocks:
INIT
Dapr Module | Prefix |
---|---|
CLI | DAPR_CLI_INIT_* |
Self-hosted | DAPR_SELF_HOSTED_INIT_* |
K8S | DAPR_K8S_INIT_* |
Invoke | DAPR_INVOKE_INIT_* |
RUNTIME
Dapr Module | Prefix |
---|---|
CLI | DAPR_RUNTIME_CLI_* |
Self-hosted | DAPR_SELF_HOSTED_* |
dapr-2-dapr(gRPC) | DAPR_RUNTIME_GRPC_* |
COMPONENTS
Dapr Module | Prefix |
---|---|
PubSub | DAPR_PUBSUB_* |
StateStore | DAPR_STATE_* |
Bindings | DAPR_BINDING_* |
SecretStore | DAPR_SECRET_* |
ConfigurationStore | DAPR_CONFIGURATION_* |
Lock | DAPR_LOCK_* |
NameResolution | DAPR_NAME_RESOLUTION_* |
Middleware | DAPR_MIDDLEWARE_* |
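
The way a prefix from the tables above combines with an error condition into a reason can be sketched as follows. The helper name `buildReason` is hypothetical. The 63-character limit and the UPPER_SNAKE_CASE pattern are taken from the comments on the `reason` field of `ErrorInfo` quoted earlier; anchoring the pattern to the whole string is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern is the pattern documented for ErrorInfo.reason, anchored so
// that the entire string has to match (an assumption of this sketch).
var reasonPattern = regexp.MustCompile(`^[A-Z][A-Z0-9_]+[A-Z0-9]$`)

// maxReasonLen is the maximum reason length documented for ErrorInfo.reason.
const maxReasonLen = 63

// buildReason (hypothetical helper) concatenates a prefix such as
// "DAPR_STATE_" with an error condition such as "ETAG_MISMATCH" and checks
// the result against the documented constraints.
func buildReason(prefix, condition string) (string, error) {
	reason := prefix + condition
	if len(reason) > maxReasonLen {
		return "", fmt.Errorf("reason %q exceeds %d characters", reason, maxReasonLen)
	}
	if !reasonPattern.MatchString(reason) {
		return "", fmt.Errorf("reason %q is not UPPER_SNAKE_CASE", reason)
	}
	return reason, nil
}

func main() {
	reason, err := buildReason("DAPR_STATE_", "ETAG_MISMATCH")
	fmt.Println(reason, err)
}
```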
The following snippet shows an error status returned due to an ETAG_MISMATCH error condition. The reason is populated with PREFIX+ERROR_CONDITION:
{
"code": 3,
"message": "possible etag mismatch. error from state store",
"details": [
{
"@type": "type.googleapis.com/google.rpc.ErrorInfo",
"reason": "DAPR_STATE_ETAG_MISMATCH",
"domain": "dapr.io",
"metadata": {
"key": "myapp||name"
}
},
{
"@type": "type.googleapis.com/google.rpc.ResourceInfo",
"resource_type": "state.redis/v1",
"resource_name": "my-component",
"owner": "",
"description": "possible etag mismatch. error from state store"
}
]
}
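
To show how a client might react programmatically to the JSON payload above, here is a minimal sketch that decodes it with the Go standard library and extracts the ErrorInfo reason. The struct and function names (`errorStatus`, `detail`, `findReason`) are hypothetical and not part of any Dapr SDK; only the fields needed for the lookup are modeled, and other fields such as `owner` and `description` are ignored by the decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detail models the subset of ErrorInfo/ResourceInfo fields used here.
type detail struct {
	Type         string            `json:"@type"`
	Reason       string            `json:"reason,omitempty"`
	Domain       string            `json:"domain,omitempty"`
	Metadata     map[string]string `json:"metadata,omitempty"`
	ResourceType string            `json:"resource_type,omitempty"`
	ResourceName string            `json:"resource_name,omitempty"`
}

// errorStatus mirrors the code/message/details layout of the payload.
type errorStatus struct {
	Code    int      `json:"code"`
	Message string   `json:"message"`
	Details []detail `json:"details"`
}

// samplePayload is a shortened copy of the example payload shown above.
const samplePayload = `{
  "code": 3,
  "message": "possible etag mismatch. error from state store",
  "details": [
    {"@type": "type.googleapis.com/google.rpc.ErrorInfo",
     "reason": "DAPR_STATE_ETAG_MISMATCH", "domain": "dapr.io",
     "metadata": {"key": "myapp||name"}},
    {"@type": "type.googleapis.com/google.rpc.ResourceInfo",
     "resource_type": "state.redis/v1", "resource_name": "my-component"}
  ]
}`

// findReason returns the reason of the first ErrorInfo detail, or an empty
// string when the payload carries no ErrorInfo.
func findReason(payload []byte) (string, error) {
	var st errorStatus
	if err := json.Unmarshal(payload, &st); err != nil {
		return "", err
	}
	for _, d := range st.Details {
		if d.Type == "type.googleapis.com/google.rpc.ErrorInfo" {
			return d.Reason, nil
		}
	}
	return "", nil
}

func main() {
	reason, err := findReason([]byte(samplePayload))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if reason == "DAPR_STATE_ETAG_MISMATCH" {
		fmt.Println("etag mismatch: re-read the state and retry the write")
	}
}
```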
import (
...
"google.golang.org/genproto/googleapis/rpc/errdetails"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
...
)
...
if req.ETag != nil {
...
ste := status.Newf(codes.InvalidArgument, messages.ErrStateGet, in.Key, in.StoreName, err.Error())
ei := errdetails.ErrorInfo{
Domain: "dapr.io",
Reason: "DAPR_STATE_ETAG_MISMATCH",
Metadata: map[string]string{
"storeName": in.StoreName,
},
}
ri := errdetails.ResourceInfo{
ResourceType: "state.redis/v1",
ResourceName: "my-redis-component",
Owner: "user",
Description: "possible etag mismatch. error from state store",
}
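// WithDetails returns a new Status; keep the plain status if the details cannot be attached.
if detailed, err2 := ste.WithDetails(&ei, &ri); err2 == nil {
	ste = detailed
}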
...
return ste.Err()
}
- Since the Dapr Runtime is using protocol buffers as the data format, support for the richer error model is already included in most of the gRPC implementations.
- This would help minimize the changes within the Dapr ecosystem.
- This solution could be used to programmatically react to errors as it provides a standard structure for the errors with details.
- Dependency on the gRPC richer error model.
- Need to test that the gRPC implementations used by all Dapr SDKs support the richer error model.
For the POC, I've made changes to some parts of the Dapr modules. The POC code can be found in my GitHub repos under the branch error-codes-poc.
These are the gRPC imports used:
import (
...
"google.golang.org/genproto/googleapis/rpc/errdetails"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
...
)
The files changed for this POC:
https://github.com/robertojrojas/components-contrib/tree/error-codes-poc
- state/redis/redis.go
- state/store.go
https://github.com/robertojrojas/dapr-kit/tree/error-codes-poc
- pkg/proto/customerrors/v1/customerrors.pb.go
- proto/customerrors/v1/customerrors.proto
- status/customerrorcodes.go
- status/status.go
https://github.com/robertojrojas/dapr/tree/error-codes-poc
- pkg/diagnostics/grpc_tracing.go
- pkg/grpc/api.go
- pkg/http/api.go
- pkg/http/responses.go
https://github.com/robertojrojas/dapr-go-sdk/tree/error-codes-poc
- client/state.go
https://github.com/robertojrojas/dapr-cli/tree/error-codes-poc
- pkg/standalone/invoke.go
https://github.com/robertojrojas/dapr-dotnet-sdk/tree/error-codes-poc
- src/Dapr.Client/DaprClientGrpc.cs