Metric Error Codes Recording + Error Code Prefixing (dapr#8256)
* further fine tuning

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* try fixing borked e2e

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* try fixing standalone validation

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* install numpy too

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* update pip & install pkgs globally instead of venv

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* install requests

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* // -> # for commented line

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* match only installed version of powershell

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>

* wip: errorcode const consolidation

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* wip: error code monitoring/metric recording

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* style: error_code -> errorcode

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix: error changes from refactor

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* remove 2 unused error codes

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* Revert "wip: error code monitoring/metric recording"

This reverts commit 53a02b5.

Done so to implement in a separate PR.

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* Revert "Revert "wip: error code monitoring/metric recording""

This reverts commit 44c31be.

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* use RecordAndGet() to record error code metric only whenever ApiError var will be logged

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* undo invalid error code recording

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* apply cohesive prefixes to error codes

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* correctly apply metric recording to inline string error definitions, fix unit tests

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* add explicit "type" field for error code metrics

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* linting improvements

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* errorcode recording unit test, fix: register type field on init

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* add category to error codes for O(1) operation on recording

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* use error code vars in tests

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fumpt

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* convert last test strings, fix some integration tests

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix errorcodes compile error

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* golangci-lint fixes

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* lint fix

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* remove debug binary

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* Revert "lint fix"

This reverts commit 392080a.

Revert "golangci-lint fixes"

This reverts commit d6c6066.

Revert "fix errorcodes compile error"

This reverts commit 9739534.

Revert "convert last test strings, fix some integration tests"

This reverts commit ae39228.

Revert "fumpt"

This reverts commit 7f04bb3.

Revert "use error code vars in tests"

This reverts commit c6ae704.

remove addtl err code var in test

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* revise "key" field in metric to "category"

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* restore original error codes + bug fixes

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix configuration test

Signed-off-by: Jake Engelberg <jake@diagrid.io>

fix copyright headers

Signed-off-by: Jake Engelberg <jake@diagrid.io>

utilize todo context w/o struct member

Signed-off-by: Jake Engelberg <jake@diagrid.io>

nelson style improvements

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix: remove doubled calls of RecordAndGet()

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* reduce repetition in pubsub.go by recording at parent build()

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* add specific recorder logic for Jobs API composite err codes

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* refactor metric RecordX() funcs to reduce repetition

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* lint fix

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* metric naming: count -> total

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix: style/naming

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix: correct crypto metric recording w/ wrapper funcs

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix: remove extra metric recordings

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* integtest: error code metrics

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* fix duplicated/missing http recordings

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* clarify/further consolidate recording funcs

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* further reduction of recording, requires dapr/kit PR

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* Add metric spec plural copy, remove some comments

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* nit improvements

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* provide newest dapr/kit to properly record error code

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* remove dapr/kit replace, add log for malformed error code

Signed-off-by: Jake Engelberg <jake@diagrid.io>

* log: error -> warn

Signed-off-by: Jake Engelberg <jake@diagrid.io>

---------

Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Signed-off-by: Jake Engelberg <jake@diagrid.io>
Co-authored-by: Cassandra Coyle <cassie@diagrid.io>
Co-authored-by: Elena Kolevska <elena-kolevska@users.noreply.github.com>
Co-authored-by: Yaron Schneider <schneider.yaron@live.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>
5 people authored Nov 27, 2024
1 parent 19fc533 commit fdd642e
Showing 37 changed files with 866 additions and 248 deletions.
4 changes: 4 additions & 0 deletions charts/dapr/crds/configuration.yaml
@@ -266,6 +266,8 @@ spec:
description: If true (default is false) HTTP verbs (e.g., GET, POST) are excluded from the metrics.
type: boolean
type: object
+recordErrorCodes:
+type: boolean
rules:
items:
description: MetricsRule defines configuration options for a
@@ -329,6 +331,8 @@ spec:
items:
type: integer
type: array
+recordErrorCodes:
+type: boolean
rules:
items:
description: MetricsRule defines configuration options for a
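
The `recordErrorCodes` boolean added to the CRD above reads as an opt-in switch. As a rough sketch of how a consumer might decode such a flag, the Go snippet below parses a trimmed-down metrics section. `MetricSpec`, `GetRecordErrorCodes`, and `parseMetricSpec` are hypothetical names used for illustration, and treating an unset field as `false` is an assumption that this hunk does not state.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MetricSpec is a hypothetical, trimmed-down stand-in for the metrics section
// of a Configuration resource; only the field added in this hunk is modeled.
type MetricSpec struct {
	// A pointer distinguishes "unset" from an explicit false.
	RecordErrorCodes *bool `json:"recordErrorCodes,omitempty"`
}

// GetRecordErrorCodes reports whether error code metrics should be recorded.
// Assumption: an unset field is treated as false.
func (m MetricSpec) GetRecordErrorCodes() bool {
	return m.RecordErrorCodes != nil && *m.RecordErrorCodes
}

// parseMetricSpec decodes a JSON-encoded metrics section.
func parseMetricSpec(raw string) (MetricSpec, error) {
	var spec MetricSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	for _, raw := range []string{`{}`, `{"recordErrorCodes": true}`} {
		spec, err := parseMetricSpec(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> record=%v\n", raw, spec.GetRecordErrorCodes())
	}
}
```
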
7 changes: 7 additions & 0 deletions docs/development/dapr-metrics.md
@@ -66,6 +66,13 @@ Dapr uses prometheus process and go collectors by default.

## Dapr Runtime metrics

+### Error code metrics
+
+[errorcode metrics](../../pkg/diagnostics/errorcode_monitoring.go)
+
+* error_code_count: Number of times an error with a specific error code occurred.
+
+
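
The documentation entry above describes a counter that is incremented once per occurrence of a given error code. A minimal in-memory sketch of that idea follows. It is not the implementation in `errorcode_monitoring.go`: `ErrorCodeCounter`, its methods, and the `"pubsub"` category value are hypothetical, and a real recorder would presumably go through the metrics library rather than a map.

```go
package main

import (
	"fmt"
	"sync"
)

// counterKey is a hypothetical label set: one series per (code, category).
type counterKey struct {
	Code     string
	Category string
}

// ErrorCodeCounter is a toy in-memory stand-in for a metrics counter.
type ErrorCodeCounter struct {
	mu     sync.Mutex
	counts map[counterKey]int64
}

// NewErrorCodeCounter returns an empty counter.
func NewErrorCodeCounter() *ErrorCodeCounter {
	return &ErrorCodeCounter{counts: make(map[counterKey]int64)}
}

// Record increments the series for the given error code and category.
func (c *ErrorCodeCounter) Record(code, category string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[counterKey{Code: code, Category: category}]++
}

// Count returns how many times the given code and category were recorded.
func (c *ErrorCodeCounter) Count(code, category string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[counterKey{Code: code, Category: category}]
}

func main() {
	c := NewErrorCodeCounter()
	c.Record("ERR_PUBSUB_NOT_FOUND", "pubsub")
	c.Record("ERR_PUBSUB_NOT_FOUND", "pubsub")
	c.Record("ERR_PUBSUB_FORBIDDEN", "pubsub")
	fmt.Println(c.Count("ERR_PUBSUB_NOT_FOUND", "pubsub"))
}
```
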
### Service related metrics

[service metrics](../../pkg/diagnostics/service_monitoring.go)
2 changes: 1 addition & 1 deletion go.mod
@@ -12,7 +12,7 @@ require (
github.com/cenkalti/backoff/v4 v4.3.0
github.com/cloudevents/sdk-go/v2 v2.15.2
github.com/dapr/components-contrib v1.14.1-0.20241104052509-b969bbfe8867
-github.com/dapr/kit v0.13.1-0.20241015130326-866002abe68a
+github.com/dapr/kit v0.13.1-0.20241127165251-30e2c24840b4
github.com/diagridio/go-etcd-cron v0.3.1-0.20241113192108-260d6b1861d3
github.com/evanphx/json-patch/v5 v5.9.0
github.com/go-chi/chi/v5 v5.0.11
4 changes: 2 additions & 2 deletions go.sum
@@ -464,8 +464,8 @@ github.com/danieljoos/wincred v1.1.2 h1:QLdCxFs1/Yl4zduvBdcHB8goaYk9RARS2SgLLRuA
github.com/danieljoos/wincred v1.1.2/go.mod h1:GijpziifJoIBfYh+S7BbkdUTU4LfM+QnGqR5Vl2tAx0=
github.com/dapr/components-contrib v1.14.1-0.20241104052509-b969bbfe8867 h1:E7SK8yjXdIZfuAVbyliCfSJ5vLPZlmUDhO/za3sQOeE=
github.com/dapr/components-contrib v1.14.1-0.20241104052509-b969bbfe8867/go.mod h1:egSwClldk7DPHSVMavg5QBfwhdTP2c55h4G99SE3VJQ=
-github.com/dapr/kit v0.13.1-0.20241015130326-866002abe68a h1:3+EDN84/gtelE14Ka3TW8pP52vC8QdSHKeNIe3JpsYU=
-github.com/dapr/kit v0.13.1-0.20241015130326-866002abe68a/go.mod h1:Hz1W2LmWfA4UX/12MdA+brsf+np6f/1dJt6C6F63cjI=
+github.com/dapr/kit v0.13.1-0.20241127165251-30e2c24840b4 h1:8/ShPl4+AVF70mWcRWEAF/Hz6JDB0PEh6z3X0rJAyps=
+github.com/dapr/kit v0.13.1-0.20241127165251-30e2c24840b4/go.mod h1:HwFsBKEbcyLanWlDZE7u/jnaDCD/tU+n3pkFNUctQNw=
github.com/dave/jennifer v1.4.0/go.mod h1:fIb+770HOpJ2fmN9EPPKOqm1vMGhB+TwXKMZhrIygKg=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
13 changes: 9 additions & 4 deletions pkg/api/errors/errors.go
@@ -19,6 +19,7 @@ import (

"google.golang.org/grpc/codes"

+"github.com/dapr/dapr/pkg/messages/errorcodes"
kiterrors "github.com/dapr/kit/errors"
)

@@ -29,51 +30,55 @@ const (
PostFixEmpty ReasonSegment = "EMPTY"
)

-func NotFound(name string, componentType string, metadata map[string]string, grpcCode codes.Code, httpCode int, legacyTag string, reason string) error {
+func NotFound(name string, componentType string, metadata map[string]string, grpcCode codes.Code, httpCode int, legacyTag string, reason string, category errorcodes.Category) error {
message := fmt.Sprintf("%s %s is not found", componentType, name)

return kiterrors.NewBuilder(
grpcCode,
httpCode,
message,
legacyTag,
+string(category),
).
WithErrorInfo(reason, metadata).
Build()
}

-func NotConfigured(name string, componentType string, metadata map[string]string, grpcCode codes.Code, httpCode int, legacyTag string, reason string) error {
+func NotConfigured(name string, componentType string, metadata map[string]string, grpcCode codes.Code, httpCode int, legacyTag string, reason string, category errorcodes.Category) error {
message := componentType + " " + name + " is not configured"

return kiterrors.NewBuilder(
grpcCode,
httpCode,
message,
legacyTag,
+string(category),
).
WithErrorInfo(reason, metadata).
Build()
}

-func Empty(name string, metadata map[string]string, reason string) error {
+func Empty(name string, metadata map[string]string, reason string, category errorcodes.Category) error {
message := name + " is empty"
return kiterrors.NewBuilder(
codes.InvalidArgument,
http.StatusBadRequest,
message,
"",
+string(category),
).
WithErrorInfo(reason, metadata).
Build()
}

-func IncorrectNegative(name string, metadata map[string]string, reason string) error {
+func IncorrectNegative(name string, metadata map[string]string, reason string, category errorcodes.Category) error {
message := name + " cannot be negative"
return kiterrors.NewBuilder(
codes.InvalidArgument,
http.StatusBadRequest,
message,
"",
+string(category),
).
WithErrorInfo(reason, metadata).
Build()
32 changes: 17 additions & 15 deletions pkg/api/errors/pubsub.go
@@ -20,6 +20,7 @@ import (
"google.golang.org/grpc/codes"

"github.com/dapr/components-contrib/metadata"
+"github.com/dapr/dapr/pkg/messages/errorcodes"
"github.com/dapr/kit/errors"
)

@@ -79,7 +80,7 @@ func (p PubSubError) PublishMessage(topic string, err error) error {
codes.Internal,
http.StatusInternalServerError,
fmt.Sprintf("error when publishing to topic %s in pubsub %s: %s", topic, p.name, err),
-"ERR_PUBSUB_PUBLISH_MESSAGE",
+errorcodes.PubsubPublishMessage,
"PUBLISH_MESSAGE",
)
}
@@ -89,7 +90,7 @@ func (p *PubSubError) PublishForbidden(topic, appID string, err error) error {
codes.PermissionDenied,
http.StatusForbidden,
fmt.Sprintf("topic %s is not allowed for app id %s", topic, appID),
-"ERR_PUBSUB_FORBIDDEN",
+errorcodes.PubsubForbidden,
"FORBIDDEN",
)
}
@@ -102,7 +103,7 @@ func (p PubSubError) TestNotFound(topic string, err error) error {
codes.NotFound,
http.StatusBadRequest,
fmt.Sprintf("pubsub '%s' not found", p.name),
-"ERR_PUBSUB_NOT_FOUND",
+errorcodes.PubsubNotFound,
"TEST_NOT_FOUND",
)
}
@@ -113,7 +114,7 @@ func (p *PubSubMetadataError) NotFound() error {
codes.InvalidArgument,
http.StatusNotFound,
fmt.Sprintf("%s %s is not found", metadata.PubSubType, p.p.name),
-"ERR_PUBSUB_NOT_FOUND",
+errorcodes.PubsubNotFound,
errors.CodeNotFound,
)
}
@@ -124,7 +125,7 @@ func (p *PubSubMetadataError) NotConfigured() error {
codes.FailedPrecondition,
http.StatusBadRequest,
fmt.Sprintf("%s %s is not configured", metadata.PubSubType, p.p.name),
-"ERR_PUBSUB_NOT_CONFIGURED",
+errorcodes.PubsubNotConfigured,
errors.CodeNotConfigured,
)
}
@@ -141,7 +142,7 @@ func (p *PubSubMetadataError) NameEmpty() error {
codes.InvalidArgument,
http.StatusNotFound,
"pubsub name is empty",
-"ERR_PUBSUB_EMPTY",
+errorcodes.PubsubEmpty,
"NAME_EMPTY",
)
}
@@ -151,7 +152,7 @@ func (p *PubSubMetadataError) TopicEmpty() error {
codes.InvalidArgument,
http.StatusNotFound,
"topic is empty in pubsub "+p.p.name,
-"ERR_TOPIC_NAME_EMPTY",
+errorcodes.PubsubTopicNameEmpty,
"TOPIC_NAME_EMPTY",
)
}
@@ -161,7 +162,7 @@ func (p *PubSubMetadataError) DeserializeError(err error) error {
codes.InvalidArgument,
http.StatusBadRequest,
fmt.Sprintf("failed deserializing metadata. Error: %s", err),
-"ERR_PUBSUB_REQUEST_METADATA",
+errorcodes.PubsubRequestMetadata,
"METADATA_DESERIALIZATION",
)
}
@@ -171,7 +172,7 @@ func (p *PubSubMetadataError) CloudEventCreation() error {
codes.InvalidArgument,
http.StatusInternalServerError,
"cannot create cloudevent",
-"ERR_PUBSUB_CLOUD_EVENTS_SER",
+errorcodes.PubsubCloudEventsSer,
"CLOUD_EVENT_CREATION",
)
}
@@ -185,7 +186,7 @@ func (p *PubSubTopicError) MarshalEnvelope() error {
codes.InvalidArgument,
http.StatusBadRequest,
msg,
-"ERR_PUBSUB_EVENTS_SER",
+errorcodes.PubsubEventsSer,
"MARSHAL_ENVELOPE",
)
}
@@ -200,7 +201,7 @@ func (p *PubSubTopicError) MarshalEvents() error {
codes.InvalidArgument,
http.StatusBadRequest,
message,
-"ERR_PUBSUB_EVENTS_SER",
+errorcodes.PubsubEventsSer,
"MARSHAL_EVENTS",
)
}
@@ -216,13 +217,13 @@ func (p *PubSubTopicError) UnmarshalEvents(err error) error {
codes.InvalidArgument,
http.StatusBadRequest,
message,
-"ERR_PUBSUB_EVENTS_SER",
+errorcodes.PubsubEventsSer,
"UNMARSHAL_EVENTS",
)
}

-func (p *PubSubMetadataError) build(grpcCode codes.Code, httpCode int, msg, tag, errCode string) error {
-err := errors.NewBuilder(grpcCode, httpCode, msg, tag)
+func (p *PubSubMetadataError) build(grpcCode codes.Code, httpCode int, msg string, tag errorcodes.ErrorCode, errCode string) error {
+err := errors.NewBuilder(grpcCode, httpCode, msg, tag.Code, string(tag.Category))
if !p.skipResourceInfo {
err = err.WithResourceInfo(string(metadata.PubSubType), p.p.name, "", msg)
}
@@ -238,7 +239,8 @@ func PubSubOutbox(appID string, err error) error {
codes.Internal,
http.StatusInternalServerError,
message,
-"ERR_PUBLISH_OUTBOX",
+errorcodes.PubsubPublishOutbox.Code,
+string(errorcodes.CategoryPubsub),
).WithErrorInfo(errors.CodePrefixPubSub+"OUTBOX", map[string]string{
"appID": appID, "error": err.Error(),
}).Build()
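
In the pubsub.go hunks above, each constructor now passes an `errorcodes.ErrorCode` value to a shared `build` helper, which unpacks `tag.Code` and `tag.Category` in one place. The sketch below illustrates that pattern with simplified stand-ins. The `ErrorCode` field names follow the diff, but `apiError`, `pubSubErrors`, the category value `"pubsub"`, and the assumption that the `Code` strings match the literals they replace are illustrative only.

```go
package main

import "fmt"

// Category and ErrorCode mirror the shape implied by tag.Code and
// string(tag.Category) in the diff.
type Category string

type ErrorCode struct {
	Code     string
	Category Category
}

// Hypothetical values: the diff shows only the identifiers.
const CategoryPubsub Category = "pubsub"

var (
	PubsubNotFound = ErrorCode{Code: "ERR_PUBSUB_NOT_FOUND", Category: CategoryPubsub}
	PubsubEmpty    = ErrorCode{Code: "ERR_PUBSUB_EMPTY", Category: CategoryPubsub}
)

// apiError is a simplified stand-in for the error produced by the kit builder.
type apiError struct {
	HTTPCode int
	Message  string
	Tag      string
	Category string
}

func (e *apiError) Error() string {
	return fmt.Sprintf("%s (%s, category=%s, http=%d)", e.Message, e.Tag, e.Category, e.HTTPCode)
}

// pubSubErrors is a hypothetical analogue of the pubsub error helper type.
type pubSubErrors struct{ name string }

// build is the single place where an ErrorCode is unpacked, so callers
// cannot pass a code without its category.
func (p pubSubErrors) build(httpCode int, msg string, ec ErrorCode) *apiError {
	return &apiError{HTTPCode: httpCode, Message: msg, Tag: ec.Code, Category: string(ec.Category)}
}

func (p pubSubErrors) NotFound() *apiError {
	return p.build(404, fmt.Sprintf("pubsub %s is not found", p.name), PubsubNotFound)
}

func (p pubSubErrors) NameEmpty() *apiError {
	return p.build(404, "pubsub name is empty", PubsubEmpty)
}

func main() {
	fmt.Println(pubSubErrors{name: "orders"}.NotFound())
	fmt.Println(pubSubErrors{}.NameEmpty())
}
```

Carrying the category on the code value is presumably what the commit message means by an O(1) operation on recording: nothing has to be looked up from the code string at record time.
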
21 changes: 16 additions & 5 deletions pkg/api/errors/scheduler.go
@@ -18,6 +18,7 @@ import (

"google.golang.org/grpc/codes"

+"github.com/dapr/dapr/pkg/messages/errorcodes"
kiterrors "github.com/dapr/kit/errors"
)

@@ -40,57 +41,67 @@ const (
)

func SchedulerURLName(metadata map[string]string) error {
+compErrCode := string(CodePrefixScheduler + InFixJob + PostFixName)
message := "Set the job name in the url only"
return kiterrors.NewBuilder(
codes.InvalidArgument,
http.StatusBadRequest,
message,
"",
+string(errorcodes.CategoryJob),
).
-WithErrorInfo(string(CodePrefixScheduler+InFixJob+PostFixName), metadata).
+WithErrorInfo(compErrCode, metadata).
Build()
}

func SchedulerScheduleJob(metadata map[string]string, err error) error {
+compErrCode := string(CodePrefixScheduler + InFixSchedule + PostFixJob)
return kiterrors.NewBuilder(
codes.Internal,
http.StatusInternalServerError,
MsgScheduleJob+" due to: "+err.Error(),
"",
+string(errorcodes.CategoryJob),
).
-WithErrorInfo(string(CodePrefixScheduler+InFixSchedule+PostFixJob), metadata).
+WithErrorInfo(compErrCode, metadata).
Build()
}

func SchedulerGetJob(metadata map[string]string, err error) error {
+compErrCode := string(CodePrefixScheduler + InFixGet + PostFixJob)
return kiterrors.NewBuilder(
codes.Internal,
http.StatusInternalServerError,
MsgGetJob+" due to: "+err.Error(),
"",
+string(errorcodes.CategoryJob),
).
-WithErrorInfo(string(CodePrefixScheduler+InFixGet+PostFixJob), metadata).
+WithErrorInfo(compErrCode, metadata).
Build()
}

func SchedulerListJobs(metadata map[string]string, err error) error {
+compErrCode := string(CodePrefixScheduler + InFixList + PostFixJobs)
return kiterrors.NewBuilder(
codes.Internal,
http.StatusInternalServerError,
MsgListJobs+" due to: "+err.Error(),
"",
+string(errorcodes.CategoryJob),
).
-WithErrorInfo(string(CodePrefixScheduler+InFixList+PostFixJobs), metadata).
+WithErrorInfo(compErrCode, metadata).
Build()
}

func SchedulerDeleteJob(metadata map[string]string, err error) error {
+compErrCode := string(CodePrefixScheduler + InFixDelete + PostFixJob)
return kiterrors.NewBuilder(
codes.Internal,
http.StatusInternalServerError,
MsgDeleteJob+" due to: "+err.Error(),
"",
+string(errorcodes.CategoryJob),
).
-WithErrorInfo(string(CodePrefixScheduler+InFixDelete+PostFixJob), metadata).
+WithErrorInfo(compErrCode, metadata).
Build()
}
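
The scheduler errors above pass an empty legacy tag and instead identify themselves by a composite code concatenated from a prefix, an infix, and a postfix segment. The sketch below shows that composition in isolation. The `ReasonSegment` type name appears in errors.go, but every segment value here is a placeholder (the constant block is not part of the visible hunk), and `compositeCode` is a hypothetical helper rather than a function from the commit.

```go
package main

import "fmt"

// ReasonSegment matches the type name used in errors.go.
type ReasonSegment string

// Placeholder values: the real constants are defined outside the visible hunk.
const (
	CodePrefixScheduler ReasonSegment = "SCHEDULER_"
	InFixGet            ReasonSegment = "GET_"
	InFixList           ReasonSegment = "LIST_"
	PostFixJob          ReasonSegment = "JOB"
	PostFixJobs         ReasonSegment = "JOBS"
)

// compositeCode joins segments in order, like the compErrCode expressions
// in the diff (string(prefix + infix + postfix)).
func compositeCode(segments ...ReasonSegment) string {
	var out ReasonSegment
	for _, s := range segments {
		out += s
	}
	return string(out)
}

func main() {
	fmt.Println(compositeCode(CodePrefixScheduler, InFixGet, PostFixJob))
	fmt.Println(compositeCode(CodePrefixScheduler, InFixList, PostFixJobs))
}
```

Computing the composite once into a local such as `compErrCode` keeps a single value available for both the error info and any metric recording, which may be the motivation for the "specific recorder logic for Jobs API composite err codes" entry in the commit message.
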