Log error fields as tags #4801
Conversation
Force-pushed from 5bfed31 to 4d43747
Pull Request Test Coverage Report for Build a7eb836d-4cca-43a2-b4da-46cbf9e55243
💛 - Coveralls
Without an accompanying "add error type info when logging" + "add error type when serializing for rpc response", this kinda seems like it risks making our errors a fair bit less useful, particularly for RPC purposes. e.g. we don't currently have responses like BadRequestError{ BadRequestError{Message:...} }, so wouldn't this lead to type info being lost in the string, which is used in quite a few places as the only visible value? particularly when it isn't (or isn't correctly) converted to a custom RPC error type.
in principle I like the log-tag idea, but this seems kinda questionable at the moment.
and the next phase of this would probably be to add a "tag this error with" error-wrapper-type... but that can come later after other 1.13-error changes and it'll be easy to convert, no need to delay current improvements.
separately, if we're interested in moving to something like
{
  "error": "stuff",
  "error-details": {
    "key": "value",
    "...": ...
  }
}
instead of this PR's
{
  "error": "stuff",
  "error-key": value,
  "error-...": ...,
}
there are ways to get zap to build ad-hoc nesting. I can pull in some code I've written before to do exactly that.
At the RPC boundary we map these types to either Thrift or Proto types, and that is done based on the types themselves (not the error message string). See here and here. So clients using IDL types will not see this change anyway. One place where it would be visible is our CLI, as it also uses these types. So error messages printed via CLI would drop the type info. Not sure if that type is super useful for the user TBH, the message itself is usually the interesting bit.
Nested approach might also work. Do you know whether it is possible for messages to be grouped by a subfield of the nested field? I mainly went this direction to be sure that we can filter/group on these fields, but if that is possible with nested structures, they could work as well.
Co-authored-by: Steven L <stevenl@uber.com>
Sorry about the delay :|
Not entirely sure what you mean, but the thing I've built before lets you write
log.Info("stuff",
  nested("x",
    zap.String("yay", "strings"),
    zap.Int("and", 1138)))
and get
{
  "msg": "stuff",
  "x": {
    "yay": "strings",
    "and": 1138
  }
}
and it works for building structures of arbitrary depth. It just reports itself as a zap-object-marshaler with whatever you pass as sub-fields. tbh it's probably something zap should provide, the "namespace" thing is strange and doesn't do what you'd expect, and there's kinda no way to build structure... in a structured logging library. I may make a PR for them at some point.
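For reference, a minimal sketch of how such a helper can be built on zap's object-marshaler hook; the names nested/nestedFields are illustrative, not the exact code referenced above:

package log

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// nestedFields reports itself to zap as an object marshaler and simply
// replays the wrapped fields into the nested object's encoder.
type nestedFields []zap.Field

func (n nestedFields) MarshalLogObject(enc zapcore.ObjectEncoder) error {
	for _, f := range n {
		f.AddTo(enc)
	}
	return nil
}

// nested groups arbitrary fields (including further nested(...) calls) under
// key, producing {"key": {...}} in the encoded output.
func nested(key string, fields ...zap.Field) zap.Field {
	return zap.Object(key, nestedFields(fields))
}

With that, the log.Info call above encodes to the nested JSON shown, and the nesting can go arbitrarily deep.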
Ok, I have tested it and it looks like it would work just fine. Nested structures are flattened in our monitoring tools, so
"error-details": {
  "key": "value"
}
appears as a flat field. You can filter on it, as well as group by it. So it is all good. Yay! Updated the code. This approach also looks cleaner. However we need to expose …
Yeah, tbh I think we should consider just getting rid of it. Zap's core is very hook-able and efficient, it's not difficult to convert it to [pretty much literally anything]. Though for this PR's purposes: a custom logger impl doesn't need to respect these error fields. And the zap interfaces used for marshaling look 3rd-party-implement-able to me (though they're somewhat large / it'd be a bit annoying).
func TestErrorWithDetails(t *testing.T) {
	sb := &strings.Builder{}
	zapLogger := zap.New(zapcore.NewCore(zapcore.NewJSONEncoder(zapcore.EncoderConfig{MessageKey: "msg"}), zapcore.AddSync(sb), zap.DebugLevel))
There is a zaptest package that kinda covers this: https://pkg.go.dev/go.uber.org/zap@v1.21.0/zaptest
... but tbh it's sorta annoying to use, roughly as verbose as you have here or maybe a bit worse. And in this case we do actually care about the json-encoded form, so I think this is fine 👍
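For illustration, a self-contained sketch of asserting on the json-encoded form with this kind of setup; the logged error and the expected fields here are placeholders, not the PR's actual test:

package log_test

import (
	"errors"
	"strings"
	"testing"

	"github.com/stretchr/testify/require"
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

func TestJSONEncodedErrorSketch(t *testing.T) {
	// Same setup as the snippet under review: a JSON-encoding core writing to a buffer.
	sb := &strings.Builder{}
	zapLogger := zap.New(zapcore.NewCore(
		zapcore.NewJSONEncoder(zapcore.EncoderConfig{MessageKey: "msg"}),
		zapcore.AddSync(sb),
		zap.DebugLevel,
	))

	zapLogger.Error("oh no", zap.Error(errors.New("custom error")))

	// Assert directly on the encoded JSON, since that is the form we care about.
	out := sb.String()
	require.Contains(t, out, `"msg":"oh no"`)
	require.Contains(t, out, `"error":"custom error"`)
}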
I'm gonna go with "yea, LGTM". It moves us in a better direction regardless.
Two things then, I guess:
- Description and probably commit message are no longer accurate, so just keep an eye on it when you merge
- Maybe go back to []Tag or similar, but with the error-details structure? I think it's fine either way though. It's not hard to change.
- from personal experience: we'll probably add metric tags too, so might be worth naming the method LogTags. but again, easy to change later.
What changed?
Added the ability for logged values to define a Tags() []tag.Tag method that provides additional tags to be emitted when logging the value.
One use case: error types may define Tags() to emit additional tags for error fields. For example, logging such an error will end up in the following log entry being emitted:
{"level":"error", "msg":"oh no", "error":"custom error", "error-wf-id":"workflow123", ...}
Why?
Logging errors with high-cardinality fields (such as WorkflowId, RunId, etc.) embedded into the error message makes it hard to analyze logs. Such messages cannot be grouped by cause or easily filtered.
Emitting error fields as separate tags will allow easier grouping & filtering of emitted logs.
How did you test it?
Potential risks
This PR also modifies previously custom-formatted error messages, leaving them simpler. This has the potential to break logic for someone if they are trying to parse/match the error string exactly.
Release notes
Documentation Changes