
Log error fields as tags #4801

Merged: 7 commits into master from taggable-errors, May 11, 2022

Conversation

vytautas-karpavicius
Contributor

What changed?
Added the ability for logged values to define a Tags() []tag.Tag method that returns additional tags to be emitted when logging the value.

One use case: error types may define Tags() to emit their fields as additional tags. For example:

// MyCustomError carries a workflow ID alongside its message.
type MyCustomError struct {
	WorkflowId string
}

func (e MyCustomError) Error() string {
	return "custom error"
}

// Tags exposes the workflow ID as a separate log tag.
func (e MyCustomError) Tags() []tag.Tag {
	return []tag.Tag{tag.WorkflowID(e.WorkflowId)}
}

Logging this error as:

err := &MyCustomError{"workflow123"}
logger.Error("oh no", tag.Error(err))

will produce the following log entry:
{"level":"error", "msg":"oh no", "error":"custom error", "error-wf-id":"workflow123", ...}

Why?
Logging errors with high-cardinality fields (such as WorkflowId, RunId, etc.) embedded in the error message makes it hard to analyze logs. Such messages cannot be grouped by cause or easily filtered.

Emitting error fields as separate tags allows easier grouping and filtering of the emitted logs.

How did you test it?

  • Added a new unit test for the logger.
  • Tested on our staging environment to check whether the new fields appear.

Potential risks
This PR also simplifies some previously custom-formatted error messages. This could break logic for anyone parsing or matching the error string exactly.

Release notes

Documentation Changes

@vytautas-karpavicius vytautas-karpavicius requested a review from a team April 20, 2022 13:51
@vytautas-karpavicius vytautas-karpavicius marked this pull request as ready for review April 20, 2022 13:51
@coveralls

coveralls commented Apr 20, 2022

Pull Request Test Coverage Report for Build a7eb836d-4cca-43a2-b4da-46cbf9e55243

  • 12 of 63 (19.05%) changed or added relevant lines in 2 files are covered.
  • 41 unchanged lines in 10 files lost coverage.
  • Overall coverage increased (+0.02%) to 56.985%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
common/types/errors.go | 8 | 59 | 13.56%

Files with Coverage Reduction | New Missed Lines | %
common/membership/hashring.go | 2 | 83.54%
common/persistence/historyManager.go | 2 | 66.67%
common/task/fifoTaskScheduler.go | 2 | 87.63%
common/task/weightedRoundRobinTaskScheduler.go | 2 | 88.6%
service/history/queue/timer_queue_processor.go | 2 | 58.37%
service/history/queue/transfer_queue_processor.go | 2 | 56.86%
service/matching/matcher.go | 2 | 91.46%
service/history/task/fetcher.go | 4 | 86.15%
common/types/errors.go | 7 | 19.81%
common/persistence/sql/sqlExecutionStore.go | 16 | 62.3%

Totals Coverage Status
Change from base Build 756ac7b5-2fee-4757-927a-eab13f808640: +0.02%
Covered Lines: 84030
Relevant Lines: 147459

💛 - Coveralls

Contributor

@Groxx Groxx left a comment


Without an accompanying "add error type info when logging" + "add error type when serializing for RPC response", this kinda seems like it risks making our errors a fair bit less useful, particularly for RPC purposes. E.g. we don't currently have responses like BadRequestError{ BadRequestError{Message:...} }, so wouldn't this lead to type info being lost in the string, which is used in quite a few places as the only visible value? Particularly when it isn't (or isn't correctly) converted to a custom RPC error type.

in principle I like the log-tag idea, but this seems kinda questionable at the moment.
and the next phase of this would probably be to add a "tag this error with" error-wrapper-type... but that can come later after other 1.13-error changes and it'll be easy to convert, no need to delay current improvements.


separately, if we're interested in moving to something like

{
  "error": "stuff",
  "error-details": {
    "key": "value",
    "...": ...
  }
}

instead of this PR's

{
  "error": "stuff",
  "error-key": value,
  "error-...": ...,
}

there are ways to get zap to build ad-hoc nesting. I can pull in some code I've written before to do exactly that.

common/log/loggerimpl/logger.go (outdated review thread, resolved)
@vytautas-karpavicius
Contributor Author

> Without an accompanying "add error type info when logging" + "add error type when serializing for RPC response", this kinda seems like it risks making our errors a fair bit less useful, particularly for RPC purposes. E.g. we don't currently have responses like BadRequestError{ BadRequestError{Message:...} }, so wouldn't this lead to type info being lost in the string, which is used in quite a few places as the only visible value? Particularly when it isn't (or isn't correctly) converted to a custom RPC error type.

At the RPC boundary we map these types to either Thrift or Proto types, and that is done based on the types themselves (not the error message string). See here and here. So clients using IDL types will not see this change anyway.

One place where it would be visible is our CLI, as it also uses these types. So error messages printed via the CLI would drop the type info. Not sure that the type is super useful for the user TBH; the message itself is usually the interesting bit.

> there are ways to get zap to build ad-hoc nesting. I can pull in some code I've written before to do exactly that.

The nested approach might also work. Do you know whether it is possible for messages to be grouped by a subfield of the nested field? I mainly went this direction to be sure that we can filter/group on these fields, but if that is possible with nested structures, they could work as well.

@Groxx
Contributor

Groxx commented May 7, 2022

Sorry about the delay :|

> The nested approach might also work. Do you know whether it is possible for messages to be grouped by a subfield of the nested field? I mainly went this direction to be sure that we can filter/group on these fields, but if that is possible with nested structures, they could work as well.

Not entirely sure what you mean, but the thing I've built before lets you write

log.Info("stuff",
  nested("x",
    zap.String("yay", "strings"),
    zap.Int("and", 1138)))

and get

{
  "msg":"stuff",
  "x":{
    "yay":"strings",
    "and":1138
  }
}

and it works for building structures of arbitrary depth. It just reports itself as a zap-object-marshaler with whatever you pass as sub-fields.

TBH it's probably something zap should provide; the "namespace" thing is strange and doesn't do what you'd expect, and there's kinda no way to build structure... in a structured logging library. I may make a PR for them at some point.
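
A helper along these lines can be written against zap's public API. Here is a rough sketch; the nested name and exact shape are illustrative, not the actual code referenced above.

package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// nested groups arbitrary zap fields under a single JSON key by wrapping them
// in an ObjectMarshaler; each field writes itself into the nested encoder.
func nested(key string, fields ...zap.Field) zap.Field {
	return zap.Object(key, zapcore.ObjectMarshalerFunc(func(enc zapcore.ObjectEncoder) error {
		for _, f := range fields {
			f.AddTo(enc)
		}
		return nil
	}))
}

func main() {
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	logger.Info("stuff",
		nested("x",
			zap.String("yay", "strings"),
			zap.Int("and", 1138)))
	// Emits something like: {"msg":"stuff","x":{"yay":"strings","and":1138}}
	// (plus the level/timestamp fields added by the production encoder).
}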

@vytautas-karpavicius
Contributor Author

vytautas-karpavicius commented May 9, 2022

> Not entirely sure what you mean, but the thing I've built before lets you write [the nested(...) example above], and it works for building structures of arbitrary depth. It just reports itself as a zap-object-marshaler with whatever you pass as sub-fields.

Ok, I have tested it and it looks like it would work just fine. Nested structures are flattened in our monitoring tools like so:

"error-details": {
  "key": "value"
}

appears as:

"error-details.key":"value"

You can filter on it, as well as group by it. So it is all good. Yay!

Updated the code. This approach also looks cleaner.

However, we need to expose zap internals outside our logging package, in particular zapcore.ObjectEncoder. Not sure if this is OK. In general I'm a bit skeptical about whether we need our logging abstraction on top at all. Unless we expect to use a different logging library underneath, it just seems redundant to me.
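
Purely as an illustration of that concern (not the PR's actual code): for an error type to describe its own nested details, something like the following ends up written against zapcore.ObjectEncoder outside the logging package.

package types

import "go.uber.org/zap/zapcore"

// MyCustomError is the example type from the PR description, repeated here
// only to show the dependency it would pick up.
type MyCustomError struct {
	WorkflowId string
}

func (e MyCustomError) Error() string { return "custom error" }

// MarshalLogObject lets a logger emit the error's fields as a nested object,
// but implementing it forces this (non-logging) package to import zapcore.
func (e MyCustomError) MarshalLogObject(enc zapcore.ObjectEncoder) error {
	enc.AddString("wf-id", e.WorkflowId)
	return nil
}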

@Groxx
Contributor

Groxx commented May 9, 2022

> However, we need to expose zap internals outside our logging package, in particular zapcore.ObjectEncoder. Not sure if this is OK. In general I'm a bit skeptical about whether we need our logging abstraction on top at all. Unless we expect to use a different logging library underneath, it just seems redundant to me.

Yeah, TBH I think we should consider just getting rid of it. Zap's core is very hookable and efficient; it's not difficult to convert it to [pretty much literally anything].

Though for this PR's purposes: a custom logger impl doesn't need to respect these error fields. And the zap interfaces used for marshaling look 3rd-party-implement-able to me (though they're somewhat large / it'd be a bit annoying).
I'm inclined to say "meh, fine for now, discuss" and we can try to decide something before the next release. The []Tag approach should work too, if we don't want to move towards requiring zap.


func TestErrorWithDetails(t *testing.T) {
	sb := &strings.Builder{}
	zapLogger := zap.New(zapcore.NewCore(zapcore.NewJSONEncoder(zapcore.EncoderConfig{MessageKey: "msg"}), zapcore.AddSync(sb), zap.DebugLevel))
Contributor

There is a zaptest package that kinda covers this: https://pkg.go.dev/go.uber.org/zap@v1.21.0/zaptest

... but tbh it's sorta annoying to use, roughly as verbose as you have here or maybe a bit worse. And in this case we do actually care about the json-encoded form, so I think this is fine 👍
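
For reference, a test built on that setup typically logs through the wrapper and asserts on the JSON written to the builder. A rough sketch follows; loggerimpl.NewLogger's exact signature, the import paths, and the expected output are assumptions for illustration, not the PR's actual test.

package loggerimpl_test

import (
	"errors"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"

	"github.com/uber/cadence/common/log/loggerimpl"
	"github.com/uber/cadence/common/log/tag"
)

// Illustrative only: exercises the same pattern as the test above by logging
// through the wrapper and asserting on the JSON written to the builder.
func TestErrorJSONOutput(t *testing.T) {
	sb := &strings.Builder{}
	zapLogger := zap.New(zapcore.NewCore(
		zapcore.NewJSONEncoder(zapcore.EncoderConfig{MessageKey: "msg"}),
		zapcore.AddSync(sb),
		zap.DebugLevel))

	// loggerimpl.NewLogger wrapping a *zap.Logger is assumed here.
	logger := loggerimpl.NewLogger(zapLogger)
	logger.Error("oh no", tag.Error(errors.New("custom error")))

	out := sb.String()
	assert.Contains(t, out, `"msg":"oh no"`)
	assert.Contains(t, out, "custom error")
}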

Contributor

@Groxx Groxx left a comment


I'm gonna go with "yea, LGTM". It moves us in a better direction regardless.

Two things then, I guess:

  • Description and probably commit message are no longer accurate, so just keep an eye on it when you merge
  • Maybe go back to []Tag or similar, but with the error-details structure? I think it's fine either way though. It's not hard to change.
    • from personal experience: we'll probably add metric tags too, so it might be worth naming the method LogTags. But again, easy to change later.

@vytautas-karpavicius vytautas-karpavicius merged commit a51b613 into master May 11, 2022
@vytautas-karpavicius vytautas-karpavicius deleted the taggable-errors branch May 11, 2022 07:20