-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement structured logging for gRPC errors on SPIRE agent #4262
Implement structured logging for gRPC errors on SPIRE agent #4262
Conversation
9ca9bb6
to
4faa1fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sending this, @Neniel! I had a couple small suggestions, but overall this is headed in the right direction.
46a417f
to
1bba9e2
Compare
pkg/agent/client/client.go
Outdated
@@ -171,7 +171,14 @@ func (c *client) RenewSVID(ctx context.Context, csr []byte) (*X509SVID, error) { | |||
}) | |||
if err != nil { | |||
c.release(connection) | |||
c.c.Log.WithError(err).Error("Failed to renew agent") | |||
log := c.c.Log | |||
if s, ok := status.FromError(err); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a helper function to reduce code? (and use constants), something like:
func addErrorFields(log logrus.FieldLogger, err error) logrus.FieldLogger {
if s, ok := status.FromError(err); ok {
log = log.WithFields(logrus.Fields{
telemetry.StatusCode: s.Code(),
telemetry.StatusMessage: s.Message(),
})
}
return log
}
and you can call
log := addErrorFields(c.c.Log, err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I asked Taylor in a previous comment but I'm not sure where to create that function. Is there a helper package or something similar that is suitable to hold this kind of stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add as private on the same file, or do you think you need to use in on different packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now it will only be used on the same package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed on 6c8d983, let me know in case there's more suggestions :)
3f2d9e7
to
22ac0cb
Compare
pkg/agent/client/client_test.go
Outdated
c := &client{ | ||
c: tt.fields.c, | ||
} | ||
assert.Equalf(t, tt.want, c.addErrorFields(tt.args.err), "addErrorFields(%v)", tt.args.err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As addErrorFields()
is a private method of client
, I think it would be better to test that each public method in client
produces the expected error logs when the RPCs those public methods call return errors. However, I acknowledge that would significantly increase the scope of this PR because there is no testing of log messages in this struct today. We do similar log assertions in other parts of the codebase, but it looks like it was missed in this package for some reason.
If you feel up for adding that test support, it would be much appreciated since there are currently uncovered branches, and it's a bit difficult to tell that the usages of this new method are working as expected. If not, I think it would be good to create a follow-up issue and link it as a comment somewhere in this file (maybe above each test that is covering error cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just started working on that. Please check 29961ed for the first tests that covers some cases. I'll be adding the remaining ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please dismiss my previous comment, I need to re-write the tests due to the changes of the ex addErrorFields
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just updated the new tests at 7d46273
pkg/agent/client/client.go
Outdated
@@ -362,9 +362,9 @@ func (c *client) fetchBundles(ctx context.Context, federatedBundles []string) ([ | |||
case codes.OK: | |||
bundles = append(bundles, bundle) | |||
case codes.NotFound: | |||
c.c.Log.WithError(err).WithField(telemetry.FederatedBundle, b).Warn("Federated bundle not found") | |||
c.addErrorFields(err).WithField(telemetry.FederatedBundle, b).Warn("Federated bundle not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my suggestion below, if we check for err == nil
in this new method, we can call it once above this switch
block, rather than for each error log in the switch cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be called once for each one of the federatedBundles
that are being processed and use the logger that that method returns in each one of the cases, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if you declare the local logger with the error fields before the switch block within the for loop, it will only use that logger for the federated bundle currently being examined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at 55c88fa
d07053b
to
f305c2a
Compare
@@ -241,6 +243,12 @@ func TestRenewSVID(t *testing.T) { | |||
agentErr: errors.New("renew fails"), | |||
err: "failed to renew agent: renew fails", | |||
}, | |||
{ | |||
name: "call to RenewAgent fails", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test that there is an error level log with the expected error fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be done separately to address #4474.
pkg/agent/client/client_test.go
Outdated
@@ -344,6 +352,90 @@ func TestNewX509SVIDs(t *testing.T) { | |||
assertConnectionIsNotNil(t, client) | |||
} | |||
|
|||
func TestNewX509SVIDsFails(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the code in this test is the same as in TestNewX509SVIDs
. Can we re-use the existing setup logic in TestNewX509SVIDs
and refactor it to be a table-driven test with this new test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done at 5818be9
pkg/agent/client/client_test.go
Outdated
}, | ||
} | ||
|
||
tc.svidClient.batchSVIDErr = status.Error(codes.NotFound, "not found when executing BatchNewX509SVID") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test that there is an error level log with the expected error fields?
pkg/agent/client/client_test.go
Outdated
ExpiresAt: issuedAt, | ||
IssuedAt: expiresAt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields look mixed up, is it intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I forgot to change it after copying the previous test. Fixed at 7488427
@@ -696,6 +824,19 @@ func TestFetchJWTSVID(t *testing.T) { | |||
}, | |||
err: "JWTSVID issued after it has expired", | |||
}, | |||
{ | |||
name: "grpc call to NewJWTSVID fails", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test that there is an error level log with the expected error fields?
55c88fa
to
d049445
Compare
pkg/agent/client/client.go
Outdated
logger := c.c.Log.WithError(err) | ||
if s, ok := status.FromError(err); ok { | ||
logger = c.c.Log.WithFields(logrus.Fields{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger variable is not used to add fields, and setted with a new one, so WithError
is missed,
you must use logger.WithFields instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at 7b9fc68
c4c0f36
to
a712258
Compare
Hey @Neniel, I see there's some commits from In any case, the branch history for this PR can be cleaned up to just the changes you're proposing by running Also just as a suggestion for the future, running |
2dbcc2f
to
b21fbce
Compare
b21fbce
to
539d3c7
Compare
50123cc
to
3d466f7
Compare
… for grpc errors on spire agent Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Co-authored-by: Ryan Turner <rturner3@users.noreply.github.com> Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
3d466f7
to
407362a
Compare
{ | ||
name: "success", | ||
entries: entries, | ||
x509SVIDs: x509SVIDs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since this is the same value in every subtest, we can remove x509SVIDs
as a test template field
assert.Error(t, err) | ||
assertConnectionIsNil(t, client) | ||
|
||
expectedLog := log.WithFields( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would check that a log message was emitted that contained these fields rather than just checking if the logger had the fields added to it. We have methods like AssertLogs()
in test/spiretest
for this purpose. Since this kind of test coverage was not present in this package already, I'm ok if we defer this to a follow-up issue.
I've created #4474 to track the follow-up work.
@@ -241,6 +243,12 @@ func TestRenewSVID(t *testing.T) { | |||
agentErr: errors.New("renew fails"), | |||
err: "failed to renew agent: renew fails", | |||
}, | |||
{ | |||
name: "call to RenewAgent fails", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be done separately to address #4474.
Pull Request check list
Affected functionality
SPIRE agent's client
Description of change
This pull request is intended to improve logging of gRPC errors from the SPIRE agent by separating code and message.
Which issue this PR fixes
fixes #2613