Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement structured logging for gRPC errors on SPIRE agent #4262

Merged

Conversation

Neniel
Copy link
Contributor

@Neniel Neniel commented Jun 15, 2023

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality

SPIRE agent's client

Description of change

This pull request is intended to improve logging of gRPC errors from the SPIRE agent by separating code and message.

Which issue this PR fixes

fixes #2613

@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch from 9ca9bb6 to 4faa1fb Compare June 15, 2023 12:15
Copy link
Collaborator

@rturner3 rturner3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this, @Neniel! I had a couple small suggestions, but overall this is headed in the right direction.

pkg/agent/client/client.go Outdated Show resolved Hide resolved
pkg/agent/client/client.go Show resolved Hide resolved
@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch from 46a417f to 1bba9e2 Compare June 16, 2023 12:13
@Neniel Neniel marked this pull request as ready for review June 16, 2023 12:14
@@ -171,7 +171,14 @@ func (c *client) RenewSVID(ctx context.Context, csr []byte) (*X509SVID, error) {
})
if err != nil {
c.release(connection)
c.c.Log.WithError(err).Error("Failed to renew agent")
log := c.c.Log
if s, ok := status.FromError(err); ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a helper function to reduce code? (and use constants), something like:

func addErrorFields(log logrus.FieldLogger, err error) logrus.FieldLogger {
   if s, ok := status.FromError(err); ok {
        log = log.WithFields(logrus.Fields{                
		telemetry.StatusCode:    s.Code(),
		telemetry.StatusMessage: s.Message(),
	})
   }
   return log
}

and you can call

   log := addErrorFields(c.c.Log, err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I asked Taylor in a previous comment but I'm not sure where to create that function. Is there a helper package or something similar that is suitable to hold this kind of stuff?

Copy link
Collaborator

@MarcosDY MarcosDY Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add as private on the same file, or do you think you need to use in on different packages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it will only be used on the same package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed on 6c8d983, let me know in case there's more suggestions :)

@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch 2 times, most recently from 3f2d9e7 to 22ac0cb Compare June 19, 2023 14:21
pkg/agent/client/client.go Outdated Show resolved Hide resolved
c := &client{
c: tt.fields.c,
}
assert.Equalf(t, tt.want, c.addErrorFields(tt.args.err), "addErrorFields(%v)", tt.args.err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As addErrorFields() is a private method of client, I think it would be better to test that each public method in client produces the expected error logs when the RPCs those public methods call return errors. However, I acknowledge that would significantly increase the scope of this PR because there is no testing of log messages in this struct today. We do similar log assertions in other parts of the codebase, but it looks like it was missed in this package for some reason.

If you feel up for adding that test support, it would be much appreciated since there are currently uncovered branches, and it's a bit difficult to tell that the usages of this new method are working as expected. If not, I think it would be good to create a follow-up issue and link it as a comment somewhere in this file (maybe above each test that is covering error cases).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just started working on that. Please check 29961ed for the first tests that covers some cases. I'll be adding the remaining ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please dismiss my previous comment, I need to re-write the tests due to the changes of the ex addErrorFields method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just updated the new tests at 7d46273

@@ -362,9 +362,9 @@ func (c *client) fetchBundles(ctx context.Context, federatedBundles []string) ([
case codes.OK:
bundles = append(bundles, bundle)
case codes.NotFound:
c.c.Log.WithError(err).WithField(telemetry.FederatedBundle, b).Warn("Federated bundle not found")
c.addErrorFields(err).WithField(telemetry.FederatedBundle, b).Warn("Federated bundle not found")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my suggestion below, if we check for err == nil in this new method, we can call it once above this switch block, rather than for each error log in the switch cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be called once for each one of the federatedBundles that are being processed and use the logger that that method returns in each one of the cases, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if you declare the local logger with the error fields before the switch block within the for loop, it will only use that logger for the federated bundle currently being examined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed at 55c88fa

@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch from d07053b to f305c2a Compare June 27, 2023 13:56
@@ -241,6 +243,12 @@ func TestRenewSVID(t *testing.T) {
agentErr: errors.New("renew fails"),
err: "failed to renew agent: renew fails",
},
{
name: "call to RenewAgent fails",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test that there is an error level log with the expected error fields?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be done separately to address #4474.

@@ -344,6 +352,90 @@ func TestNewX509SVIDs(t *testing.T) {
assertConnectionIsNotNil(t, client)
}

func TestNewX509SVIDsFails(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the code in this test is the same as in TestNewX509SVIDs. Can we re-use the existing setup logic in TestNewX509SVIDs and refactor it to be a table-driven test with this new test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at 5818be9

},
}

tc.svidClient.batchSVIDErr = status.Error(codes.NotFound, "not found when executing BatchNewX509SVID")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test that there is an error level log with the expected error fields?

Comment on lines 832 to 787
ExpiresAt: issuedAt,
IssuedAt: expiresAt,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields look mixed up, is it intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I forgot to change it after copying the previous test. Fixed at 7488427

@@ -696,6 +824,19 @@ func TestFetchJWTSVID(t *testing.T) {
},
err: "JWTSVID issued after it has expired",
},
{
name: "grpc call to NewJWTSVID fails",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test that there is an error level log with the expected error fields?

@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch from 55c88fa to d049445 Compare July 4, 2023 10:35
Comment on lines 476 to 478
logger := c.c.Log.WithError(err)
if s, ok := status.FromError(err); ok {
logger = c.c.Log.WithFields(logrus.Fields{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger variable is not used to add fields, and setted with a new one, so WithError is missed,
you must use logger.WithFields instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed at 7b9fc68

@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch from c4c0f36 to a712258 Compare July 10, 2023 13:30
@rturner3
Copy link
Collaborator

Hey @Neniel, I see there's some commits from main that somehow got pulled into the branch supporting this PR. It looks like they were interleaved between the changes you're making, maybe there was a rebase that got messed up at some point?

In any case, the branch history for this PR can be cleaned up to just the changes you're proposing by running git rebase -i locally in your branch, and marking each commit unrelated to this PR as drop. Once the branch history includes only your commits on top of the main branch, doing a git push -f <remote> <branch> should clean things up here.

Also just as a suggestion for the future, running git merge main in the development branch is usually the best way to keep your development branch in sync with the latest. Additionally, doing a "force push" of development branches using git push -f makes it a bit harder for us as reviewers to see what has changed since the last time you pushed changes to the PR. Generally git push without -f is recommended (which is possible if you use git merge rather than git rebase to keep your development branch in sync).

@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch 3 times, most recently from 2dbcc2f to b21fbce Compare July 21, 2023 16:04
@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch from b21fbce to 539d3c7 Compare July 24, 2023 13:36
@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch 2 times, most recently from 50123cc to 3d466f7 Compare August 24, 2023 18:33
… for grpc errors on spire agent

Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Neniel and others added 10 commits August 24, 2023 15:34
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Co-authored-by: Ryan Turner <rturner3@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
Signed-off-by: Neniel <11655196+Neniel@users.noreply.github.com>
@Neniel Neniel force-pushed the structured-logging-for-grpc-errors-on-spire-agent branch from 3d466f7 to 407362a Compare August 24, 2023 18:35
{
name: "success",
entries: entries,
x509SVIDs: x509SVIDs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since this is the same value in every subtest, we can remove x509SVIDs as a test template field

assert.Error(t, err)
assertConnectionIsNil(t, client)

expectedLog := log.WithFields(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would check that a log message was emitted that contained these fields rather than just checking if the logger had the fields added to it. We have methods like AssertLogs() in test/spiretest for this purpose. Since this kind of test coverage was not present in this package already, I'm ok if we defer this to a follow-up issue.

I've created #4474 to track the follow-up work.

@@ -241,6 +243,12 @@ func TestRenewSVID(t *testing.T) {
agentErr: errors.New("renew fails"),
err: "failed to renew agent: renew fails",
},
{
name: "call to RenewAgent fails",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be done separately to address #4474.

@azdagron azdagron merged commit 0483126 into spiffe:main Sep 1, 2023
31 checks passed
@MarcosDY MarcosDY added this to the 1.8.0 milestone Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Structured logging of SPIRE Agent errors from gRPC requests to SPIRE Server
4 participants