generic metric hooks #149

joewilliams · 2020-07-28T17:44:46Z

Over in slack I asked about how we might go about adding the ability to emit metrics out of go-spiffe in a generic way. We've implemented our own wrapper for go-spiffe that wraps the go-spiffe functions like GetClientCertificate that end up in the tls.Config so we can track success and failures from the perspective of each end point.

After chatting about it with the team here at GH we think a implementation similar to the ClientTrace in net/http might be a reasonable solution. The basic idea is that there would be a struct users could configure with functions that would get executed at different points in the go-spiffe codebase. The example they use in the blog post is adding logging when a DNS response is received by the http client.

Some examples of metrics we think might be good points to execute hooks from:

getTLSCertificate - we should have access to cert details here and can emit spiffe IDs, etc similar to wrapGetCertificateWithMetrics
ParseAndVerify - we should have access to cert details here and can emit spiffe IDs, etc similar to wrapVerifyPeerCertificateWithMetrics, probably want success/failure counts and timers
Verify - along the lines above ☝️, probably want success/failure counts and timers
AdaptMatcher - emit metrics on authorization success and failure

There are likely many others that might make sense as well.

Anyway, I am curious what folks think and wanted to get some consensus before we dig into implementing anything.

The text was updated successfully, but these errors were encountered:

azdagron · 2020-07-28T18:21:16Z

I think that sounds like a reasonable approach! We have lots of different packages. Would each package define its own "ClientTrace"-style struct and associated options that allow you to set it? There will probably be a little design work to ensure consistency of application to the various components.

I'm assume you are just targeting the go-spiffe v2 implementation?

I also wonder if this is sufficient to supplant the logger, which has felt a little invasive and something a few of us debated quite a bit before including.

joewilliams · 2020-07-28T18:23:48Z

Would each package define its own "ClientTrace"-style struct and associated options that allow you to set it?

😅 I don' think we know yet but we'll dig in and find out.

I'm assume you are just targeting the go-spiffe v2 implementation?

Yes!

I also wonder if this is sufficient to supplant the logger, which has felt a little invasive and something a few of us debated quite a bit before including.

I think it certainly could replace it.

joewilliams · 2020-07-28T23:12:47Z

@azdagron I pushed #150 with PoC

joewilliams · 2020-08-04T18:02:54Z

@azdagron once we agree on an approach over in #150 it's unclear what might happen next. I think we'd be onboard to submit PR(s) for the areas/packages we use since we have a pretty good idea of what we'd want to measure with the metrics but I'm not sure we have the production knowledge to implement hooks in all the right places for everything else. Frankly, I don't think that would cover very many (tlsconfig, spiffeid, parts of workloadapi, etc). A half-implementation is likely worse than no implementation. What do you think?

azdagron · 2020-08-04T20:15:40Z

I'm sure once there is consensus that we can get somebody to work on it that is more familiar with the library. It should be a relatively scoped exercise and as long as we've done the plumbing of the trace structs correctly we don't need to worry about missing a start/end function here or there. Those should be able to be added later if needed.

joewilliams · 2020-09-08T20:23:05Z

@azdagron any further thoughts on the approach in #150 Other than addressing the commenters issues, where do we go from here?

azdagron · 2020-09-09T15:13:54Z

I apologize. It's been a busy few weeks and this off my radar. I was talking to @evan2645 about this a few weeks back and we discussed prior art in other libraries. I've done some digging without success to see if there are established patterns outside of what is provided by httptrace that we could align to. I haven't had much luck finding anything however. I think the current approach of providing a trace struct via options seems as good as any. It certainly gives us future proofing flexibility.

@amartinezfayo @evan2645 any concerns marching forward with the current approach?

amartinezfayo · 2020-09-09T15:20:35Z

any concerns marching forward with the current approach?

I don't personally have a specific concern with the current approach. I'm guessing that we may need to make tweaks here and there as people start using this, but hopefully without structural changes.

azdagron · 2020-09-09T15:22:50Z

Yeah, my biggest concern is that we settle with a design that is 1) internally consistent within the library, and 2) does not require breaking changes to expand

evan2645 · 2020-09-09T23:01:42Z

I haven't spent enough time looking at this change to endorse it specifically, but so long as we've considered existing patterns etc it sounds fine to me :)

azdagron · 2020-09-11T20:28:44Z

@joewilliams I think next steps is addressing the feedback on the PR and getting that in. I appreciate your patience seeing this through!

joewilliams · 2020-09-14T20:40:03Z

@azdagron i think we've addressed the reviewer comments in #150, any thing else we should look into before we squash and merge?

azdagron · 2020-09-14T23:47:55Z

We're really close. I took a final pass and found some things that I think need addressing but after those I think we're good to squash and merge. I appreciate your patience with the back-and-forth!

joewilliams · 2020-09-17T23:53:06Z

@azdagron I think the next thing we'd want to implement this tracing pattern for are the verify related functions in tlsconfig. Does the align with what you are wanting?

azdagron · 2020-09-18T16:02:05Z

Yeah. I started digging into a PR for that and it brought up some questions around ergonomics that I'm still debating internally...

azdagron · 2020-09-25T18:39:55Z

This is still on my radar but I haven't had cycles to put more thought into it. Just wanted to make sure folks knew it hasn't been forgotten.

azdagron · 2020-10-19T17:21:30Z

@joewilliams, I'm so sorry this has fallen by the wayside.

I think next steps is figuring out at what granularity we want on the verification related tracing hooks. Verification has two basic steps:

verifying the cryptographic chain of trust from the peer certificate back to the bundle
authorizing the peer certificate

Is there utility in tracing each of those events from a debuggability or telemetry perspective? I think the answer is yes?

If so, we'd probably be looking for an addition to the interface as follows:

type Trace struct {
    ...
    AuthenticatePeer   func(info AuthenticatePeerInfo) (traceCtx interface{})
    AuthenticatedPeer  func(traceCtx interface{}, info AuthenticatedPeerInfo)

    AuthorizePeer      func(info AuthorizePeerInfo) (traceCtx interface{})
    AuthorizedPeer     func(traceCtx interface{}, info AuthorizedPeerInfo)
}

type AuthenticatePeerInfo struct {
    PeerCertificate []*x509.Certificate
}

type AuthenticatedPeerInfo struct {
    VerifiedChains [][]*x509.Certificate
    Err            error
}

type AuthorizePeerInfo struct {
    ID             spiffeid.ID
    VerifiedChains [][]*x509.Certificate
}

type AuthorizedPeerInfo struct {
    ID  spiffeid.ID
    Err error
}

How does that feel?

joewilliams mentioned this issue Jul 28, 2020

tlsconfig trace implementation #150

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

generic metric hooks #149

generic metric hooks #149

joewilliams commented Jul 28, 2020

azdagron commented Jul 28, 2020 •

edited

Loading

joewilliams commented Jul 28, 2020

joewilliams commented Jul 28, 2020

joewilliams commented Aug 4, 2020

azdagron commented Aug 4, 2020

joewilliams commented Sep 8, 2020

azdagron commented Sep 9, 2020

amartinezfayo commented Sep 9, 2020

azdagron commented Sep 9, 2020

evan2645 commented Sep 9, 2020

azdagron commented Sep 11, 2020

joewilliams commented Sep 14, 2020

azdagron commented Sep 14, 2020

joewilliams commented Sep 17, 2020

azdagron commented Sep 18, 2020

azdagron commented Sep 25, 2020

azdagron commented Oct 19, 2020

generic metric hooks #149

generic metric hooks #149

Comments

joewilliams commented Jul 28, 2020

azdagron commented Jul 28, 2020 • edited Loading

joewilliams commented Jul 28, 2020

joewilliams commented Jul 28, 2020

joewilliams commented Aug 4, 2020

azdagron commented Aug 4, 2020

joewilliams commented Sep 8, 2020

azdagron commented Sep 9, 2020

amartinezfayo commented Sep 9, 2020

azdagron commented Sep 9, 2020

evan2645 commented Sep 9, 2020

azdagron commented Sep 11, 2020

joewilliams commented Sep 14, 2020

azdagron commented Sep 14, 2020

joewilliams commented Sep 17, 2020

azdagron commented Sep 18, 2020

azdagron commented Sep 25, 2020

azdagron commented Oct 19, 2020

azdagron commented Jul 28, 2020 •

edited

Loading