-
Notifications
You must be signed in to change notification settings - Fork 172
feat: add specific request tracing hooks #1816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add specific request tracing hooks #1816
Conversation
Router image scan passed✅ No security vulnerabilities found in image:
|
097c904
to
79e48b8
Compare
eb9d0f0
to
24c4182
Compare
@@ -44,7 +45,7 @@ func NewExpressionLogField(val any, key string, defaultValue any) zap.Field { | |||
val = &assertVal | |||
} | |||
|
|||
if v := val; v != "" { | |||
if v := val; v != "" && v != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we validate no explicit nil can be returned, on case we use the optional chaining operator we can still get nil, which will result in no default value being set.
(This could be considered a bit of a breaking change, if nil is returned somehow we don't set the default value right now.)
Because we also introduce custom telemetry attributes, we don't have a way right now to allow custom attributes in studio. We will need to (initial thoughts at least) add a UI in studio which can allow users to specify what custom telemetry attributes they want to view in studio traces. (cc: @StarpTech) |
def1303
to
dbe2f0b
Compare
router/internal/expr/converters.go
Outdated
import "github.com/wundergraph/cosmo/router/internal/httpclient" | ||
|
||
// ConvertToExprTrace converts an OperationTrace to an expr.ClientTrace | ||
func ConvertToExprTrace(trace *httpclient.ClientTrace) *ClientTrace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a separate (but similar) structure for Expressions, which we need to transform to from the ClientTrace structure.
|
||
dialStartMu.Lock() | ||
defer dialStartMu.Unlock() | ||
eC.DialStart = append(eC.DialStart, start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my testing I verified that multiple connection starts and done's can occur, I believe its important to record them all. Since we don't want a data race I use a lock.
// expressionsWithAuth is a map of expressions that can be used to resolve dynamic attributes and acces the auth | ||
// argument | ||
expressionsWithAuth map[string]*vm.Program | ||
} | ||
|
||
type VisitorCheckForRequestAuthAccess struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moved this to the expr package where other visitors are stored.
metricsEnabled: opts.metricsEnabled, | ||
traceEnabled: opts.traceEnabled, | ||
mapper: opts.mapper, | ||
traceAttributeExpressions: opts.telemetryAttributeExpressions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a naming confusion here, which made telemetry -> trace, but applied metric attrs, thus renamed it to telemetry.
- "localhost:9092" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to revert
Motivation and Context
Sometimes subgraph requests can fail for users, and it might not be obvious why, by providing a way to log and add more span details, this can help users to figure out what is actually happening at the request level.
In order to do this the PR introduces the following
Add a traceclient wrapper: We create a custom roundtripper that uses the http.ClientTrace wrapper. This wrapper stores information from its hook into the request's context, which can be accessed outside the wrapper. (This is not tied to expressions and uses its own separate structs, but as an optimization we only wrap the roundTripper if
subgraph.request.clientTrace
used in any expression.)Add subgraph to the expression context: This holds information about the currently executing subgraph. If
clientTrace
is used in an expression, we convert the struct used by thetraceclient
to the expression struct.Enable expressions for subgraphs: At the moment we only allow defining expressions on access logs at the router level, this PR adds the capability to use expressions for access logs at the subgraph level.
Add
tracingAttributes
: Right now we havetelemetryAttributes
andmetricAttributes
, if a user definestelemetryAttributes
we will attach all values to every span in the request AS WELL as metrics. We addtracingAttributes
, which does the same but only attaches attributes to spans.Evaluation of expressions that use
subgraph.*
, any expression that usessubgraph
is evaluated in theengine_hooks_loader
, becausesubgraph.*
is empty outside of this. In order to do this we separate out expressions during the compilation step for bothtelemetryAttributes
andtracingAttributes
if they only should run in thesubgraph
. This prevents expressions focused on subgraphs being attached to unrelated spans.Caveats
All expected output values for expressions are strings, which means if the output value is time, int, bool, the user will need to cast it to string. This can be done using the
string
function in exprlang. This follows the current behaviour oftelemetryAttributes
, which only allows strings.Users should be aware of nils, if a hook wasnt executed for some reason the equivalent expr struct field would be nil.
Figuring out issues using this PR
Since the goal of this PR is to help users find out subgraph timeout, not connecting, etc issues, they can use the following to figure out (note that non string expressions need to be wrapped with string, but we will be ignoring using string to make it easy to read)
traceclient
logged. For dialStart and dialDone, users will need to just log the sliceThis is going to litter the subgraph
Engine - Fetch
span with information, if users are interested in Spans which error out, users can add the followingsubgraph.request.error != nil ? subgraph.request.clientTrace.connCreate?.time : nil
Users can get more specific information, if they are interested in duration used to aquire a connection, users could
subgraph.request.clientTrace.connAcquired.time - subgraph.request.clientTrace.connCreate.time
Use the
GetGroupedDials()
helper, to get Start and Done dials and process durationsMore examples can be found at: https://github.com/wundergraph/cosmo/pull/1816/files#diff-a76256094262e1d7bf2857b5e8be79e45bcfcd298f918a42fb8db6e267e4c984R178-R179
ExprStructure
Example Configuration for access logs (we only added expressions)
Example Configuration for tracing (We added tracing.attributes)
Checklist