-
Notifications
You must be signed in to change notification settings - Fork 18
System.Diagnostics.ActivitySource support for observability #83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| @@ -0,0 +1,70 @@ | |||
| namespace FSharp.AWS.DynamoDB | |||
|
|
|||
| module Diagnostics = | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see also https://github.com/jet/equinox/blob/master/src/Equinox/Tracing.fs https://github.com/jet/equinox/blob/master/src/Equinox.MessageDb/Tracing.fs for other helper layout ideas
e.g. see how https://github.com/jet/equinox/blob/master/src/Equinox.MessageDb/MessageDb.fs#L251-L252 puts logic more in your face while still staying relatvely compact
| let! ct = Async.CancellationToken | ||
| let! response = client.UpdateTableAsync(request, ct) |> Async.AwaitTaskCorrect | ||
| use activity = Diagnostics.startTableActivity request.TableName "UpdateTable" [] | ||
| let! response = client.UpdateTableAsync(request, ct) |> Task.addActivityException activity |> Async.AwaitTaskCorrect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awaittaskcorrect will do similar unwrapping to what your helper does internally?
Hm, no idea what to do with that fact though - I guess. I'm more wondering why FsAwsDdb bears responsblity for trapping and logging exceptions. Is there some reason that this is falls on this layer (note I have not done one of these integrations personally and no useful research)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most applications will be adding the exception to the parent span as well, but it would usually be added to the span where it originated - the application may catch & do something else, but that doesn’t change that this span failed. We certainly want to intercept & set activity status to Error at the very least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough I guess. @nordfjord should Equinox.MessagDb etc be following suit if so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be catching exceptions and adding them to the spans in messagedb. But I also have been procrastinating homogenising the tracing between the js and f# versions
| let! ct = Async.CancellationToken | ||
| let! response = client.TransactWriteItemsAsync(req, ct) |> Async.AwaitTaskCorrect | ||
| if response.HttpStatusCode <> HttpStatusCode.OK then | ||
| failwithf "TransactWriteItems request returned error %O" response.HttpStatusCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cant fail this early obv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not 100% sure this will ever be hit - I’ll have to dig into the SDK code and see if I can work out what it does. Even if it did return a response on a 500 or 503 I’d expect the entire response object would be invalid & we’d get an NRE trying to access ConsumedCapacity
| |> Task.addActivityException activity | ||
| |> Async.AwaitTaskCorrect | ||
| if response.HttpStatusCode <> HttpStatusCode.OK then | ||
| failwithf "BatchWriteItem deletion request returned error %O" response.HttpStatusCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if failed, want trace
|
|
||
| let addActivityCapacity (capacity: ConsumedCapacity seq) (activity: Activity) = | ||
| if notNull activity then | ||
| let value = capacity |> Seq.choose (function | null -> None | c -> Some (JsonSerializer.Serialize c)) |> Seq.toArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fels wrong; surely there's a generic dictionary shape or something? or maybe it can be flattened into an entry per table, or... anything?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t like it either, but the semantic conventions specifically define it like this. It would make more sense to me to use dynamic attribute keys - aws.dynamodb.consumed_capacity.{index_name}.write_capacity etc - other db traces use this for query params & the like. I can’t imagine many observability platforms go to the trouble of parsing JSON DynamoDB consumed capacity and doing anything useful with it.
I’m just as happy to leave it out, and maybe add metrics instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm - I think iterative tweaks make sense for this sort of work anyway - no subsittute for using it IRL to make you realise what works and doesnt
|
A couple of updates:
I’ve finally set up a private nuget so I can dogfood this; spans are working well but the metrics aren’t - unsure if I’ve missed something basic. I’m working on unit testing the diagnostics to at least see if it’s firing there. |
|
In light of #84 and the integrated support that the V4 SDK provides, does this still make sense? i.e. shouldn't dedicated activities only be provided where there's not a 1:1 correspondence between a given FsAwsDdb operation and the underlying SDK operation? The win would be that we don't end up having to perpetually maintain/sync with that impl? What may be more useful is to provide an integrated 'batteries included' mechanism for gathering and emitting stats in the context of e.g. FSI usage? Not really looking for to/fro here - probably best for any response to be as an Issue or a section in README.md that outlines the high level approach and intention? i.e. if this is polyfilling gaps in the SDK, then we should have a table detailing where that's been necessary (ideally with the edium term goal of having those issues solved upstream?) |
|
@bartelink Thanks, I’ll investigate - I don’t think we need to duplicate observability. Documentation is thin but the code doesn’t look it adds many of the Otel db semantic attributes; I’ll just need to assess whether it’s good enough. Unsure why I never managed to release this one, I’ve been dogfooding & it works well, I think it just dropped off the radar. |
|
👍 I'm only getting into any level of otel research now; am likely to apply it to Equinox, but said I'd start small by seeing how it applies here. Definitely not against this library polyfilling things if that's the right thing to do - main thing is not to duplicate anything the SDK already addresses. I don't have a prod system/day to day experience of otel with the SDK so please don't assume I have any deep insight of any kind at this point. |
"FSharp.AWS.DynamoDB". This is roughly in line with the published semantic conventionsSystem.Diagnostics.DiagnosticSource- have seen reports that this won’t build on somenetstandard2.0platforms (Discussion: netstandard2.0 packages don't compile for out-of-support runtimes open-telemetry/opentelemetry-dotnet#3448). I’ve specified 6.0.0 which should work but may cause version conflicts(?)System.Text.Json, purely for (the questionable utility of) tagging consumed capacity as an array of JSON strings as recommended. I’m unsure how to properly add this for netstandard libs.db.client.operation.durationwhich will need a stopwatch around all the client calls. Recording consumed capacity as a metric is probably the more useful value.