System.Diagnostics.ActivitySource support for observability #83

samritchie · 2024-12-28T09:18:04Z

Added tracing instrumentation on TableContext methods from ActivitySource "FSharp.AWS.DynamoDB". This is roughly in line with the published semantic conventions.
Added a dependency on System.Diagnostics.DiagnosticSource - I have seen reports that this won’t build on some netstandard2.0 platforms (Discussion: netstandard2.0 packages don't compile for out-of-support runtimes open-telemetry/opentelemetry-dotnet#3448). I’ve specified 6.0.0, which should work but may cause version conflicts (?)
Added a dependency on System.Text.Json, purely for (the questionable utility of) tagging consumed capacity as an array of JSON strings as recommended. I’m unsure how to properly add this for netstandard libs.
I’ve not attempted to log exceptions from within the TableContext, just the AWS client calls.
No Metric support - the only finalised DB metric is db.client.operation.duration which will need a stopwatch around all the client calls. Recording consumed capacity as a metric is probably the more useful value.
Started work on adding recorded activities to the test TableFixture but this needs actual assertions.
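
As a rough illustration of the "stopwatch around all the client calls" idea above, here is a minimal sketch using System.Diagnostics.Metrics. The `timed` helper and the `db.operation.name` tag are assumptions made for illustration, not code from this PR; only the metric name `db.client.operation.duration` comes from the description.

```fsharp
open System.Collections.Generic
open System.Diagnostics
open System.Diagnostics.Metrics

// Meter named after the ActivitySource used in this PR (an assumption for the sketch).
let meter = new Meter("FSharp.AWS.DynamoDB")

// Histogram in seconds for the duration of each client call.
let operationDuration =
    meter.CreateHistogram<float>("db.client.operation.duration", "s", "Duration of DynamoDB client calls")

// Hypothetical helper: times an async client call and records the elapsed time,
// whether the call succeeds or throws.
let timed (operation: string) (call: Async<'T>) : Async<'T> =
    async {
        let stopwatch = Stopwatch.StartNew()
        try
            return! call
        finally
            operationDuration.Record(
                stopwatch.Elapsed.TotalSeconds,
                KeyValuePair<string, obj>("db.operation.name", box operation))
    }
```

With no listener attached the `Record` call does nothing, so wrapping a call as `timed "GetItem" someAsyncCall` should be cheap when metrics are not collected.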

bartelink · 2024-12-28T09:30:39Z

src/FSharp.AWS.DynamoDB/Diagnostics.fs

@@ -0,0 +1,70 @@
+namespace FSharp.AWS.DynamoDB
+
+module Diagnostics =


see also https://github.com/jet/equinox/blob/master/src/Equinox/Tracing.fs https://github.com/jet/equinox/blob/master/src/Equinox.MessageDb/Tracing.fs for other helper layout ideas

e.g. see how https://github.com/jet/equinox/blob/master/src/Equinox.MessageDb/MessageDb.fs#L251-L252 puts logic more in your face while still staying relatively compact

bartelink · 2024-12-28T09:36:05Z

src/FSharp.AWS.DynamoDB/TableContext.fs

        let! ct = Async.CancellationToken
-        let! response = client.UpdateTableAsync(request, ct) |> Async.AwaitTaskCorrect
+        use activity = Diagnostics.startTableActivity request.TableName "UpdateTable" []
+        let! response = client.UpdateTableAsync(request, ct) |> Task.addActivityException activity |> Async.AwaitTaskCorrect


Async.AwaitTaskCorrect will do similar unwrapping to what your helper does internally?

Hm, no idea what to do with that fact though, I guess. I'm more wondering why FsAwsDdb bears responsibility for trapping and logging exceptions. Is there some reason that this falls on this layer? (Note I have not done one of these integrations personally and have done no useful research.)

Most applications will be adding the exception to the parent span as well, but it would usually be added to the span where it originated - the application may catch & do something else, but that doesn’t change that this span failed. We certainly want to intercept & set activity status to Error at the very least.
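
To make "set activity status to Error at the very least" concrete, here is a minimal sketch of recording an exception on the span where it originated. `recordException` is a hypothetical name, not the PR's `Task.addActivityException`, whose implementation is not shown in this thread. The `exception.type` and `exception.message` tag names are assumed to follow the OpenTelemetry exception conventions, and `SetStatus` needs System.Diagnostics.DiagnosticSource 6.0.0 or later.

```fsharp
open System
open System.Diagnostics

// Hypothetical helper: attaches the exception to the activity as an event
// and marks the activity as failed. A null activity (no listener) is ignored.
let recordException (activity: Activity) (ex: exn) =
    if not (isNull activity) then
        let tags = ActivityTagsCollection()
        tags.["exception.type"] <- box (ex.GetType().FullName)
        tags.["exception.message"] <- box ex.Message
        activity.AddEvent(ActivityEvent("exception", DateTimeOffset.UtcNow, tags)) |> ignore
        activity.SetStatus(ActivityStatusCode.Error, ex.Message) |> ignore
```

The caller can still catch and handle the exception afterwards; the point is only that this span keeps its own record of the failure.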

fair enough I guess. @nordfjord should Equinox.MessageDb etc be following suit if so?

We should be catching exceptions and adding them to the spans in messagedb. But I also have been procrastinating homogenising the tracing between the js and f# versions

bartelink · 2024-12-28T09:55:39Z

src/FSharp.AWS.DynamoDB/TableContext.fs

        let! ct = Async.CancellationToken
        let! response = client.TransactWriteItemsAsync(req, ct) |> Async.AwaitTaskCorrect
+        if response.HttpStatusCode <> HttpStatusCode.OK then
+            failwithf "TransactWriteItems request returned error %O" response.HttpStatusCode


can't fail this early, obviously

I’m not 100% sure this will ever be hit - I’ll have to dig into the SDK code and see if I can work out what it does. Even if it did return a response on a 500 or 503 I’d expect the entire response object would be invalid & we’d get an NRE trying to access ConsumedCapacity

bartelink · 2024-12-28T09:58:46Z

src/FSharp.AWS.DynamoDB/TableContext.fs

+            |> Task.addActivityException activity
+            |> Async.AwaitTaskCorrect
+        if response.HttpStatusCode <> HttpStatusCode.OK then
+            failwithf "BatchWriteItem deletion request returned error %O" response.HttpStatusCode


even if it failed, we still want the trace

bartelink · 2024-12-28T10:03:26Z

src/FSharp.AWS.DynamoDB/Diagnostics.fs

+
+    let addActivityCapacity (capacity: ConsumedCapacity seq) (activity: Activity) =
+        if notNull activity then
+            let value = capacity |> Seq.choose (function | null -> None | c -> Some (JsonSerializer.Serialize c)) |> Seq.toArray


this feels wrong; surely there's a generic dictionary shape or something? or maybe it can be flattened into an entry per table, or... anything?!

I don’t like it either, but the semantic conventions specifically define it like this. It would make more sense to me to use dynamic attribute keys - aws.dynamodb.consumed_capacity.{index_name}.write_capacity etc - other db traces use this for query params & the like. I can’t imagine many observability platforms go to the trouble of parsing JSON DynamoDB consumed capacity and doing anything useful with it.

I’m just as happy to leave it out, and maybe add metrics instead.
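
For illustration only, the dynamic-attribute-key alternative mentioned above might look roughly like the sketch below. The `IndexCapacity` record is a hypothetical stand-in for the SDK's capacity data, and the key pattern `aws.dynamodb.consumed_capacity.{index_name}.…` is the suggestion from this comment, not something the semantic conventions define.

```fsharp
open System.Diagnostics

// Hypothetical stand-in for per-index consumed capacity.
type IndexCapacity =
    { IndexName: string
      ReadCapacityUnits: float
      WriteCapacityUnits: float }

// Flattens capacities into one (key, value) pair per index and capacity kind.
let capacityTags (capacities: IndexCapacity seq) =
    [ for c in capacities do
          let prefix = $"aws.dynamodb.consumed_capacity.{c.IndexName}"
          yield $"{prefix}.read_capacity", c.ReadCapacityUnits
          yield $"{prefix}.write_capacity", c.WriteCapacityUnits ]

// Adds the flattened pairs as tags; a null activity (no listener) is ignored.
let addCapacityTags (capacities: IndexCapacity seq) (activity: Activity) =
    if not (isNull activity) then
        for key, value in capacityTags capacities do
            activity.SetTag(key, box value) |> ignore
```

Each value stays a plain number, so a backend could chart it without parsing JSON.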

sgtm - I think iterative tweaks make sense for this sort of work anyway - no substitute for using it IRL to make you realise what works and doesn't

samritchie · 2025-04-22T12:49:38Z

A couple of updates:

Refined the Activity.add* functions - initially tried using the extension method approach but new instance creation etc made this awkward. Requires ignore, of which I know @bartelink is not a fan; the alternative would be piping an ActivityConfig type and terminating with Activity.start, Activity.add, Activity.stop.
Removed the response.HttpStatusCode <> HttpStatusCode.OK checks. I don’t know why these were originally added but as far as I can tell the SDK will always throw on >= 400.
Removed the aws.dynamodb.consumed_capacity JSON attribute from the activity (and the System.Text.Json dependency). It’s part of the spec, and boto does it, but I struggle to see how it would be useful/usable.
Added db.client.operation.consumed_write_capacity/db.client.operation.consumed_read_capacity histogram metrics - to me this makes more sense than burying as structured data in the span attributes. The goal would be for these to replace the metricsCollector callback.
Downgraded ReturnConsumedCapacity to TOTAL (rather than INDEXES) - index-level capacity was not being used anyway. It might be worth considering optionally enabling this (via an env var?) to allow reporting capacity metrics per index.
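
A minimal sketch of what the capacity histograms described above could look like follows. The metric names come from this comment; the `recordCapacity` helper and the `aws.dynamodb.table_name` tag are assumptions for illustration and not necessarily what the PR does.

```fsharp
open System.Collections.Generic
open System.Diagnostics.Metrics

// Meter name assumed to match the ActivitySource name used in this PR.
let capacityMeter = new Meter("FSharp.AWS.DynamoDB")

let consumedRead =
    capacityMeter.CreateHistogram<float>("db.client.operation.consumed_read_capacity")

let consumedWrite =
    capacityMeter.CreateHistogram<float>("db.client.operation.consumed_write_capacity")

// Hypothetical helper: records total consumed capacity for one operation,
// tagged with the table name so it can be aggregated per table.
let recordCapacity (tableName: string) (readUnits: float) (writeUnits: float) =
    let table = KeyValuePair<string, obj>("aws.dynamodb.table_name", box tableName)
    if readUnits > 0.0 then consumedRead.Record(readUnits, table)
    if writeUnits > 0.0 then consumedWrite.Record(writeUnits, table)
```

Reporting per index would mean adding an index-name tag as well, which only makes sense if ReturnConsumedCapacity is set back to INDEXES.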

I’ve finally set up a private nuget so I can dogfood this; spans are working well but the metrics aren’t - unsure if I’ve missed something basic. I’m working on unit testing the diagnostics to at least see if it’s firing there.

bartelink · 2025-12-17T17:38:39Z

In light of #84 and the integrated support that the V4 SDK provides, does this still make sense?

i.e. shouldn't dedicated activities only be provided where there's not a 1:1 correspondence between a given FsAwsDdb operation and the underlying SDK operation? The win would be that we don't end up having to perpetually maintain/sync with that impl?

What may be more useful is to provide an integrated 'batteries included' mechanism for gathering and emitting stats in the context of e.g. FSI usage?

Not really looking for to/fro here - probably best for any response to be as an Issue or a section in README.md that outlines the high level approach and intention? i.e. if this is polyfilling gaps in the SDK, then we should have a table detailing where that's been necessary (ideally with the medium-term goal of having those issues solved upstream?)

samritchie · 2025-12-17T23:33:30Z

@bartelink Thanks, I’ll investigate - I don’t think we need to duplicate observability. Documentation is thin but the code doesn’t look like it adds many of the Otel db semantic attributes; I’ll just need to assess whether it’s good enough.

Unsure why I never managed to release this one, I’ve been dogfooding & it works well, I think it just dropped off the radar.

bartelink · 2025-12-17T23:55:56Z

👍 I'm only getting into any level of otel research now; am likely to apply it to Equinox, but said I'd start small by seeing how it applies here.

Definitely not against this library polyfilling things if that's the right thing to do - main thing is not to duplicate anything the SDK already addresses. I don't have a prod system/day to day experience of otel with the SDK so please don't assume I have any deep insight of any kind at this point.

samritchie added 3 commits December 17, 2024 08:06

wip diagnostics

5267a1b

Merge

c6b355b

Fleshed out ActivitySource instrumentation

a6ee289

samritchie requested a review from bartelink December 28, 2024 09:18

bartelink reviewed Dec 28, 2024

Modified OpenTelemetry, removed HTTP Status Code check, pinned FsCheck

7433b74

bartelink mentioned this pull request Dec 17, 2025

Target AWSSDK.DynamoDBv2 v4 #86

Merged
