Skip to content

Conversation

@samritchie
Copy link
Collaborator

  • Added tracing instrumentation on TableContext methods from ActivitySource "FSharp.AWS.DynamoDB". This is roughly in line with the published semantic conventions
  • Added a dependency on System.Diagnostics.DiagnosticSource - have seen reports that this won’t build on some netstandard2.0 platforms (Discussion: netstandard2.0 packages don't compile for out-of-support runtimes open-telemetry/opentelemetry-dotnet#3448). I’ve specified 6.0.0 which should work but may cause version conflicts(?)
  • Added a dependency on System.Text.Json, purely for (the questionable utility of) tagging consumed capacity as an array of JSON strings as recommended. I’m unsure how to properly add this for netstandard libs.
  • I’ve not attempted to log exceptions from within the TableContext, just the AWS client calls.
  • No Metric support - the only finalised DB metric is db.client.operation.duration which will need a stopwatch around all the client calls. Recording consumed capacity as a metric is probably the more useful value.
  • Started work on adding recorded activities to the test TableFixture but this needs actual assertions.

@samritchie samritchie requested a review from bartelink December 28, 2024 09:18
@@ -0,0 +1,70 @@
namespace FSharp.AWS.DynamoDB

module Diagnostics =
Copy link
Member

@bartelink bartelink Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let! ct = Async.CancellationToken
let! response = client.UpdateTableAsync(request, ct) |> Async.AwaitTaskCorrect
use activity = Diagnostics.startTableActivity request.TableName "UpdateTable" []
let! response = client.UpdateTableAsync(request, ct) |> Task.addActivityException activity |> Async.AwaitTaskCorrect
Copy link
Member

@bartelink bartelink Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awaittaskcorrect will do similar unwrapping to what your helper does internally?

Hm, no idea what to do with that fact though - I guess. I'm more wondering why FsAwsDdb bears responsblity for trapping and logging exceptions. Is there some reason that this is falls on this layer (note I have not done one of these integrations personally and no useful research)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most applications will be adding the exception to the parent span as well, but it would usually be added to the span where it originated - the application may catch & do something else, but that doesn’t change that this span failed. We certainly want to intercept & set activity status to Error at the very least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough I guess. @nordfjord should Equinox.MessagDb etc be following suit if so?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be catching exceptions and adding them to the spans in messagedb. But I also have been procrastinating homogenising the tracing between the js and f# versions

let! ct = Async.CancellationToken
let! response = client.TransactWriteItemsAsync(req, ct) |> Async.AwaitTaskCorrect
if response.HttpStatusCode <> HttpStatusCode.OK then
failwithf "TransactWriteItems request returned error %O" response.HttpStatusCode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant fail this early obv

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not 100% sure this will ever be hit - I’ll have to dig into the SDK code and see if I can work out what it does. Even if it did return a response on a 500 or 503 I’d expect the entire response object would be invalid & we’d get an NRE trying to access ConsumedCapacity

|> Task.addActivityException activity
|> Async.AwaitTaskCorrect
if response.HttpStatusCode <> HttpStatusCode.OK then
failwithf "BatchWriteItem deletion request returned error %O" response.HttpStatusCode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if failed, want trace


let addActivityCapacity (capacity: ConsumedCapacity seq) (activity: Activity) =
if notNull activity then
let value = capacity |> Seq.choose (function | null -> None | c -> Some (JsonSerializer.Serialize c)) |> Seq.toArray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fels wrong; surely there's a generic dictionary shape or something? or maybe it can be flattened into an entry per table, or... anything?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t like it either, but the semantic conventions specifically define it like this. It would make more sense to me to use dynamic attribute keys - aws.dynamodb.consumed_capacity.{index_name}.write_capacity etc - other db traces use this for query params & the like. I can’t imagine many observability platforms go to the trouble of parsing JSON DynamoDB consumed capacity and doing anything useful with it.

I’m just as happy to leave it out, and maybe add metrics instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm - I think iterative tweaks make sense for this sort of work anyway - no subsittute for using it IRL to make you realise what works and doesnt

@samritchie
Copy link
Collaborator Author

A couple of updates:

  • Refined the Activity.add* functions - initially tried using the extension method approach but new instance creation etc made this awkward. Requires ignore of which I know @bartelink is not a fan, the alternative would be piping an ActivityConfig type and terminating with Activity.start, Activity.add, Activity.stop.
  • Removed the response.HttpStatusCode <> HttpStatusCode.OK checks. I don’t know why these were originally added but as far as I can tell the SDK will always throw on >= 400
  • Removed the aws.dynamodb.consumed_capacity JSON attribute from the activity (and the System.Text.Json dependency). It’s part of the spec, and boto does it, but I struggle to see how it would be useful/usable.
  • Added db.client.operation.consumed_write_capacity/db.client.operation.consumed_read_capacity histogram metrics - to me this makes more sense than burying as structured data in the span attributes. The goal would be for these to replace the metricsCollector callback.
  • Downgraded ReturnConsumedCapacity to TOTAL (rather than INDEXES) - index-level capacity was not being used anyway. It might be worth considering optionally enabling this (via an env var?) to allow reporting capacity metrics per index.

I’ve finally set up a private nuget so I can dogfood this; spans are working well but the metrics aren’t - unsure if I’ve missed something basic. I’m working on unit testing the diagnostics to at least see if it’s firing there.

@bartelink
Copy link
Member

bartelink commented Dec 17, 2025

In light of #84 and the integrated support that the V4 SDK provides, does this still make sense?

i.e. shouldn't dedicated activities only be provided where there's not a 1:1 correspondence between a given FsAwsDdb operation and the underlying SDK operation? The win would be that we don't end up having to perpetually maintain/sync with that impl?

What may be more useful is to provide an integrated 'batteries included' mechanism for gathering and emitting stats in the context of e.g. FSI usage?

Not really looking for to/fro here - probably best for any response to be as an Issue or a section in README.md that outlines the high level approach and intention? i.e. if this is polyfilling gaps in the SDK, then we should have a table detailing where that's been necessary (ideally with the edium term goal of having those issues solved upstream?)

@samritchie
Copy link
Collaborator Author

samritchie commented Dec 17, 2025

@bartelink Thanks, I’ll investigate - I don’t think we need to duplicate observability. Documentation is thin but the code doesn’t look it adds many of the Otel db semantic attributes; I’ll just need to assess whether it’s good enough.

Unsure why I never managed to release this one, I’ve been dogfooding & it works well, I think it just dropped off the radar.

@bartelink
Copy link
Member

👍 I'm only getting into any level of otel research now; am likely to apply it to Equinox, but said I'd start small by seeing how it applies here.

Definitely not against this library polyfilling things if that's the right thing to do - main thing is not to duplicate anything the SDK already addresses. I don't have a prod system/day to day experience of otel with the SDK so please don't assume I have any deep insight of any kind at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants