Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Curate tracing spans #66

Merged
merged 10 commits into from
Oct 21, 2024
Merged

Curate tracing spans #66

merged 10 commits into from
Oct 21, 2024

Conversation

graebm
Copy link
Contributor

@graebm graebm commented Oct 18, 2024

Description of changes:

  • Each major operation has a "parent" span it assigns to the tasks it spawns. Otherwise, spans from spawned tasks have no parent, which makes them a disorganized mess in visualizations.
  • Ensure there's a span for each HTTP request, because these are often the bottleneck.
  • Ensure spans have important key/value pairs (e.g. seq, part_number, bucket, key), so when a bunch of spans with the same name are together, we tell what each one is up to.
  • debatable: gave names that stand on their own. For example: the function aws_s3_transfer_manager::operation::upload::builders::send() is named "initiate-download", instead of just "send", so a visualization doesn't need to show the full (very very wide) namespace to explain what's going on.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

… namespace to know what they're about. Is this a good idea?

- better formatting for long #[instrument(...)] lines
- .unwrap_or_default() instead of .unwrap_or("")
- .bucket() instead of .bucket.as_deref() wherever possible
@@ -44,6 +45,7 @@ impl Download {
pub(crate) async fn orchestrate(
handle: Arc<crate::client::Handle>,
input: crate::operation::download::DownloadInput,
existing_span_for_tasks: Option<tracing::span::Id>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass this in? Span relationships should generally be automatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not automatic for spawned tasks, which start with no parent span. Previously, the "download-chunk" spans had no parent. If your main thread did 2 downloads, where each download had 2 chunks, the span tree would look like:

- main
  - initiate-download
  - join-download
  - initiate-download
  - join-download
- download-chunk
- download-chunk
- download-chunk 
- download-chunk <-- which "download operation" do these chunks belong to?

So I started creating "download-tasks" span that's used to group a download's spawned tasks together. So now it looks like:

- main
  - initiate-download
  - join-download
  - initiate-download
  - join-download
- download-tasks <-- yay a common parent per download operation
  - download-chunk
  - download-chunk
- download-tasks
  - download-chunk
  - download-chunk

But download-objects is going to do many downloads, which will lead to many different "download-tasks" spans, and I thought it would be nice to group them under a common parent.under another span. So when download-objects calls this, it passes a span named "download-objects-tasks" and the span tree looks like:

- main
  - initiate-download-objects
  - join-download-objects
- download-objects-tasks <-- 
  - download-tasks
    - download-chunk
    - download-chunk
  - download-tasks
    - download-chunk
    - download-chunk

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't calling .instrument() on the future passed to spawn() be sufficient?

…presents waiting. We can infer it's waiting by the gap between the start of "initiate-download" and the start of "send-discovery"
Copy link
Contributor

@aajtodd aajtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's totally possible I'm wrong but I don't think we need to be doing as much passing around of spans/span ID's. A trace is usually a simple tree like structure with parent/child relationships that are usually automatically established by the structure of the code. async/futures takes slightly more care to ensure it's setup properly but it should still generally be automatic.

Happy to discuss offline and do a deeper dive to either see why it has to be this way or to try to figure out a way that doesn't require manually tracking parents.

In particular setting parent to None and establishing a follows_from relationship may break customer expectations when looking at traces from their own code.

e.g.

// src/main.rs

async fn customer_app(tm: aws_s3_transfer_manager::Client) -> Result<()> {
     let span = tracing::debug_span!("customer parent span");
     // all transfer manager download spans from this operation should generally be 
     // a child of the customer span
     let handle = tm.download()
         .bucket(...)
         .key(...)
         .send()
         .instrument(span)
         .await?;
}

Copy link
Contributor

@aajtodd aajtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only thing we may want to consider at some point is an integration test where we capture the logs and verify the span hierarchy is expected and we don't regress at some point (easy to lose span context in futures).

@graebm graebm merged commit 790ead4 into main Oct 21, 2024
13 checks passed
@graebm graebm deleted the graebm/tracing branch October 21, 2024 22:50
graebm added a commit to awslabs/aws-crt-s3-benchmarks that referenced this pull request Oct 21, 2024
Adapt to changes from awslabs/aws-s3-transfer-manager-rs#66
Now that the transfer manager has a lot of nice spans, we can trim some out of here.
Also, some tweaks to the graph.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants