
Optimize TraceZ Span Processing Speeds #184

Closed
jajanet opened this issue Jul 16, 2020 · 6 comments
Labels
area:sdk do-not-stale question Further information is requested

Comments

@jajanet
Contributor

jajanet commented Jul 16, 2020

Is your feature request related to a problem?
For the initial iteration of zPages, more specifically TraceZ, spans are grabbed and temporarily stored using the TraceZ span processor.

The OT C++ span processor interface doesn't store any spans by default, and other current implementations at most pass ownership on to an exporter. The TraceZ span processor deviates from these by storing both running and completed spans, keeping references to the former and ownership of the latter.

The processor's containers are modified whenever a snapshot getter is called or a span starts/ends. Because these containers are shared across those functions, concurrent calls introduce thread-safety issues: when different threads read and write the same memory simultaneously, the resulting data race can corrupt state or crash the program.

To make the span processor thread-safe, lock guards were added to these functions. At large scale, where many spans may be processed at once, this could make TraceZ scale poorly in terms of speed.
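
For illustration, here is a minimal sketch of the lock-guarded shape described above, assuming hypothetical class and member names rather than the actual TraceZ processor code:

```cpp
// Minimal sketch of a lock-guarded zPages-style processor. Names are
// hypothetical and do not mirror the real TraceZ span processor.
#include <memory>
#include <mutex>
#include <unordered_set>
#include <vector>

struct SpanData {};  // stand-in for the SDK recordable

class LockGuardedTracezProcessor {
 public:
  void OnStart(SpanData &span) {
    std::lock_guard<std::mutex> lock(mutex_);  // serializes with OnEnd/snapshot
    running_.insert(&span);                    // keep a reference to the running span
  }

  void OnEnd(std::unique_ptr<SpanData> &&span) {
    std::lock_guard<std::mutex> lock(mutex_);
    running_.erase(span.get());
    completed_.push_back(std::move(span));     // take ownership of the completed span
  }

  // Snapshot getter used by the TraceZ aggregator/UI.
  std::vector<SpanData *> GetRunningSnapshot() {
    std::lock_guard<std::mutex> lock(mutex_);  // every caller contends on the same mutex
    return std::vector<SpanData *>(running_.begin(), running_.end());
  }

 private:
  std::mutex mutex_;
  std::unordered_set<SpanData *> running_;            // references to running spans
  std::vector<std::unique_ptr<SpanData>> completed_;  // owned completed spans
};
```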

Describe the solution you'd like
We want to consider solutions that are fast as well as thread-safe. The proposed approaches (discussed in the comments below) include a lock-free circular buffer and a proxy/shim approach.

Describe alternatives you've considered
Ideally, we also want to reduce contention (how long threads block one another while querying and using the same places in memory). We attempted to do this through copy-on-write and are considering other methods.
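
For reference, one copy-on-write shape we experimented with looks roughly like the following (a sketch with made-up names, not the actual implementation): writers copy and swap an immutable snapshot under a short lock, while readers hold a shared_ptr and never keep the lock while iterating.

```cpp
// Copy-on-write sketch: readers share an immutable snapshot, writers copy,
// modify, then publish a new version. Hypothetical names; assumes C++14+.
#include <memory>
#include <mutex>
#include <vector>

struct SpanSummary { /* name, duration, status, ... */ };

class CowSpanStore {
 public:
  // Readers: grab the current snapshot; no lock is held while it is used.
  std::shared_ptr<const std::vector<SpanSummary>> Snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return snapshot_;
  }

  // Writers: copy the current vector, append, and publish the new version.
  void Add(const SpanSummary &summary) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto next = std::make_shared<std::vector<SpanSummary>>(*snapshot_);
    next->push_back(summary);
    snapshot_ = std::move(next);  // old snapshot stays valid for readers holding it
  }

 private:
  mutable std::mutex mutex_;
  std::shared_ptr<const std::vector<SpanSummary>> snapshot_ =
      std::make_shared<std::vector<SpanSummary>>();
};
```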

Additional context

@pyohannes pyohannes added area:sdk question Further information is requested labels Jul 17, 2020
@pyohannes
Contributor

I think you have two problems here that you should reason about separately:

  1. The thread-safety of SpanData itself. The recordable implementation SpanData doesn't give thread-safety guarantees between the OnStart and OnEnd processor calls. When attributes are accessed between those calls, another thread might be writing at the same time. To eliminate that possibility, I'd recommend implementing a custom Recordable that gives the needed thread-safety guarantees.
  2. Making access to the vectors storing running and ended spans thread-safe (the aggregator and the processor access those containers from different threads).

The lock-free circular buffer would address the second issue. I'm not sure about the details of the proxy/shim approach.
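
For context, the general idea behind a lock-free circular buffer, sketched here as a generic single-producer/single-consumer ring buffer (illustrative only, not the buffer referenced above):

```cpp
// Generic single-producer / single-consumer lock-free ring buffer sketch.
// Capacity must be a power of two; illustrative, not the SDK's buffer.
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

template <typename T, std::size_t Capacity>
class SpscRingBuffer {
  static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");

 public:
  bool TryPush(T value) {
    const auto head = head_.load(std::memory_order_relaxed);
    const auto tail = tail_.load(std::memory_order_acquire);
    if (head - tail == Capacity) return false;          // full
    slots_[head & (Capacity - 1)] = std::move(value);
    head_.store(head + 1, std::memory_order_release);   // publish to consumer
    return true;
  }

  std::optional<T> TryPop() {
    const auto tail = tail_.load(std::memory_order_relaxed);
    const auto head = head_.load(std::memory_order_acquire);
    if (head == tail) return std::nullopt;               // empty
    T value = std::move(slots_[tail & (Capacity - 1)]);
    tail_.store(tail + 1, std::memory_order_release);    // free the slot
    return value;
  }

 private:
  std::array<T, Capacity> slots_{};
  std::atomic<std::size_t> head_{0};  // written by the producer (e.g. OnEnd)
  std::atomic<std::size_t> tail_{0};  // written by the consumer (e.g. aggregator)
};
```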

@jajanet
Contributor Author

jajanet commented Jul 20, 2020

@pyohannes Thanks for the thorough yet concise explanation! Based on your recommendation, we're opening a couple of PRs soon to address this:

  1. A PR for a custom recordable, which is essentially a copy of span_data but with locks in the getter/setter methods (see the sketch after this comment)
  2. A PR to change the recordable used in the processor to this new custom recordable, since casting more than once (recordable -> span_data -> threadsafe_span_data) seems unnecessary if that's the span data we'll use in the aggregator.

I don't know the details behind the proxy/shim approach, but I will update the issue accordingly when I get more information.
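
Roughly what the first PR describes, as a simplified sketch (illustrative field names only, not the actual span_data/threadsafe_span_data API):

```cpp
// Simplified sketch of a recordable whose getters/setters are guarded by a
// mutex, in the spirit of the proposed threadsafe_span_data. Fields shown are
// representative, not the real span_data members.
#include <chrono>
#include <mutex>
#include <string>

class ThreadsafeSpanDataSketch {
 public:
  void SetName(std::string name) {
    std::lock_guard<std::mutex> lock(mutex_);
    name_ = std::move(name);
  }

  std::string GetName() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return name_;  // return a copy so callers never observe a mutating field
  }

  void SetDuration(std::chrono::nanoseconds duration) {
    std::lock_guard<std::mutex> lock(mutex_);
    duration_ = duration;
  }

  std::chrono::nanoseconds GetDuration() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return duration_;
  }

 private:
  mutable std::mutex mutex_;
  std::string name_;
  std::chrono::nanoseconds duration_{0};
};
```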

@github-actions

This issue was marked as stale due to lack of activity. It will be closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Nov 30, 2021
@github-actions

github-actions bot commented Dec 7, 2021

Closed as inactive. Feel free to reopen if this is still an issue.

@github-actions github-actions bot closed this as completed Dec 7, 2021
@lalitb lalitb reopened this Dec 7, 2021
@lalitb lalitb added do-not-stale and removed Stale labels Dec 7, 2021
@github-actions

github-actions bot commented Feb 6, 2022

This issue was marked as stale due to lack of activity. It will be closed in 7 days if no further activity occurs.

@marcalff
Member

marcalff commented Dec 5, 2023

Closing, as zpages is removed, see #2292

@marcalff marcalff closed this as not planned Dec 5, 2023