Support for multiple processors #692

lalitb · 2021-04-22T14:52:18Z

Fixes #664

Changes

MultiSpanProcessor - implements SpanProcessor and internally stores the configured SpanProcessors.
MultiRecordable - implements the Recordable interface and encapsulates several regular recordables.
TracerContext internally maintains the MultiSpanProcessor instance.
The TracerContext constructor accepts a vector of SpanProcessors to configure and provides methods to add new SpanProcessors. A SpanProcessor, once configured, cannot be removed.

Example:
(Using TracerProvider ):

  auto exporter1 = std::unique_ptr<sdktrace::SpanExporter>(new OStreamSpanExporter);
  auto processor1 = std::unique_ptr<sdktrace::SpanProcessor>(new sdktrace::SimpleSpanProcessor(std::move(exporter1)));

  auto exporter2 = std::unique_ptr<sdktrace::SpanExporter>(new InMemorySpanExporter());
  auto processor2 = std::unique_ptr<sdktrace::SpanProcessor>(new sdktrace::SimpleSpanProcessor(std::move(exporter2)));

  auto provider = nostd::shared_ptr<sdktrace::TracerProvider>(new sdktrace::TracerProvider(std::move(processor1)));
  provider->AddProcessor(std::move(processor2));
  // Set the global trace provider
  opentelemetry::trace::Provider::SetTracerProvider(std::move(provider));

(Using TracerContext ):

  auto exporter1 = std::unique_ptr<sdktrace::SpanExporter>(new OStreamSpanExporter);
  auto processor1 = std::unique_ptr<sdktrace::SpanProcessor>(new sdktrace::SimpleSpanProcessor(std::move(exporter1)));

  auto exporter2 = std::unique_ptr<sdktrace::SpanExporter>(new InMemorySpanExporter());
  auto processor2 = std::unique_ptr<sdktrace::SpanProcessor>(new sdktrace::SimpleSpanProcessor(std::move(exporter2)));

  std::vector<std::unique_ptr<sdktrace::SpanProcessor>> processors;
  processors.push_back(std::move(processor1));
  processors.push_back(std::move(processor2));
  auto context = std::make_shared<sdktrace::TracerContext>(std::move(processors));
  auto provider = nostd::shared_ptr<opentelemetry::trace::TracerProvider>(new sdktrace::TracerProvider(context));

  // Set the global trace provider
  opentelemetry::trace::Provider::SetTracerProvider(provider);
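
The fan-out behaviour listed under Changes can be illustrated with a minimal, SDK-independent sketch. `Processor`, `RecordingProcessor`, `FanOutProcessor` and `CountDeliveries` are hypothetical names used only for illustration; they are not the interfaces added by this PR, and the real MultiSpanProcessor also has to deal with recordables, which this sketch leaves out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a span processor; not the SDK's SpanProcessor interface.
class Processor
{
public:
  virtual ~Processor() = default;
  virtual void OnEnd(const std::string &span_name) = 0;
};

// Records every span name it sees, so the fan-out can be observed.
class RecordingProcessor : public Processor
{
public:
  explicit RecordingProcessor(std::vector<std::string> &sink) : sink_(sink) {}
  void OnEnd(const std::string &span_name) override { sink_.push_back(span_name); }

private:
  std::vector<std::string> &sink_;
};

// Owns several processors and forwards each call to all of them in insertion order.
class FanOutProcessor : public Processor
{
public:
  void AddProcessor(std::unique_ptr<Processor> processor)
  {
    processors_.push_back(std::move(processor));
  }

  void OnEnd(const std::string &span_name) override
  {
    for (auto &processor : processors_)
    {
      processor->OnEnd(span_name);
    }
  }

private:
  std::vector<std::unique_ptr<Processor>> processors_;
};

// Sends one span through two processors and returns the total number of deliveries.
int CountDeliveries()
{
  std::vector<std::string> first;
  std::vector<std::string> second;
  FanOutProcessor multi;
  multi.AddProcessor(std::unique_ptr<Processor>(new RecordingProcessor(first)));
  multi.AddProcessor(std::unique_ptr<Processor>(new RecordingProcessor(second)));
  multi.OnEnd("span-a");
  assert(first.size() == 1 && first[0] == "span-a");
  return static_cast<int>(first.size() + second.size());
}
```

Because the composite is itself a `Processor`, the owner only ever holds a single processor, which is the role MultiSpanProcessor plays for TracerContext in the description above.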

For significant contributions please make sure you have completed the following items:

CHANGELOG.md updated for non-trivial changes
Unit tests have been added
Changes in public API reviewed

codecov · 2021-04-22T16:06:57Z

Codecov Report

Merging #692 (74fbead) into main (a3755f4) will decrease coverage by 0.08%.
The diff coverage is 89.65%.

@@            Coverage Diff             @@
##             main     #692      +/-   ##
==========================================
- Coverage   94.81%   94.72%   -0.09%     
==========================================
  Files         214      216       +2     
  Lines        9750     9895     +145     
==========================================
+ Hits         9244     9373     +129     
- Misses        506      522      +16

Impacted Files	Coverage Δ
sdk/src/trace/tracer_provider.cc	`70.58% <41.66%> (-14.60%)`	⬇️
sdk/src/trace/tracer_context.cc	`72.72% <50.00%> (-7.28%)`	⬇️
...ude/opentelemetry/sdk/trace/multi_span_processor.h	`93.33% <93.33%> (ø)`
...include/opentelemetry/sdk/trace/multi_recordable.h	`93.61% <93.61%> (ø)`
ext/test/zpages/tracez_data_aggregator_test.cc	`96.39% <100.00%> (+0.03%)`	⬆️
ext/test/zpages/tracez_processor_test.cc	`98.72% <100.00%> (+0.01%)`	⬆️
sdk/include/opentelemetry/sdk/trace/tracer.h	`100.00% <100.00%> (ø)`
sdk/src/trace/span.cc	`89.01% <100.00%> (ø)`
sdk/test/trace/tracer_provider_test.cc	`100.00% <100.00%> (ø)`
sdk/test/trace/tracer_test.cc	`100.00% <100.00%> (ø)`
... and 3 more

examples/http/tracer_common.hpp

reyang · 2021-04-23T17:35:37Z

sdk/include/opentelemetry/sdk/trace/multi_span_processor.h

+    }
+  }
+
+  void AddProcessor(std::unique_ptr<SpanProcessor> &&processor)


If the intention is to allow processors to be added at runtime, we will need to cover thread safety.

For example, this is thread-safe in a single-writer-multi-reader situation (e.g. the processors chain is handling spans from multiple threads, while one thread is trying to add a new processor).

Given that we're using an STL vector here, I guess we'll run into a race condition: if one thread is trying to add a processor while other threads are sending spans to the processor chain, we might end up with broken state.

Good comment. My earlier code did have a mutex guarding all vector operations in this class; somehow I missed checking that in. The doubly linked list (as done in dotnet) looks like a good option; it would be good to do some benchmarking to pick the better of the two.

I can share the thinking from the .NET implementation:

STL vector + mutex would give the ultimate thread-safety in multi-reader-multi-writer situation, the potential downside is that it will introduce contention to the readers (in this case, the readers are the threads that push spans to the processor chain), which might be something to avoid.

Doubly linked list only gives single-writer-multi-reader thread-safety, however it is well aligned with the general .NET API guideline (that if a class method doesn't specify thread safety, by default it is not supposed to be called concurrently). And if we need the ultimate thread safety later, we can introduce a lock object which is shared among writers (but not readers), in this way we have the minimum contention on the hot path.
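
The idea of a lock shared among writers but never taken by readers can be sketched as follows. This is a simplified illustration, not the .NET or opentelemetry-cpp code: it uses a singly linked, append-only list rather than a doubly linked one, stores plain integers instead of processors, and `AppendOnlyList` and `SumAfterAppends` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical append-only list: readers traverse without locking,
// writers serialize on a mutex that readers never touch.
class AppendOnlyList
{
  struct Node
  {
    explicit Node(int v) : value(v), next(nullptr) {}
    int value;
    std::atomic<Node *> next;
  };

public:
  AppendOnlyList() : head_(nullptr), tail_(nullptr) {}

  // Must not run while other threads still use the list.
  ~AppendOnlyList()
  {
    Node *node = head_.load();
    while (node != nullptr)
    {
      Node *next = node->next.load();
      delete node;
      node = next;
    }
  }

  // Writer path: the node is fully built first, then published with a release store.
  void Add(int value)
  {
    Node *node = new Node(value);
    std::lock_guard<std::mutex> guard(writer_lock_);
    if (tail_ == nullptr)
    {
      head_.store(node, std::memory_order_release);
    }
    else
    {
      tail_->next.store(node, std::memory_order_release);
    }
    tail_ = node;
  }

  // Reader path: acquire loads pair with the release stores in Add().
  int Sum() const
  {
    int total = 0;
    for (Node *node = head_.load(std::memory_order_acquire); node != nullptr;
         node = node->next.load(std::memory_order_acquire))
    {
      total += node->value;
    }
    return total;
  }

private:
  std::atomic<Node *> head_;
  Node *tail_;  // only read or written while holding writer_lock_
  std::mutex writer_lock_;
};

// Single-threaded usage; concurrent readers would call Sum() in the same way.
int SumAfterAppends()
{
  AppendOnlyList list;
  for (int i = 1; i <= 4; ++i)
  {
    list.Add(i);
  }
  int total = list.Sum();
  assert(total == 10);
  return total;
}
```

Since nodes are never removed, a reader sees either the list before an append or the list after it, and the hot path stays free of locks.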

OK, now I realize why I removed the mutex guard: MultiSpanProcessor is held as an atomic instance in TracerContext:

opentelemetry-cpp/sdk/include/opentelemetry/sdk/trace/tracer_context.h

Line 86 in f10acac

opentelemetry::sdk::common::AtomicUniquePtr<SpanProcessor> processor_;

And so multiple threads shouldn't be able to access it simultaneously.

Atomic just means that you can compare-and-swap the pointer. Users can still access it simultaneously (after reading the pointer in memory).

If you have AddProcessor return a new MultiProcessor instance and compare-and-swap the Atomic Ptr then I think you'd be ok, but as is, you still have a (possible) threading issue if more than one thread calls AddProcessor. It's unlikely but still possible.

OK, I think I got your point. In that case, we should have an Atomic Ptr for the Span Processor and create a new instance of MultiProcessor every time AddProcessor() is called. It is already documented that AddProcessor() is not thread-safe and shouldn't be called simultaneously. I will make these changes.

Ah, I missed that AddProcessor was marked not thread-safe as well. This means my point is not important.

If the changes to make AddProcessor thread-safe aren't too painful, I think it's worth it. I think it's a low-frequency method call, so it's ok to be "expensive" to AddProcessor as long as the "read" hot path is fast (which you already have AFAICT). Removing my requested changes. I'd prefer to see AddProcessor thread-safe but this is fine as-is.

Co-authored-by: Reiley Yang <reyang@microsoft.com>

ThomsonTan · 2021-04-23T18:24:04Z

examples/multi_processor/README.md

@@ -0,0 +1,12 @@
+
+# Simple Trace Example


Remove empty first line and change the title to indicate this is for multiple processors?

Good point, have changed the title, and the description too.

jsuereth

Method name improvements LGTM for Processor / AddProcessor.

+1 to Reiley's comments on thread safety, will re-review when that's updated, but like this approach!

jsuereth · 2021-04-23T18:45:12Z

examples/http/tracer_common.hpp

  // Default is an always-on sampler.
-  auto context = std::make_shared<sdktrace::TracerContext>(std::move(processor));
+  auto context = std::make_shared<sdktrace::TracerContext>(std::move(processors));


Should there be a convenience constructor that just takes one processor?

Alternatively, a static helper method to make TracerProvider from a processor?

+1.

This is what OTel.NET does https://github.com/open-telemetry/opentelemetry-dotnet/blob/a25741030f05c60c85be102ce7c33f3899290d49/docs/trace/extending-the-sdk/Program.cs#L30.

Consider taking a look at the other language clients that reached 1.0 stable and see what's the best practice.
One more example from Python https://opentelemetry-python.readthedocs.io/en/latest/getting-started.html.

This specific example is using TracerContext. This is how it looks if we directly use TracerProvider to add processors
(which is what application developers would be doing most of the time):

  auto provider = nostd::shared_ptr<sdktrace::TracerProvider>(
      new sdktrace::TracerProvider(std::move(processor1), resource, sampler, id_generator));
  provider->AddProcessor(std::move(processor2));
  provider->AddProcessor(std::move(processor3));
  opentelemetry::trace::Provider::SetTracerProvider(provider);

Do we want this for TracerContext too ?

I have modified the example NOT to use TracerContext; it now uses TracerProvider directly and passes the processor(s) as an argument to its constructor. The TracerProvider constructor takes either a single processor or a vector of processors as an argument, as shown in the example in the comment above (and in the PR description).

lalitb · 2021-04-26T09:45:45Z

Method name improvements LGTM for Processor / AddProcessor.

+1 to Reiley's comments on thread safety, will re-review when that's updated, but like this approach!

As suggested by @reyang , the following changes are done

Modify the sdk::trace::MultiSpanProcessor class to store processors in a list instead of a vector.
Remove atomicity from TracerContext::processor_ (i.e. change its type from opentelemetry::sdk::common::AtomicUniquePtr<SpanProcessor> to std::unique_ptr<SpanProcessor>). This is because the list structure in MultiSpanProcessor takes care of the single-writer, multiple-reader scenario for processors.
TracerProvider::AddProcessor() and TracerContext::AddProcessor() are no longer thread-safe; a comment stating this has been added to both methods.
No change, but worth mentioning: we don't support removing processors.

reyang

LGTM.

examples/multi_processor/main.cc

examples/multi_processor/README.md

ThomsonTan · 2021-04-26T18:24:27Z

sdk/include/opentelemetry/sdk/trace/multi_recordable.h

+  const std::unique_ptr<Recordable> &GetRecordable(const SpanProcessor &processor) const noexcept
+  {
+    // TODO - return nullptr ref on failed lookup?
+    static std::unique_ptr<Recordable> empty(nullptr);


nit: could this empty be moved to the line right above return empty? We don't need to initialize this value if the recordable is found.

Or could we make the empty variable a class static, as it is also useful in ReleaseRecordable?

could this empty be moved to the line right above return empty? We don't need to initialize this value if the recordable is found.

done

Or could we make the empty variable a class static, as it is also useful in ReleaseRecordable?

That won't help, as ReleaseRecordable needs to return a unique_ptr by value, not a reference to one, so a class static can't be used. But control should never reach either place, as we don't support removing processors once configured.
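
The difference between the two return types can be shown with a small sketch. `Registry` is a hypothetical class keyed by string and holding `int` values; the PR's MultiRecordable is keyed by processor and holds recordables, so this only illustrates the lifetime argument, not the actual code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical registry used only to illustrate the two return styles.
class Registry
{
public:
  void Put(const std::string &key, std::unique_ptr<int> value)
  {
    items_[key] = std::move(value);
  }

  // Returns a reference, so a failed lookup needs an object that outlives the call.
  const std::unique_ptr<int> &Get(const std::string &key) const
  {
    auto it = items_.find(key);
    if (it != items_.end())
    {
      return it->second;
    }
    // Function-local static: constructed only the first time a lookup fails.
    static const std::unique_ptr<int> empty;
    return empty;
  }

  // Returns by value, so a failed lookup can hand back a fresh null pointer.
  std::unique_ptr<int> Release(const std::string &key)
  {
    auto it = items_.find(key);
    if (it == items_.end())
    {
      return std::unique_ptr<int>();
    }
    std::unique_ptr<int> released = std::move(it->second);
    items_.erase(it);
    return released;
  }

private:
  std::map<std::string, std::unique_ptr<int>> items_;
};

// Checks that every failed Get() refers to the same static empty pointer.
bool MissesShareOneEmptyPointer()
{
  Registry registry;
  registry.Put("a", std::unique_ptr<int>(new int(7)));
  assert(*registry.Get("a") == 7);
  std::unique_ptr<int> released = registry.Release("a");
  assert(released && *released == 7);
  return registry.Get("a") == nullptr && &registry.Get("x") == &registry.Get("y");
}
```

Placing the static right above its `return` means it is never initialized on the success path, which is the point of the nit.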

Misclicked.

Co-authored-by: Tom Tan <lilotom@gmail.com>

lalitb · 2021-04-27T19:48:02Z

Will see if there are comments from @jsuereth, @pyohannes, and others before closing it.

jsuereth

I think there's still a threading issue around AddProcessor.

You could possibly solve this by fully copying the processors array in the "AddProcessor" method of TracerContext (and then using a compare-and-swap to flip instances), meaning no thread would see an inconsistent processor. However, the code, as it stands, has a very unlikely threading issue.

jsuereth · 2021-04-28T12:39:16Z

sdk/include/opentelemetry/sdk/trace/multi_recordable.h

+
+  const std::unique_ptr<Recordable> &GetRecordable(const SpanProcessor &processor) const noexcept
+  {
+    // TODO - return nullptr ref on failed lookup?


Nit: remove the TODO

jsuereth · 2021-04-28T12:42:31Z

sdk/include/opentelemetry/sdk/trace/multi_span_processor.h

+    }
+  }
+
+  void AddProcessor(std::unique_ptr<SpanProcessor> &&processor)


Atomic just means that you can compare-and-swap the pointer. Users can still access it simultaneously (after reading the pointer in memory).

If you have AddProcessor return a new MultiProcessor instance and compare-and-swap the Atomic Ptr then I think you'd be ok, but as is, you still have a (possible) threading issue if more than one thread calls AddProcessor. It's unlikely but still possible.

lalitb · 2021-04-28T13:12:18Z

I think there's still a threading issue around AddProcessor.

You could possibly solve this by fully copying the processors array in the "AddProcessor" method of TracerContext (and then using a compare-and-swap to flip instances), meaning no thread would see an inconsistent processor. However, the code, as it stands, has a very unlikely threading issue.

Thank you @jsuereth for the comments. Just to get a better understanding of the issue: the current approach of switching to the linked-list structure still has a race condition around AddProcessor, and another approach would be to move back to a vector and do a full copy followed by a flip?

jsuereth · 2021-04-29T15:06:07Z

sdk/include/opentelemetry/sdk/trace/multi_span_processor.h

+    }
+  }
+
+  void AddProcessor(std::unique_ptr<SpanProcessor> &&processor)


Ah, I missed that AddProcessor was marked not thread-safe as well. This means my point is not important.

If the changes to make AddProcessor thread-safe aren't too painful, I think it's worth it. I think it's a low-frequency method call, so it's ok to be "expensive" to AddProcessor as long as the "read" hot path is fast (which you already have AFAICT). Removing my requested changes. I'd prefer to see AddProcessor thread-safe but this is fine as-is.

jsuereth · 2021-04-29T15:09:31Z

@lalitb

Thank you @jsuereth for the comments. Just to get a better understanding of the issue: the current approach of switching to the linked-list structure still has a race condition around AddProcessor, and another approach would be to move back to a vector and do a full copy followed by a flip?

Yeah, I think you have two options:

In AddProcessor, just do a big compare-and-swap of the whole (immutable-ish) Processor. This should have the advantage that reads are fast, writes are consistent and it should be less work overall.
The double-linked list approach could work, but you likely need to do some memory-barriers / compare-and-swap routines on top of the code you already have to ensure no one views the linked list in an inconsistent state (which, having two pointers, means there's a slight chance).
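
The first option, swapping a whole immutable snapshot, might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in this PR: `CowList` and `SnapshotIsStable` are hypothetical names, the elements are plain integers, and it relies on the `std::atomic_load` / `std::atomic_compare_exchange_weak` overloads for `std::shared_ptr` from `<memory>`, which exist since C++11 but are deprecated from C++20 onward.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write holder: readers see immutable snapshots,
// writers copy the current vector and swap it in atomically.
class CowList
{
public:
  using Snapshot = std::shared_ptr<const std::vector<int>>;

  CowList() : items_(std::make_shared<std::vector<int>>()) {}

  // Hot path: a single atomic load, no copying of elements.
  Snapshot Read() const { return std::atomic_load(&items_); }

  // Cold path: copy, append, then compare-and-swap; retry if another writer won.
  void Add(int value)
  {
    Snapshot current = std::atomic_load(&items_);
    Snapshot next;
    do
    {
      auto copy = std::make_shared<std::vector<int>>(*current);
      copy->push_back(value);
      next = std::move(copy);
      // On failure, current is refreshed with the latest snapshot and the loop retries.
    } while (!std::atomic_compare_exchange_weak(&items_, &current, next));
  }

private:
  Snapshot items_;
};

// A snapshot taken before Add() keeps its old contents afterwards.
bool SnapshotIsStable()
{
  CowList list;
  CowList::Snapshot before = list.Read();
  list.Add(1);
  list.Add(2);
  assert(list.Read()->size() == 2);
  return before->empty();
}
```

Each `Add` costs a full copy, which matches the observation in this thread that adding a processor is a low-frequency call and may be expensive as long as reads stay fast.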

reyang · 2021-04-29T16:51:19Z

Ah, I missed that AddProcessor was marked not thread-safe as well. This means my point is not important.

If the changes to make AddProcessor thread-safe aren't too painful, I think it's worth it. I think it's a low-frequency method call, so it's ok to be "expensive" to AddProcessor as long as the "read" hot path is fast (which you already have AFAICT). Removing my requested changes. I'd prefer to see AddProcessor thread-safe but this is fine as-is.

FYI - a slightly different perspective for consideration: in C# we chose to make AddProcessor not thread-safe (see my explanation here) for a couple of reasons:

AddProcessor has a side effect, since processors are meant to be called in FIFO sequence. Having multiple threads adding processors simultaneously is normally an indication of an application bug, as the sequence is nondeterministic.
In C# the convention is that member functions are not thread-safe unless explicitly specified. I think C++ STL follows the same convention.

Additional thoughts:

Maybe we should clarify this in the spec.
Currently the API spec requires all functions on TracerProvider to be thread safe.
The SDK spec doesn't say anything about concurrency (and AddProcessor is exposed by SDK not the API).

lalitb · 2021-04-29T17:00:58Z

AddProcessor has a side effect, since processors are meant to be called in FIFO sequence. Having multiple threads adding processors simultaneously is normally an indication of an application bug, as the sequence is nondeterministic.

That's a logical reason to enforce non-thread-safe behavior for AddProcessor.

lalitb · 2021-04-29T18:19:31Z

@reyang @jsuereth - Thanks for the comments. I am merging it with the doubly-linked-list-based approach as per the discussions here. The atomic pointer is not used, as AddProcessor() is not thread-safe. I will make this explicit in the SDK documentation, and will also create a ticket to discuss and revisit the design in the future if we need to.

lalitb added 5 commits April 22, 2021 17:42

draft

1078d63

memory leak

78740eb

remove example

ac1384b

add example back

74df87a

fix example

544f511

lalitb changed the title ~~Multi processor 1~~ Support for multiple processors Apr 22, 2021

lalitb added 2 commits April 22, 2021 20:33

Merge branch 'main' into multi-processor-1

55b7c95

resolve merge conflict

95966a9

lalitb changed the title ~~Support for multiple processors~~ [WIP] Support for multiple processors Apr 22, 2021

fix conflict

341dbcf

fix example

0310487

lalitb marked this pull request as ready for review April 23, 2021 17:15

lalitb requested a review from a team April 23, 2021 17:15

Merge branch 'main' into multi-processor-1

65fb45c

lalitb requested review from jsuereth and pyohannes April 23, 2021 17:15

lalitb changed the title ~~[WIP] Support for multiple processors~~ Support for multiple processors Apr 23, 2021

reyang reviewed Apr 23, 2021

reyang reviewed Apr 23, 2021

Update examples/http/tracer_common.hpp

3e45766

Co-authored-by: Reiley Yang <reyang@microsoft.com>

ThomsonTan reviewed Apr 23, 2021

jsuereth reviewed Apr 23, 2021

lalitb added 4 commits April 26, 2021 00:29

review comments

33168d6

and format

42933ba

Merge branch 'multi-processor-1' of github.com:lalitb/opentelemetry-c…

62d02cb

…pp into multi-processor-1

fix review comments

ddd4507

reyang approved these changes Apr 26, 2021

ThomsonTan previously approved these changes Apr 26, 2021

lalitb and others added 4 commits April 27, 2021 01:09

fix compile issue after merge

4c9e269

Update examples/multi_processor/README.md

fe19cb2

Co-authored-by: Tom Tan <lilotom@gmail.com>

Merge branch 'main' into multi-processor-1

554b51a

review comment

e2b640f

ThomsonTan approved these changes Apr 27, 2021

lalitb added 2 commits April 27, 2021 22:41

Merge branch 'main' into multi-processor-1

7b504d8

Merge branch 'main' into multi-processor-1

e2ae973

Merge branch 'main' into multi-processor-1

59024e4

jsuereth requested changes Apr 28, 2021

jsuereth approved these changes Apr 29, 2021

lalitb added 2 commits April 29, 2021 22:23

Merge branch 'main' into multi-processor-1

a8456d4

fix merge conflict

74fbead

lalitb merged commit a987f0a into open-telemetry:main Apr 29, 2021

maxgolov mentioned this pull request Jun 1, 2021

Reinstate the TracerContext constructor for single processor #814

Closed
