Skip to content

Conversation

@bantonsson
Copy link
Contributor

@bantonsson bantonsson commented Jul 3, 2020

This PR consists of three parts that are separate, but build on each other.

  1. Don't insert all tags into the CoreSpanBuilder tags hash map just to copy them out and insert them into the DDSpanContext tags map.
  2. Since the tags map in DDSpanContext in most cases is probably not accessed concurrently, change it to a normal HashMap that is guarded by synchronized access.
  3. Make sure that processing of tag interceptors and rules don't take the lock multiple times.

These three combined shaves off 25-45% of the time being spent in creating a span and adding tags to it.

|                                                                          |      |     |   Master |         |       PR |         |       |
|Benchmark                                                                 | Mode | Cnt |    Score |   Error |    Score |   Error | Units |
|--------------------------------------------------------------------------|------|-----|----------|---------|----------|---------|-------|
|StartSpanBenchmark.addTagAfterChildSpan                                   |   ss |  30 |    8.041 | ± 0.784 |    4.658 | ± 0.511 | ms/op |
|StartSpanBenchmark.addTagAfterChildSpan:·gc.alloc.rate.norm               |   ss |  30 | 1115.891 | ± 0.093 |  691.863 | ± 0.086 |  B/op |
|StartSpanBenchmark.addTagWitInterceptorAfterChildSpan                     |   ss |  30 |    6.878 | ± 0.668 |    4.569 | ± 0.248 | ms/op |
|StartSpanBenchmark.addTagWitInterceptorAfterChildSpan:·gc.alloc.rate.norm |   ss |  30 | 1003.890 | ± 0.094 |  659.862 | ± 0.095 |  B/op |
|StartSpanBenchmark.childSpan                                              |   ss |  30 |    6.780 | ± 0.660 |    4.421 | ± 0.245 | ms/op |
|StartSpanBenchmark.childSpan:·gc.alloc.rate.norm                          |   ss |  30 | 1003.891 | ± 0.094 |  659.862 | ± 0.096 |  B/op |
|StartSpanBenchmark.onTraceComplete                                        |   ss |  30 |    1.359 | ± 0.163 |    1.233 | ± 0.177 | ms/op |
|StartSpanBenchmark.onTraceComplete:·gc.alloc.rate.norm                    |   ss |  30 |    0.252 | ± 0.004 |    0.220 | ± 0.005 |  B/op |
|StartSpanBenchmark.onTraceCompleteWithTagMatchingRule                     |   ss |  30 |    8.701 | ± 0.694 |    5.996 | ± 0.666 | ms/op |
|StartSpanBenchmark.onTraceCompleteWithTagMatchingRule:·gc.alloc.rate.norm |   ss |  30 | 1616.242 | ± 0.004 | 1360.217 | ± 0.005 |  B/op |
|StartSpanBenchmark.rootSpan                                               |   ss |  30 |    6.828 | ± 0.512 |    5.201 | ± 0.522 | ms/op |
|StartSpanBenchmark.rootSpan:·gc.alloc.rate.norm                           |   ss |  30 | 1560.242 | ± 0.006 | 1304.216 | ± 0.004 |  B/op |

This is the Bencmark:

package datadog.trace.core;

import datadog.trace.bootstrap.instrumentation.api.AgentSpan;
import datadog.trace.core.processor.TraceProcessor;
import java.util.Collections;
import java.util.List;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@OperationsPerInvocation(value = 10000)
@BenchmarkMode(value = Mode.SingleShotTime)
@Warmup(iterations = 100)
@Measurement(iterations = 30)
public class StartSpanBenchmark {

  private final int count = 10000;
  private CoreTracer tracer;
  private AgentSpan parentSpan;
  private TraceProcessor processor = new TraceProcessor();
  private List<DDSpan> trace;

  @Setup(Level.Trial)
  public void init() {
    // TODO maybe add something more to this one
    tracer = new CoreTracer.CoreTracerBuilder().build();
    trace = Collections.singletonList((DDSpan) tracer.startSpan("processingTrace"));
  }

  @Setup(Level.Iteration)
  public void gc() {
    parentSpan = tracer.startSpan("parentSpan");
    System.gc();
  }

  @Benchmark()
  public void rootSpan(Blackhole bh) {
    for (int i = 0; i < count; i++) {
      bh.consume(tracer.startSpan("rootSpan"));
    }
  }

  @Benchmark()
  public void childSpan(Blackhole bh) {
    for (int i = 0; i < count; i++) {
      bh.consume(tracer.startSpan("childSpan", parentSpan.context()));
    }
  }

  @Benchmark()
  public void addTagAfterChildSpan(Blackhole bh) {
    for (int i = 0; i < count; i++) {
      bh.consume(tracer.startSpan("childSpan", parentSpan.context()).setTag("foo", "bar"));
    }
  }

  @Benchmark()
  public void addTagWitInterceptorAfterChildSpan(Blackhole bh) {
    for (int i = 0; i < count; i++) {
      bh.consume(tracer.startSpan("childSpan", parentSpan.context()).setTag("service.name", "bar"));
    }
  }

  @Benchmark()
  public void onTraceComplete(Blackhole bh) {
    for (int i = 0; i < count; i++) {
      bh.consume(processor.onTraceComplete(trace));
    }
  }

  @Benchmark()
  public void onTraceCompleteWithTagMatchinRule(Blackhole bh) {
    for (int i = 0; i < count; i++) {
      bh.consume(
          processor.onTraceComplete(
              Collections.singletonList(
                  (DDSpan) tracer.startSpan("processingTrace").setTag("resource.name", "bar"))));
    }
  }
}

@bantonsson bantonsson requested a review from a team as a code owner July 3, 2020 15:18
@richardstartin
Copy link
Contributor

Is there any impact on the rule processing in the trace processing disruptor? That typically ends up being a hotspot. The synchronized block won't be contended but want to be sure this doesn't make that worse.

@bantonsson bantonsson force-pushed the ban/change-tags-map branch from 183c78b to 42bd262 Compare July 3, 2020 15:54
@bantonsson
Copy link
Contributor Author

Excellent question @richardstartin . I can actually see an increase in the time being spent in onTraceComplete in the TraceProcessor in some of the profiles. It should definitely be looked into before this goes in.

@bantonsson bantonsson self-assigned this Jul 28, 2020
@bantonsson bantonsson force-pushed the ban/change-tags-map branch from 42bd262 to 58a437c Compare July 30, 2020 13:23
@bantonsson
Copy link
Contributor Author

I've updated the code and the benchmarks, and also changed the tags interceptor and rules processing to not take the locks repeatedly.

Copy link
Contributor

@richardstartin richardstartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust that the impact on rules processing performance has been taken into account.

@bantonsson
Copy link
Contributor Author

@richardstartin Yes, the third commit is to fix that.

@richardstartin richardstartin merged commit 86da39a into master Jul 31, 2020
@richardstartin richardstartin deleted the ban/change-tags-map branch July 31, 2020 18:29
@github-actions github-actions bot added this to the 0.59.0 milestone Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants