Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #1781: Add traces/zipkin support to simplify troubleshooting of slow requests #1782

Closed
wants to merge 2 commits into from

Conversation

dlg99
Copy link
Contributor

@dlg99 dlg99 commented Nov 2, 2018

Descriptions of the changes in this PR:

Instrumented bookie and the client with tracing, added configuration to enable tracing.

Motivation

Tracing compliments metrics framework by enabling one's deep dive into performance details of single request. Zipkin provides UI to view such traces though there are other implementations.
Opentracing provides framework for tracing that can be used with various tracing backends, zipkin included.

Changes

Instrumented bookie and the client with tracing, added configuration to enable tracing.
Added unit tests.

Master Issue: #1781


In order to uphold a high standard for quality for code contributions, Apache BookKeeper runs various precommit
checks for pull requests. A pull request can only be merged when it passes precommit checks. However running all
the precommit checks can take a long time, some trivial changes don't need to run all the precommit checks. You
can check following list to skip the tests that don't need to run for your pull request. Leave them unchecked if
you are not sure, committers will help you:

  • [skip bookkeeper-server bookie tests]: skip testing org.apache.bookkeeper.bookie in bookkeeper-server module.
  • [skip bookkeeper-server client tests]: skip testing org.apache.bookkeeper.client in bookkeeper-server module.
  • [skip bookkeeper-server replication tests]: skip testing org.apache.bookkeeper.replication in bookkeeper-server module.
  • [skip bookkeeper-server tls tests]: skip testing org.apache.bookkeeper.tls in bookkeeper-server module.
  • [skip bookkeeper-server remaining tests]: skip testing all other tests in bookkeeper-server module.
  • [skip integration tests]: skip docker based integration tests. if you make java code changes, you shouldn't skip integration tests.
  • [skip build java8]: skip build on java8. ONLY skip this when ONLY changing files under documentation under site.
  • [skip build java9]: skip build on java9. ONLY skip this when ONLY changing files under documentation under site.


Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

If this PR is a BookKeeper Proposal (BP):

  • Make sure the PR title is formatted like:
    <BP-#>: Description of bookkeeper proposal
    e.g. BP-1: 64 bits ledger is support
  • Attach the master issue link in the description of this PR.
  • Attach the google doc link if the BP is written in Google Doc.

Otherwise:

  • Make sure the PR title is formatted like:
    <Issue #>: Description of pull request
    e.g. Issue 123: Description ...
  • Make sure tests pass via mvn clean apache-rat:check install spotbugs:check.
  • Replace <Issue #> in the title with the actual Issue number.

@dlg99 dlg99 requested review from sijie and jvrao November 2, 2018 02:15
@dlg99
Copy link
Contributor Author

dlg99 commented Nov 2, 2018

Example of how it looks like in the Zipkin UI:

Overall request view:
zipkin1

Client (PCBC) talking to Bookie, trace details:
zipkin2

Other trace details:
zipkin3

@dlg99 dlg99 self-assigned this Nov 2, 2018
Copy link
Member

@sijie sijie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great change.

However is it possible to not tie to a specific tracing library in the code. I would suggest adding interceptor interface to intercept bookkeeper operations, and tracing can be one of interceptor implementation.

for example:

with an interceptor interface, it will be easier to configure the behavior on client with builder method:

BookKeeperBuilder intercept(Interceptor... interceptors);

@merlimat
Copy link
Contributor

merlimat commented Nov 2, 2018

I've seen in many places this is using multiple builder object to track operations in different places. I would like to make sure there is no overhead (like in objects allocations) if the tracing feature is disabled.

@dlg99
Copy link
Contributor Author

dlg99 commented Nov 2, 2018

@merlimat I agree with this concern.
NOOP tracer should not create an overhead (it does not create span/context objects and uses same noop instance as far as I can tell, it does not set active trace as thread local etc.)
If actual tracer is used but sample is set to 0 we'll end up with traces created but not reported (unless somewhere along the way app created active trace that had "sample flag" set.
There are some places where tags are set and i.e. string concatenation is used to build the tag. I'll move these out to make sure string concatenation does not happen if tracing is a NOOP. Otherwise tags on noop spans/context do nothing.

@sijie Let me make sure that I understood you correctly.
Pretty much we need better way to specify tracer, so something similar to metrics in BK where one can drop in a jar and specify class name to use. API along the lines of createTracer, contextToByteString, byteStringToContext.
I agree, I'll do that. I short-circuited this part to get review out sooner.
Opentracing stays as the tracing API. I treat it same way as we do slf4j for logging. I figured since other apache projects use it directly we can do too, i.e. apache/hadoop@trunk...carlosalberto:ot_initial_integration#diff-1f7d0efa26604bfd48d1754534a77becR262
Users can implement their version of Tracer (i.e. opencensus) and register it as GlobalTracer.
i.e. in current implementation any application that uses bk client will have to register tracer on its own, it does not happen as part of client creation/builder.

@sijie
Copy link
Member

sijie commented Nov 2, 2018

Pretty much we need better way to specify tracer, so something similar to metrics in BK where one can drop in a jar and specify class name to use. API along the lines of createTracer, contextToByteString, byteStringToContext.

yes.

Users can implement their version of Tracer (i.e. opencensus) and register it as GlobalTracer.
i.e. in current implementation any application that uses bk client will have to register tracer on its own, it does not happen as part of client creation/builder.

Tracer is one type of interface, which user can implement their tracer logic. However I think a more generic to intercept bookkeeper operations would be better for extensibility, since users can do things beyond tracing, and doesn't have to be tied to a Tracer/OpenTracing interface.

@dlg99
Copy link
Contributor Author

dlg99 commented Nov 3, 2018

@sijie

more generic to intercept bookkeeper operations

as in any possible step? i.e. traces pass context through executor etc, not sure what value it provides outside of the context of higher level trace. Either way, one can create Tracer that does some internal magic with traces, register that Tracer as a global Tracer.

as in very specific points of bookie/client process (add start-finish, remote call start-finish etc)?
This is related to tracing but not 100% the same. I.e. in traces I am interested to see wait/run times on executor etc., in case of such specific points executors are noise.

I am not familiar with pulsar, I took a very brief look at https://github.com/apache/pulsar/blob/7a92111fe7824d422d1739abd06d85955b652212/pulsar-broker/src/test/java/org/apache/pulsar/client/api/InterceptorsTest.java where interceptors seem to be events that happen on start/finish of some operations.
Using Storm as analogy, interceptors seem like a good idea to customize something when a bolt receives/sends a message letting whoever implements the bolt to deal with it internally. If I wanted to trace storm's internals the same way, interceptors seem to become nuisance.
Or using servlet analogy, interceptors are like servlet filters that allow one to i.e. change headers or do something at specific points.
I.e. if we are not interested in too many details abut the message or in changing the message and only need start/finish points of some events, we can use existing bookie's metrics framework to intercept such events. It did not work for me because I wanted to have flexibility in setting parent/child relationships.
Interceptors could be a good feature to add but not necessarily as replacement or something dependent on tracing.

TL;DR:
There is probably some specific usecase that I am failing to catch right away but it feels like interceptors vs traces have some overlap but not complete and my immediate needs closer to traces.
It also feels like interceptors are a great idea for i.e. streaming processing systems where big chunk of the code dealing with data is created by external user but that's not bookkeeper's case as far as I see it.

@sijie
Copy link
Member

sijie commented Nov 5, 2018

@dlg99 :

interceptor is usually a pattern that keep the core business logic separated from any other argumented logics (e.g. tracing, filtering, monitoring). so argumented logic can be easier to unit test rather than coupling with the core runtime.

All what we are caring here is client operations (e.g. metadata operations, data operations) and server side operations (e.g. storage operations, journal operation) and executor operations. If we define some sort of interceptor interfaces for those key operations, it is easier for us to add a tracing interceptor.

There are a few benefits have interceptors:

  1. we can easily add more types of interceptors in future.
  2. people can implement their own tracing interceptors. people is not enforced to use open tracing, and be able to connect with their in-house tracing solution.
  3. even more, when bookkeeper is used in other services (e.g. databases, messaging systems), what people really want is not just bookkeeper's own tracing events. people want to do end-to-end tracing, in that sense, if we provide interceptor interface, people can build their tracing interceptor to push their upstreaming trace context all the wya down to bookkeeper. it is hard to do so if we just use tracer here.

Based on these considerations, I think a better approach will be:

  • define a set of interceptor interfaces, providing methods to intercept key operations.
  • the open tracing implementation should be one interceptor implementation.

Other comments on your comment:

It did not work for me because I wanted to have flexibility in setting parent/child relationships.

An interceptor interface doesn't limit the flexibility. You are still using open tracing api to set parent/child relationship.

Interceptors could be a good feature to add but not necessarily as replacement or something dependent on tracing.

A better way to think about interceptor vs tracing. Interceptor is the interface, while tracing is the implementation. I am not saying interceptors replacing tracing. what I am suggesting is a common interceptor interface is allowing more extensibility, rather than directly sticking to one tracing api.

There is probably some specific usecase that I am failing to catch right away but it feels like interceptors vs traces have some overlap but not complete and my immediate needs closer to traces.

End-to-end tracing is one of the use cases when bookkeeper is used as one piece in the whole system.

Context propagation will be important in such case, providing interceptors rather than directly implementing tracing will be more flexible for people to do proper context propagation and use proper tracing framework (e.g. open source, in house).

Allow any in-house implementations that do tracing but doesn't use any open source tracing api.

@dlg99
Copy link
Contributor Author

dlg99 commented Nov 8, 2018

I think I have an idea what to do, it won't be exactly interceptors fw but rather an extension of current metrics. I'll do the POC as time allows and we'll look at it.

@dlg99 dlg99 closed this Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants