Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run tests with javaagent. #1643

Merged
merged 180 commits into from
Jan 4, 2021
Merged

Conversation

anuraaga
Copy link
Contributor

@anuraaga anuraaga commented Nov 16, 2020

Don't get too excited, just first steps in this change for now :)

I wanted to get it out as a PoC just to see what you think of the general direction. Running a test with the agent active, having it load instrumentation for the project under test, and asserting on exported spans works. I'm hoping other steps aren't extremely hard but there are still a few big steps left and a chance of hitting a roadblock, but hope not.

I think there are these major components left

Allow registering multiple custom instrumentation like some of our tests require.

I am using the otel.initializer_jar mechanism to load instrumentation. Some thoughts on how to proceed

  • Allow otel.initializer_jar to accept a CSV of jars.
  • Package multiple instrumentation into single JAR for test. Slows down build somewhat (more repackaging)
  • Don't use otel.initializer_jar and generate an agent JAR for each project with everything bundled. Slows down build more (even more repackaging)

First option seems ok I guess? But maybe doesn't fit with our general customization plans. Second option doesn't seem that bad, especially if java-concurrent and java-classloader are packaged into the agent (always used), making the dual-packaging rare.

Allow registering transformation listeners

If I understand classloaders correctly, we can't simply pass a AgentBuilder.Listener from the test classloader to the agent classloader since the implementation would not be inheriting from the agent classloader's class (@trask correct me if I'm wrong, if so things are trivial). It would be very easy to define an interface similar to AgentBuilder.Listener in the bootstrap classloader for our use. Unfortunately, TRANSFORMED_CLASSES_TYPES depends on byte buddy's TypeDescription which puts a wrinkle in this. But instead of a mostly similar, mostly generic Listener, we could have the callback just accept a boolean matchedGlobalLibrariesIgnoreMatcher, which is only useful for our test, but guess it's ok.

Edit: Added TransformationListener as a POC for this.

Allowing updates to Config

This seems relatively easy, instead of modifying INSTANCE directly by initializing Config in the test classloader, we can just provide a private (accessed-through-reflection) method that accepts Properties and creates the Config in the agent classloader. Config's in bootstrap classloader :D But it's shaded

Also feel free to bring up any potential other gotchas you see to come up with strategies to squash them :D

@anuraaga anuraaga requested review from trask and iNikem November 16, 2020 09:14
@@ -0,0 +1,80 @@
plugins {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project is essentially a "custom distro", with no instrumentation included, and the testing exporter included.

testing/agent-for-testing/agent-for-testing.gradle Outdated Show resolved Hide resolved
@iNikem
Copy link
Contributor

iNikem commented Nov 16, 2020

Disclaimer: I haven't read code changes yet, only PR description :)

First: why don't we want to use the full agent for tests? Not module specific, but full-blown agent as it would run in production?

Second: maybe it is easier to cross JVM boundaries and void classloading issues completely? Use fake-backend as in smoke-tests and adapt our assertions to OTLP?

@anuraaga
Copy link
Contributor Author

anuraaga commented Nov 16, 2020

First: why don't we want to use the full agent for tests? Not module specific, but full-blown agent as it would run in production?

I think this will have significant penalty on build time for a single instrumentation, especially when developing. It means when working on one module, you have to build the entire repo / agent just to run its tests.

Second: maybe it is easier to cross JVM boundaries and void classloading issues completely? Use fake-backend as in smoke-tests and adapt our assertions to OTLP?

Yeah the thought of adapting assertions to OTLP came to mind. I think we can think about it in followup, I'm concerned it may have some drawbacks like coverage (found one field in SpanData not in OTLP) and inability to use helpers that we may make in the SDK (like the TracesAssert). Neither of these may matter but I've written the back-translation for now and think we can look more at it later. There is of course significant tradeoff of maintaining this translation logic right now - but if there's a way to get rid of the assertion maintenance cost (not sure yet) it may end up even. BTW, I didn't try Jackson at all - if it works, it solves all the issues with no code so want to give it a spin.

After reading the code, you'll see we're effectively doing the same as the fake-backend, using in-process gRPC exporter. The machinery to get the byte[] across the classloader I have no issue with at all :)

@iNikem
Copy link
Contributor

iNikem commented Nov 16, 2020

I think this will have significant penalty on build time for a single instrumentation, especially when developing. It means when working on one module, you have to build the entire repo / agent just to run its tests.

On my machine after clean ./gradlew javaagent:shadow took 16s :) Not a significant penalty IMO

@anuraaga
Copy link
Contributor Author

On my machine after clean ./gradlew javaagent:shadow took 16s :) Not a significant penalty IMO

Is that with everything coming from local build cache? That's much faster than most builds I've seen :O

@anuraaga
Copy link
Contributor Author

Good progress - I think most features are implemented (though not fully verified for corner cases such as "skip transformation" and "global ignores mather" checking), my scariest point was resetInstrumentation, but it seems to be working fine, yay. Major piece remaining is instrumentation depending on other instrumentation. Still not fully decided on @iNikem's suggestion of always using full agent, but I would be surprised if it's not slow in at least some if not many development situations. But will continue with that.

LOTS of cleanup left so no need for a detailed review yet (InMemoryExporter is not exporting :D, and testing/agent-exporter has a lot more than just exporter now) but feel free to give it a skim and let me know if anything looks off. I'll probably revert some of the compileOnly changes I made since I finally found the ultimate hammer to get javaagent-bootstrap.jar off of the Gradle test classpath which would prevent the agent from starting.

Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My yak clippers are getting a huge workout. Got many tests passing now, still a few stubborns one. Particularly tough ones are

  • Don't know what to do about opentelemetry-api-beta tests since they very much depend on not working with a shaded agent.

  • JAX tests that use runUnderServerTrace (and I think other similar tests) fail confusingly - the trace assertions all assert the span is the request span like "/a", but I always see it fail with it being something like "test", which seems correct to me because of the use of runUnderTrace. Haven't been able to figure out yet what I'm missing that causes these assertions to pass before this PR.

// OpenTelemetry SDK is not needed for compilation, and :opentelemetry-sdk-shaded-for-testing
// is brought in for tests by project(:testing-common) below
exclude group: 'io.opentelemetry', module: 'opentelemetry-sdk'
}
implementation deps.bytebuddy
annotationProcessor deps.autoservice
implementation deps.autoservice
compileOnly deps.autoservice
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemingly innocent change caused a lot of breakage since guava, previously set by autoservice, ended up getting too new for many of our projects. But it's probably better to have the manual downgrading as now to know about these instead of the luck.


// The sources are packaged into the testing jar so we need to make sure to exclude from the test
// classpath, which automatically inherits them, to ensure our shaded versions are used.
classpath = classpath.filter {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized my previous testing wasn't actually doing what we wanted - the non-shaded classes from the test classpath would still be used from the agent. Now it's always the shaded ones.

@anuraaga
Copy link
Contributor Author

@trask Thanks for the hint - removed the dependencies onto agent classes from testing-common and updated most tests to not access them. There are some hairy ones left, and still not too sure about opentelemetry-api-beta (maybe I need to move the tests to a different project that doesn't use the application. prefix)? A couple of other mysteries left but glad to have come up with a workaround for those runUnderServerTrace tests :D

@trask
Copy link
Member

trask commented Dec 23, 2020

@iNikem tests are failing:

WARN muzzleMatcher - Instrumentation skipped, mismatched references were found: undertow -- io.opentelemetry.javaagent.instrumentation.undertow.UndertowInstrumentationModule on jdk.internal.loader.ClassLoaders$AppClassLoader@2f333739
WARN muzzleMatcher - -- io.opentelemetry.javaagent.instrumentation.undertow.UndertowHttpServerTracer:46 Missing class io.opentelemetry.javaagent.instrumentation.api.undertow.KeyHolder

@trask
Copy link
Member

trask commented Dec 23, 2020

@iNikem tests are failing:

WARN muzzleMatcher - Instrumentation skipped, mismatched references were found: undertow -- io.opentelemetry.javaagent.instrumentation.undertow.UndertowInstrumentationModule on jdk.internal.loader.ClassLoaders$AppClassLoader@2f333739
WARN muzzleMatcher - -- io.opentelemetry.javaagent.instrumentation.undertow.UndertowHttpServerTracer:46 Missing class io.opentelemetry.javaagent.instrumentation.api.undertow.KeyHolder

Ignore me (again), I remember there was special packaging of this class, need to carry that over to the new stuff.

@trask
Copy link
Member

trask commented Dec 23, 2020

@iNikem I had to move undertow's KeyHolder to javaagent-api for now. I created #1957 to move it back to the undertow instrumentation.

@trask
Copy link
Member

trask commented Dec 23, 2020

@iNikem one other thing, github actions log file kept stopping on me mid-way through the build, which made it hard to troubleshoot, removing nick-invision/retry resolved it, though we lose that retry. If we start seeing sporadic failures due to lack of that (outer) retry, I'll bring it back.

iNikem
iNikem previously requested changes Dec 23, 2020
Copy link
Contributor

@iNikem iNikem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we lack documentation about new testing infrastructure. The one that would help. e.g. me, to understand how do tests work and how should I go around fixing bugs in test infra. E.g. what are these new modules agent-for-testing and agent-exporter? What are those XXXAccess classes and why they are written the way they are?

.github/workflows/pr.yaml Show resolved Hide resolved
docs/contributing/writing-instrumentation.md Show resolved Hide resolved
gradle/instrumentation.gradle Outdated Show resolved Hide resolved
|| name.startsWith("org.apache.lucene.")
|| name.startsWith("org.apache.tartarus.")
|| name.startsWith("org.json.simple.")
|| name.startsWith("org.yaml.snakeyaml.")) {
return true;
}

if (name.startsWith("net.sf.cglib.")) {
return !name.equals("net.sf.cglib.core.internal.LoadingCache$2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see it :)

@trask
Copy link
Member

trask commented Dec 23, 2020

I still think we lack documentation about new testing infrastructure. The one that would help. e.g. me, to understand how do tests work and how should I go around fixing bugs in test infra.

bug? what bugs? 😂

good suggestion, added 👍

@anuraaga
Copy link
Contributor Author

anuraaga commented Jan 4, 2021

@trask I think the new docs are a big improvement. Let's get this merged since it's so large and prone to drift and we can continue to follow up with anything once @iNikem's back from holidays

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants