
JTReg Test Failure - serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java #6024

JasonFengJ9 opened this issue Jun 5, 2019 · 16 comments

@JasonFengJ9
Member

JasonFengJ9 commented Jun 5, 2019

OpenJ9 does not generate sufficient JVMTI sampling object allocation events when the sampling interval is set to zero

This is a JTreg test failure against the JEP 331 implementation.

Failure link

N/A - local runs

Optional info

  • intermittent failure (no):
  • new feature test:

Failure output

Exception in thread "main" java.lang.RuntimeException: Could not set the sampler
	at MyPackage.HeapMonitor.sampleEverything(HeapMonitor.java:187)
	at MyPackage.HeapMonitorVMEventsTest.main(HeapMonitorVMEventsTest.java:100)

Notes:

The failed test code snippet is in [1]. The test sets the sampling interval to 0, then allocates 400 integer arrays of int[1000] and expects 400 sampling events, which counts as a PASS; this is repeated up to 10 times until a pass, otherwise the test fails.
OpenJ9 generates only 12 - 38 sampling events for the 400 array allocations, which is far short of the number required.
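
The pass condition described above can be sketched as a small Java model; the class and method names are illustrative (not the jtreg source) and the sampler is reduced to a boolean, since the point is only the retry-until-all-sampled logic:

```java
// Simplified model of the HeapMonitorVMEventsTest pass condition.
// With the sampling interval set to 0 (sample everything), each of the
// 400 int[1000] allocations is expected to raise one sampling event.
public class SampleEverythingModel {

    // One run: allocate 400 arrays and count how many were sampled.
    static int runOnce(int allocations, boolean samplerFiresOnEveryAllocation) {
        int events = 0;
        for (int i = 0; i < allocations; i++) {
            int[] data = new int[1000]; // the allocation being sampled
            if (samplerFiresOnEveryAllocation) {
                events++;
            }
        }
        return events;
    }

    // The jtreg test retries up to 10 times and passes on the first run
    // in which every allocation produced a sampling event.
    static boolean runTest(boolean samplerFiresOnEveryAllocation) {
        for (int attempt = 0; attempt < 10; attempt++) {
            if (runOnce(400, samplerFiresOnEveryAllocation) == 400) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(runTest(true));  // expected behaviour: prints true
        System.out.println(runTest(false)); // modelled undercount: prints false
    }
}
```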

I built a standalone testcase TestSampleEverything.zip without a JVMTI agent; the issue can be demonstrated as follows:

java -Xgc:allocationSamplingGranularity=1 -Xtrace:print={j9mm.394-395} -jar TestSampleEverything.zip 2>&1 | grep elements=1000 | wc -l

The expected number of trace lines for object size=4008; number of elements=1000 is close to 10 * 400, i.e., 4000, but runs usually produce fewer than 400.

[1] https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/1c55bba834d441d680298c240ac241ec9aa0b668/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java#L164-L188

fyi: @charliegracie @DanHeidinga

@JasonFengJ9
Member Author

Another JTReg Test Failure - serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatIntervalTest.java:

Exception in thread "main" java.lang.RuntimeException: Interval average over 10% for interval 1024 -> 281.0, 102400.0
	at MyPackage.HeapMonitorStatIntervalTest.testIntervalOnce(HeapMonitorStatIntervalTest.java:62)
	at MyPackage.HeapMonitorStatIntervalTest.testInterval(HeapMonitorStatIntervalTest.java:77)
	at MyPackage.HeapMonitorStatIntervalTest.main(HeapMonitorStatIntervalTest.java:87)

This test expects roughly 102400 sampling events with the sampling interval set to 1024 but got only 281.
This probably has the same cause as this issue.
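
The interval check reduces to comparing the observed event count against totalAllocatedBytes / interval within the 10% tolerance named in the failure message; a sketch of that arithmetic (the helper is illustrative, not the jtreg source, and the numbers come from the failure output above):

```java
// Sketch of the interval test's tolerance check: the observed event count
// should be within 10% of totalAllocatedBytes / samplingInterval.
public class IntervalCheckModel {

    // Relative-error check against the expected event count.
    static boolean withinTolerance(double observed, double expected, double tolerance) {
        return Math.abs(observed - expected) / expected <= tolerance;
    }

    public static void main(String[] args) {
        double expected = 102400.0; // totalBytes / 1024-byte interval
        System.out.println(withinTolerance(102400.0 * 0.95, expected, 0.10)); // prints true
        System.out.println(withinTolerance(281.0, expected, 0.10));           // prints false: the reported failure
    }
}
```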

@JasonFengJ9
Member Author

After a bit of digging, it appears that MM_GCExtensions::oolObjectSamplingBytesGranularity-based sampling only applies to out-of-line allocations; all the successful inline allocations via ObjectAllocationAPI.hpp:inlineAllocateObject() or inlineAllocateIndexableObject() are not traced by -Xgc:allocationSamplingGranularity, regardless of how small the sampling granularity is set.

The JEP 331 implementation shares the same sampling mechanism and hence doesn't sample those inline allocations, as this JTreg test discovered.
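
This undercount can be sketched as a small Java model; the class names and the 1-in-10 out-of-line ratio are hypothetical, chosen only to illustrate why mostly-inline allocation starves a byte counter that is maintained solely on the out-of-line path:

```java
// Minimal model (illustrative names, not OpenJ9 code) of byte-granularity
// sampling that only advances on out-of-line (OOL) allocations.
public class OolOnlySamplerModel {
    final long granularity;       // models -Xgc:allocationSamplingGranularity
    long bytesSinceLastSample = 0;
    int events = 0;

    OolOnlySamplerModel(long granularity) {
        this.granularity = granularity;
    }

    // Models the OOL path (MM_TLHAllocationInterface::allocateObject()): the
    // byte counter advances and at most one event fires per allocation
    // (a simplification of the real bookkeeping).
    void allocateOutOfLine(long sizeBytes) {
        bytesSinceLastSample += sizeBytes;
        if (bytesSinceLastSample >= granularity) {
            events++;
            bytesSinceLastSample = 0;
        }
    }

    // Models inlineAllocateIndexableObject(): no sampling bookkeeping at all.
    void allocateInline(long sizeBytes) {
        // intentionally empty: the inline path never touches the counter
    }

    public static void main(String[] args) {
        OolOnlySamplerModel sampler = new OolOnlySamplerModel(1);
        for (int i = 0; i < 400; i++) {
            // Assume roughly 1 in 10 allocations misses the TLH and goes out
            // of line (a ratio picked to mirror the observed undercount);
            // int[1000] is about 4008 bytes including its header.
            if (i % 10 == 0) {
                sampler.allocateOutOfLine(4008);
            } else {
                sampler.allocateInline(4008);
            }
        }
        System.out.println(sampler.events); // prints 40, far short of 400
    }
}
```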

@charliegracie could you shed some light on this?

@charliegracie
Contributor

The test is bogus in my opinion. This is a "sample" and as such is not guaranteed to fire an exact number of events. I will look into what we could do, but this should not work on HotSpot either. The JEP talks about a random sampling basis such that a common application pattern would not always give the same samples. If that were true then this test could never work! We should be pushing back on the test implementation. Sadly, as this is a JEP, I doubt we will be able to provide any input.

@charliegracie
Contributor

I will look into what we can do to "improve" the situation but I doubt we will ever have a case where we sample all 400 allocations in the above scenario.

@charliegracie
Contributor

Here is a quick hack that may do what you want. FYI I have not compiled or tested this code.

diff --git a/runtime/gc_base/modronapi.cpp b/runtime/gc_base/modronapi.cpp
index 07220c9a3..11a22772d 100644
--- a/runtime/gc_base/modronapi.cpp
+++ b/runtime/gc_base/modronapi.cpp
@@ -810,6 +810,14 @@ j9gc_set_allocation_sampling_interval(J9VMThread *vmThread, UDATA samplingInterv
 		samplingInterval = 1;
 	}
 	extensions->oolObjectSamplingBytesGranularity = samplingInterval;
+#if defined(J9VM_GC_THREAD_LOCAL_HEAP) || defined(J9VM_GC_SEGREGATED_HEAP)
+	/* extensions is already in scope; signal an async event so running
+	 * threads pick up the new granularity at their next async check */
+	J9JavaVM *vm = vmThread->javaVM;
+	J9InternalVMFunctions const * const vmFuncs = vm->internalVMFunctions;
+	IDATA const key = extensions->_TLHAsyncCallbackKey;
+	vmFuncs->J9SignalAsyncEvent(vm, NULL, key);
+	vmFuncs->J9CancelAsyncEvent(vm, vmThread, key);
+	memoryManagerTLHAsyncCallbackHandler(vmThread, key, (void*)vm);
+#endif /* defined(J9VM_GC_THREAD_LOCAL_HEAP) || defined(J9VM_GC_SEGREGATED_HEAP) */
 }
 
 /**
diff --git a/runtime/gc_modron_startup/mgcalloc.cpp b/runtime/gc_modron_startup/mgcalloc.cpp
index 93d0092f8..471a36e9e 100644
--- a/runtime/gc_modron_startup/mgcalloc.cpp
+++ b/runtime/gc_modron_startup/mgcalloc.cpp
@@ -626,6 +626,24 @@ memoryManagerTLHAsyncCallbackHandler(J9VMThread *vmThread, IDATA handlerKey, voi
 		Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventNotHooked(vmThread);
 		extensions->disableInlineCacheForAllocationThreshold = false;
 	}
+
+	if (extensions->doOutOfLineAllocationTrace) {
+		//TODO ADD a new tracepoint
+		//Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventIsHooked(vmThread);
+		if (extensions->isStandardGC() || extensions->isVLHGC()) {
+#if defined(J9VM_GC_THREAD_LOCAL_HEAP)
+			extensions->disableInlineCacheForAllocationThreshold = (extensions->oolObjectSamplingBytesGranularity < extensions->tlhMaximumSize);
+#endif /* defined(J9VM_GC_THREAD_LOCAL_HEAP) */
+		} else if (extensions->isSegregatedHeap()) {
+#if defined(J9VM_GC_SEGREGATED_HEAP)
+			extensions->disableInlineCacheForAllocationThreshold = (extensions->oolObjectSamplingBytesGranularity <= J9VMGC_SIZECLASSES_MAX_SMALL_SIZE_BYTES);
+#endif /* defined(J9VM_GC_SEGREGATED_HEAP) */
+		}
+	} else {
+		//TODO ADD a new tracepoint
+		//Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventNotHooked(vmThread);
+		extensions->disableInlineCacheForAllocationThreshold = false;
+	}
 	
 	if (extensions->isStandardGC() || extensions->isVLHGC()) {
 #if defined(J9VM_GC_THREAD_LOCAL_HEAP)

@charliegracie
Contributor

Also, the sampling still only happens on out-of-line allocates. The above hack forces out-of-line allocates for all allocations when the sampling threshold is less than the maximum TLH size.

@JasonFengJ9
Member Author

Thanks @charliegracie for your quick response, will give it a try.

@JasonFengJ9
Member Author

JasonFengJ9 commented Jun 17, 2019

The patch doesn't work as expected because _oolTraceAllocationBytes is only updated for out-of-line allocations [1].

It appears _oolTraceAllocationBytes wasn't updated within
https://github.com/eclipse/openj9/blob/c1b8968fc67ddea5ab2e6deea84fd7f36570f0d1/runtime/gc_include/ObjectAllocationAPI.hpp#L74
and
https://github.com/eclipse/openj9/blob/c1b8968fc67ddea5ab2e6deea84fd7f36570f0d1/runtime/gc_include/ObjectAllocationAPI.hpp#L163

[1] https://github.com/eclipse/omr/blob/e21bb944bc31b3b3101fa4eb68eb6bc428c913c1/gc/base/TLHAllocationInterface.cpp#L212

P.S. RI does pass MyPackage/HeapMonitorVMEventsTest.java, i.e., it can sample every allocation in this testcase.

@charliegracie
Contributor

Just because the RI finds all allocations does not mean it is correct with respect to the wording of the specification. Threads are not atomically updated to see the new values, as that would be a performance nightmare. The proposed changes above rely on async checks to cause the threads to recognize the new values. No JVM, I repeat no JVM, could be expected to have a lightweight allocation monitoring system force threads through exclusive VM access to update this value. Did the proposed change improve things at all?

The inline call you posted in the previous comment may not have anything to do with this, since the JIT can inline allocate as well. The only acceptable solution is to wait for the next TLH refresh of the thread to obey the new threshold. If that does not fit how the JEP is worded, then the JEP is flawed and we should be pushing back on it or its tests.

@JasonFengJ9
Member Author

Did the proposed change improve things at all?

There is no observable difference.

Allocation and _oolTraceAllocationBytes
A printf() within ObjectAllocationAPI.hpp:inlineAllocateIndexableObject() shows that most allocations (size = 1000, the integer array length) occur there. MM_EnvironmentBase::_oolTraceAllocationBytes is not updated within this inline allocation method; it is only incremented within mgcalloc.cpp:MM_TLHAllocationInterface::allocateObject().

-Xjit:count=0 appears to eliminate those inlineAllocateIndexableObject() calls.

So inlineAllocateIndexableObject() (and JIT allocations) don't increase the field _oolTraceAllocationBytes; is this expected, or a known limitation?

The cause of insufficient sampling events
I think the cause of the insufficient sampling events is not that the new sampling interval value isn't recognized right away; it is that those inline allocations are not recorded in the first place.

JEP and RI behaviours
This JEP is optional and hence not covered by any compatibility tests. I just use this JTReg test to match RI behaviour as closely as possible. For this particular testcase, the RI passes consistently, i.e., it can sample every allocation, while OpenJ9 catches only around 1/10.

@charliegracie
Contributor

Hmm, looking at your test case, you are not changing the value of _oolTraceAllocationBytes via the GC API at runtime; you are specifying it via the command line. I thought the JVMTI code allowed this value to be changed dynamically at runtime. Does it? My changes above will only have an impact if you call j9gc_set_allocation_sampling_interval after your application threads are up and running. It is not the final correct solution, but I was suggesting it as a test. Let me see if I can modify it to handle your current use case as well.

@charliegracie
Contributor

Also, you are correct that the code currently only accounts for objects allocated out of line. It should also be considering TLH refreshes. Looking into this now.

@JasonFengJ9
Member Author

Hmm looking at your test case you are not changing the value of _oolTraceAllocationBytes via the GC API during runtime you are specifying it via the command line.

The JTReg testcase does call j9gc_set_allocation_sampling_interval [1]. Since this JEP implementation shares the same event-trigger mechanism as GC allocation sampling via -Xgc:allocationSamplingGranularity, the attached testcase demonstrates the sampling pattern without JVMTI natives.

I did test the patch with original JTReg JVMTI testcase and got similar results as before.

[1] https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/ad988f03e08bf150c97380e32741409e4e597432/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java#L166

@charliegracie
Contributor

@JasonFengJ9 is there a JVMTI API for modifying the sampling threshold?

@DanHeidinga
Member

@charliegracie Is this still something you plan to work on?
