Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support tracing allocation bytes #9254

Closed
wants to merge 1 commit into from

Conversation

LinHu2016
Copy link
Contributor

@LinHu2016 LinHu2016 commented Apr 15, 2020

Keep env->_oolTraceAllocationBytes and exts->oolObjectSamplingBytesGranularity
(default = 16MB, can be changedvia jvm option -Xgc:allocationSamplingGranularity=xxx)
for triggering tracepoint Trc_MM_J9AllocateIndexableObject_outOfLineObjectAllocation
and Trc_MM_J9AllocateObject_outOfLineObjectAllocation, which are used by Health Center.
The default of triggering the tracepoints is on, can be enabled/disabled by
-Xgc:allocationSamplingEnable/allocationSamplingDisable.
Allocation bytes for triggering the tracepoints are counted for out of line allocation only.

New env->_traceAllocationBytes and ext->objectSamplingBytesGranularity
(default = 512KB, can be set by J9MemoryManagerFunction j9gc_set_allocation_sampling_interval()) for triggering jvmti event
JVMTI_EVENT_SAMPLED_OBJECT_ALLOC.
The default of triggering jvmti event JVMTI_EVENT_SAMPLED_OBJECT_ALLOC
is off, can be enabled/disabled via j9gc_set_allocation_sampling_interval().
Allocation bytes for triggering JVMTI_EVENT_SAMPLED_OBJECT_ALLOC are counted both inline and out of line allocation.

depends on eclipse-omr/omr#5071

Signed-off-by: Lin Hu linhu@ca.ibm.com

@pshipton
Copy link
Member

Are the compile errors expected?

[ 61%] Building CXX object runtime/gc_base/CMakeFiles/j9gcbase.dir/modronapi.cpp.o
../../../../../openj9/runtime/gc_base/modronapi.cpp: In function ‘void j9gc_set_allocation_sampling_interval(J9VMThread*, UDATA)’:
../../../../../openj9/runtime/gc_base/modronapi.cpp:884:60: error: ‘class MM_GCExtensions’ has no member named ‘disableInlineAllocationForSamplingBytesGranularity’; did you mean ‘oolObjectSamplingBytesGranularity’?
  if ((UDATA_MAX != samplingInterval) && !extensions->disableInlineAllocationForSamplingBytesGranularity) {
                                                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   
                                                            oolObjectSamplingBytesGranularity
../../../../../openj9/runtime/gc_base/modronapi.cpp:885:15: error: ‘class MM_GCExtensions’ has no member named ‘disableInlineAllocationForSamplingBytesGranularity’; did you mean ‘oolObjectSamplingBytesGranularity’?
   extensions->disableInlineAllocationForSamplingBytesGranularity = true;
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
               oolObjectSamplingBytesGranularity
../../../../../openj9/runtime/gc_base/modronapi.cpp:889:15: error: ‘class MM_GCExtensions’ has no member named ‘disableInlineAllocationForSamplingBytesGranularity’; did you mean ‘oolObjectSamplingBytesGranularity’?
   extensions->disableInlineAllocationForSamplingBytesGranularity = false;
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@LinHu2016
Copy link
Contributor Author

this PR is is dependent on eclipse-omr/omr#5071

@LinHu2016 LinHu2016 force-pushed the JEP331 branch 2 times, most recently from d99f303 to 2ab925e Compare April 17, 2020 15:30
@fjeremic fjeremic added the depends:omr Pull request is dependent on a corresponding change in OMR label Apr 17, 2020
@LinHu2016 LinHu2016 force-pushed the JEP331 branch 3 times, most recently from 62f12d9 to da229de Compare April 20, 2020 14:00
@LinHu2016
Copy link
Contributor Author

@dmitripivkine the code has been changed to disable inline allocation for all of threads if disableInlineAllocationForSamplingBytesGranularity is true.

@LinHu2016 LinHu2016 force-pushed the JEP331 branch 8 times, most recently from 6d335c4 to 1b91a23 Compare April 24, 2020 17:52
@LinHu2016
Copy link
Contributor Author

code is updated, ready for review.

@@ -270,7 +271,8 @@ traceObjectCheck(J9VMThread *vmThread)
if(extensions->doOutOfLineAllocationTrace){
uintptr_t byteGranularity = extensions->oolObjectSamplingBytesGranularity;

if(env->_oolTraceAllocationBytes >= byteGranularity){
/* if we want to trace ool Allocation, use env->_oolTraceAllocationBytes for comparison */
if(env->_traceAllocationBytes >= byteGranularity){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I missed something: We are trying to implement new tracing mode using _traceAllocationBytes however need to support existing tracing using _oolTraceAllocationBytes (mutually exclusive cases obviously) sharing oolObjectSamplingBytesGranularity for sampling interval. If it is a case should we have both checks here (is it possible to combine)? Would existing tracing be broken by this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I know only health centre use it, I think that using _traceAllocationBytes would help health centre to get more accurate report too, if we want to keep existing tracing, we need to vm side to update to split usage. @JasonFengJ9 any comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we do increment _oolTraceAllocationBytes in MM_TLHAllocationInterface::allocateObject() but can not find any similar place for _traceAllocationBytes. Where it is incremented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitripivkine the change at line 206 of MM_TLHAllocationInterface::allocateObject(), the code had merged week ago.

	if (NULL != result) {
		uintptr_t sizeInBytesAllocated = allocDescription->getContiguousBytes();
		/* Increment by bytes allocated */
		env->_traceAllocationBytes += sizeInBytesAllocated;

		if (!allocDescription->isCompletedFromTlh()) {
#if defined(OMR_GC_OBJECT_ALLOCATION_NOTIFY)
			env->objectAllocationNotify((omrobjectptr_t)result);
#endif /* OMR_GC_OBJECT_ALLOCATION_NOTIFY */
			_stats._allocationBytes += sizeInBytesAllocated;
			_stats._allocationCount += 1;
		}
	}
	env->_oolTraceAllocationBytes += (_stats.bytesAllocated() - _bytesAllocatedBase); /* Increment by bytes allocated */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I looked to outdated code, sorry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I know only health centre use it, I think that using _traceAllocationBytes would help health centre to get more accurate report too, if we want to keep existing tracing, we need to vm side to update to split usage.

If new feature covers old functionality (do we have tests for it?), it is ok for me to have new one only. I was confused by your change in traceAllocateObject() where you seems try to maintain both variables (why?)
If it is a case more clearing/renaming is required - should _oolTraceAllocationBytes be removed and _ool prefix variables be renamed?

@LinHu2016 LinHu2016 force-pushed the JEP331 branch 2 times, most recently from a3d8238 to 3b27fa2 Compare April 28, 2020 15:12
	-Keep env->_oolTraceAllocationBytes and
exts->oolObjectSamplingBytesGranularity (default = 16MB, can be changed
via jvm option -Xgc:allocationSamplingGranularity=xxx) for triggering
tracepoint Trc_MM_J9AllocateIndexableObject_outOfLineObjectAllocation
and Trc_MM_J9AllocateObject_outOfLineObjectAllocation, which are used by
Health Center
	-The default of triggering the tracepoints is on, can be
enabled/disabled by -Xgc:allocationSamplingEnable/allocationSamplingDisable.

	-New env->_traceAllocationBytes and ext->objectSamplingBytesGranularity
(default = 512KB, can be set by J9MemoryManagerFunction
j9gc_set_allocation_sampling_interval()) for triggering jvmti event
JVMTI_EVENT_SAMPLED_OBJECT_ALLOC.
	-The default of triggering jvmti event JVMTI_EVENT_SAMPLED_OBJECT_ALLOC
is off, can be enabled/disabled via
j9gc_set_allocation_sampling_interval().

Signed-off-by: Lin Hu <linhu@ca.ibm.com>
@LinHu2016
Copy link
Contributor Author

Udate:
Keep env->_oolTraceAllocationBytes and exts->oolObjectSamplingBytesGranularity
(default = 16MB, can be changedvia jvm option -Xgc:allocationSamplingGranularity=xxx)
for triggering tracepoint Trc_MM_J9AllocateIndexableObject_outOfLineObjectAllocation
and Trc_MM_J9AllocateObject_outOfLineObjectAllocation, which are used by Health Center.
The default of triggering the tracepoints is on, can be enabled/disabled by
-Xgc:allocationSamplingEnable/allocationSamplingDisable.

New env->_traceAllocationBytes and ext->objectSamplingBytesGranularity
(default = 512KB, can be set by J9MemoryManagerFunction j9gc_set_allocation_sampling_interval()) for triggering jvmti event
JVMTI_EVENT_SAMPLED_OBJECT_ALLOC.
The default of triggering jvmti event JVMTI_EVENT_SAMPLED_OBJECT_ALLOC
is off, can be enabled/disabled via j9gc_set_allocation_sampling_interval().

@LinHu2016
Copy link
Contributor Author

now tracing allocation bytes supports both tracepoints(Health Center) and JVMTI_EVENT_SAMPLED_OBJECT_ALLOC(JEP331) independently, personal build has passed for local tests, code is ready for review.

@JasonFengJ9
Copy link
Member

Had a conversation with @LinHu2016 about the performance overhead of this PR. If I understand correctly, here is current state of the implementation:

  1. disableInlineTLHAllocate() is invoked when the sampling is enabled, and both GC and JIT allocation are forced to OOL path before allocating within TLH. This approach might be subject to 10-12% perf overhead from a benchmark test;
  2. A better performance solution is in plan which can still perform accurate sampling but might take some time to develop;
  3. An intermediate approach is in discussion that TLH buffer size could be adjusted according to the sampling interval, or still disableInlineTLHAllocate() when the interval is small.

Considering existing implementation misses most of sampling as per #6024 & #7740, I am leaning toward accepting the intermediate solution which is functional correct, allows users continue the field research work and potentially provides more feedback.

@DanHeidinga could you shine some lights on this?

@DanHeidinga
Copy link
Member

~10% overhead for a "low overhead" jvmti feature is too high but better to be correct then broken.
I support delivering the intermediate solution as fast as possible, assuming it can be done quickly and the overhead is much less than 10%.

I'd like to understand the timeline for the better solution

@amicic
Copy link
Contributor

amicic commented May 4, 2020

3 was just a proof of concept that sampling can be done accurately without significant overhead. This is not an intermediate solution. It's just a basis for real solution 2. I estimate about 2wks to get real solution.

@DanHeidinga
Copy link
Member

I estimate about 2wks to get real solution.

We're coming up on the code split for the v0.20.0 release on June 7th. Is the real solution going to be ready and merged before then?

@amicic
Copy link
Contributor

amicic commented Jun 27, 2020

Since we switched the implementation to eclipse-omr/omr#5260 this PR became obsolete.

@amicic amicic closed this Jun 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:gc depends:omr Pull request is dependent on a corresponding change in OMR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants