Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampling allocation bytes precisely without compromising the performance #5260

Merged
merged 1 commit into from
Jun 3, 2020

Conversation

LinHu2016
Copy link
Contributor

@LinHu2016 LinHu2016 commented May 29, 2020

in order to sampling heap allocation bytes precisely without
compromising the performance, we have the below changes.

Handle instrumentableAllocateHook and
VM_OBJECT_ALLOCATE_WITHIN_THRESHOLD is still via disabling inline
allocation
Handle smapling for tracepoint is still during out of line allocation
Handle smapling for JEP331 is via setTLHSamplingTop(size)

Using fake Heap Top instead of fake Heap Alloc for disabling inline
allocation (realHeapAlloc-->realHeapTop,
set/getRealAlloc()-->set/getRealTop(), getRealSize(), getUsedSize())
Using fake Heap Top to force out of line allocation at sampling thresold
for sampling heap allocation (setTLHSamplingTop()/resetTLHSamplingTop())
setTLHSamplingTop(size) are only called in the below 3 cases
1, sampling threshold has been changed via GC-VM api
j9gc_set_allocation_sampling_interval()
2, TLH is refreshed
3, after sampling is done

Counting trace allocation byte includes allocation bytes inside TLH
Cache before flushing(_stats.bytesAllocated(true),
stats->_tlhAllocatedUsed, )
Handle traceAllocationByte for Health
Center(_oolTraceAllocationBytesForTracepoint,
oolObjectSamplingBytesGranularityForTracepoint) and traceAllocationByte
for JEP331(_traceAllocationBytesForHook,
objectSamplingBytesGranularityForHook) independently

depends on eclipse-openj9/openj9#9767
Signed-off-by: Lin Hu linhu@ca.ibm.com

@LinHu2016
Copy link
Contributor Author

@amicic @dmitripivkine please review changes, thanks

@@ -664,8 +693,9 @@ class MM_EnvironmentBase : public MM_BaseVirtual
,_currentTask(NULL)
,_slaveThreadCpuTimeNanos(0)
,_freeEntrySizeClassStats()
,_oolTraceAllocationBytes(0)
,_traceAllocationBytes(0)
,_oolTraceAllocationBytesForTracepoint(0)
Copy link
Contributor

@amicic amicic Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I realized that Hook and Tracepoint reside on OpenJ9 side, I'm wondering if it makes sense to expand these names with ... ForTracepoint/ForHook. They make little sense to someone who looks at OMR code and is not aware of OpenJ9 code. @dmitripivkine what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. "ForTracepoint/ForHook" does not explain what these variables are used for without understanding how "Tracepoint" or "Hook" are used for

Copy link
Contributor Author

@LinHu2016 LinHu2016 Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCExtensionsBase.hpp:
oolObjectSamplingBytesGranularityForTracepoint --> oolObjectSamplingBytesGranularity
objectSamplingBytesGranularityForHook --> objectSamplingBytesGranularity
EnvironmentBase.hpp:
_oolTraceAllocationBytesForTracepoint --> _oolTraceAllocationBytes
_traceAllocationBytesForHook --> _traceAllocationBytes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's restore the naming back. Sorry, I know I proposed the change to start with, since I did not like one just being OOL and the other not. (and I still don't like that separation, since both of them end up doing OOL)

Perhaps one should be coarseSamplingAllocatedBytes/coarseSamplingByteGranularity (what OOL is now) and the other one is fineSamplingAllocatedByte/fineSamplingByteGranularity , but unless Dmitri strongly agrees that OOL differentiation is confusing, leave it as you suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok to reuse old names. At least it simplifies merge for this change.

}
};

/* @return the number of bytes, which are available for allocation in the TLH */
MMINLINE uintptr_t getRealSize() { return (uintptr_t)getRealTop() - (uintptr_t)getAlloc(); };
Copy link
Contributor

@amicic amicic Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make more sense to call it Unused or Remaining or Available?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitripivkine again, please your opinion. 'Real' is still ok, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like Remaining, it explains meaning better. I am not strong on this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update this one as per Dmitri's suggestion

env->_traceAllocationBytes += sizeInBytesAllocated;

if (!allocDescription->isCompletedFromTlh()) {
if (NULL != result && !allocDescription->isCompletedFromTlh()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (NULL != result && !allocDescription->isCompletedFromTlh()) {
if ((NULL != result) && !allocDescription->isCompletedFromTlh()) {


/* Try to cache the current TLH */
if (NULL != getRealAlloc() && getSize() >= tlhMinimumSize) {
if (NULL != getRealTop() && getRealSize() >= tlhMinimumSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (NULL != getRealTop() && getRealSize() >= tlhMinimumSize) {
if ((NULL != getRealTop()) && (getRealSize() >= tlhMinimumSize)) {


uintptr_t sizeInBytesAllocated = (_stats.bytesAllocated(true) - _bytesAllocatedBase);
env->_oolTraceAllocationBytesForTracepoint += sizeInBytesAllocated; /* Increment by bytes allocated for Trace */
env->_traceAllocationBytesForHook += sizeInBytesAllocated; /* Increment by bytes allocated for Hook */
Copy link
Contributor

@amicic amicic Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is critical part to understand. You are saying that
a) if we don't force any extra OOL allocations (due to sampling rate being < TLH size), that bytesAllocated(true) is equal to what bytesAllocated() used to be. Or in other words, that _tlhAllocatedFresh - _tlhDiscardedBytes == _tlhAllocatedUsed? Did you verify that - is it exactly true or just approximately true?
b), if we do force extra OOL allocations (due to sampling rate being < TLH size), that SUM of all bytesAllocated(true) will still be equal to what bytesAllocated() used to be.

Copy link
Contributor

@amicic amicic Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that there might be a difference in behavior since _tlhAllocatedFresh was taken after TLH was refreshed (hence it was for next TLH, what seems like not completely correct), and _tlhDiscardedBytes was taken before TLH was refreshed (for previous TLH). On the other side _tlhAllocatedUsed is correctly for previous TLH.
In a long run, when things are averaged _tlhAllocatedFresh - _tlhDiscardedBytes == _tlhAllocatedUsed might hold true, but strictly speaking, I suspect it does not hold true each time this code is executed

Copy link
Contributor

@dmitripivkine dmitripivkine Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do special handling in case of dual TLH support? It was not relevant before (TLH allocation was disabled essentially so did not matter which one). But looks like it becomes relevant now. Is it correct? The simple solution can be disabling of non-zeroed TLH in case of instrumentation
Correction: Dual TLH is handled in OpenJ9 part, so we can discuss it there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_tlhAllocatedUsed is the amount of memory has been flushed, "_tlhAllocatedFresh - _tlhDiscardedBytes == _tlhAllocatedUsed" is true when the TLH is just right before refreshing, after refreshing _tlhAllocatedFresh would be added with new TLH size.
bytesAllocated(true) try to return the size for all of allocated bytes excludes any bytes in current TLH. naming of _tlhAllocatedUsed might be too confuse.

@@ -59,14 +60,22 @@ class MM_AllocationStats : public MM_Base

#if defined(OMR_GC_THREAD_LOCAL_HEAP)
uintptr_t tlhBytesAllocated() { return _tlhAllocatedFresh - _tlhDiscardedBytes; }
uintptr_t tlhBytesAllocatedUsed() { return _tlhAllocatedUsed; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put some comment here. For example (if you think this is true):

For tlhBytesAllocated, it's allocated bytes since last TLH refresh.
For tlhBytesAllocatedUsed, it's allocated bytes since last OOL allocation from this TLH (which could be forced before TLH refresh multiple times due to allocation sampling).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytesAllocated(true) try to return the size for all of allocated bytes excludes any bytes in current TLH.
tlhBytesAllocated() also includes the current TLH size, since tlhBytesAllocated() has been used for multiple places, I don't want to change the behavior except for traceAllocationBytes.

Comment on lines 269 to 270
uintptr_t samplingBytesGranularity = UDATA_MAX;
if (!env->needDisableInlineAllocation() && (UDATA_MAX != (samplingBytesGranularity = env->getExtensions()->objectSamplingBytesGranularityForHook))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uintptr_t samplingBytesGranularity = UDATA_MAX;
if (!env->needDisableInlineAllocation() && (UDATA_MAX != (samplingBytesGranularity = env->getExtensions()->objectSamplingBytesGranularityForHook))) {
uintptr_t samplingBytesGranularity = extensions->objectSamplingBytesGranularityForHook;
if (!env->needDisableInlineAllocation() && (UDATA_MAX != samplingBytesGranularity)) {

*/
bool needDisableInlineAllocation() {
return (getExtensions()->fvtest_disableInlineAllocation || getExtensions()->instrumentableAllocateHookEnabled || getExtensions()->disableInlineCacheForAllocationThreshold);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this helper to Extensions. From a quick look, at all call sites, Extensions is already available as a local.

@LinHu2016 LinHu2016 force-pushed the JEP331_update branch 2 times, most recently from 6bf16c2 to 7335914 Compare June 2, 2020 02:45
uintptr_t _traceAllocationBytes; /**< Tracks the bytes allocated since the last object trace include ool and allocation is completed from TLH */
uintptr_t _oolTraceAllocationBytes; /**< Tracks the bytes allocated since the last ool object trace for Tracepoint */
uintptr_t _traceAllocationBytes; /**< Tracks the bytes allocated since the last ool object trace for Hooks ( only record ool allocation bytes and flushed TLH size) */
uintptr_t _traceAllocationAdjustmentBytes; /**< keep the bytes of times of sampling thresold for last object trace(include allocation byttes inside TLH) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about _traceAllocationBytesCurrentTLH?

@@ -293,6 +293,8 @@ class MM_GCExtensionsBase : public MM_BaseVirtual {
bool doOutOfLineAllocationTrace;
bool doFrequentObjectAllocationSampling; /**< Whether to track object allocations*/
uintptr_t oolObjectSamplingBytesGranularity; /**< How often (in bytes) we do an ool allocation trace */
uintptr_t objectSamplingBytesGranularity; /**< How often (in bytes) we do an allocation trace (for triggering J9HOOK_MM_OBJECT_ALLOCATION_SAMPLING) */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should not really be referring to J9 specific hook. Perhaps you can say:
How often (in bytes) we do allocation sampling as tracked by per thread's local _traceAllocationBytes.
And similarly adjust the OOL one.

/* return bytesAllocated includes new refreshed TLH, if used == false(default)
* return bytesAllocated (but does not include new refreshed TLH), if used == true.
*/
uintptr_t bytesAllocated(bool used = false) {
Copy link
Contributor

@amicic amicic Jun 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag name should more resemble to your comment (and invert the logic, if needed). Something like include(Just?)RefreshedTLH ?

@LinHu2016 LinHu2016 force-pushed the JEP331_update branch 2 times, most recently from 52e91cf to 411fe76 Compare June 2, 2020 16:08
@@ -37,6 +37,7 @@ class MM_AllocationStats : public MM_Base
uintptr_t _tlhRefreshCountFresh; /**< Number of refreshes where fresh memory was allocated. */
uintptr_t _tlhRefreshCountReused; /**< Number of refreshes where TLHs were reused. */
uintptr_t _tlhAllocatedFresh; /**< The amount of memory allocated fresh out of the heap. */
uintptr_t _tlhAllocatedUsed; /**< The amount of memory has been flushed */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to align comment with the variable name: amount of used memory of already flushed TLHs


uintptr_t sizeInBytesAllocated = (_stats.bytesAllocated(false) - _bytesAllocatedBase);
env->_oolTraceAllocationBytes += sizeInBytesAllocated; /* Increment by bytes allocated for Trace */
env->_traceAllocationBytes += sizeInBytesAllocated; /* Increment by bytes allocated for Hook */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the comments. They reference Trace and Hook, which is out of context for OMR. And they don't even comply with coding standards.

@amicic
Copy link
Contributor

amicic commented Jun 2, 2020

@rwy0717 or @youngar please, proceed with further review

in order to sampling heap allocation bytes precisely without
compromising the performance, we have the below changes.

Handle instrumentableAllocateHook and
VM_OBJECT_ALLOCATE_WITHIN_THRESHOLD is still via disabling inline
allocation
Handle smapling for tracepoint is still during out of line allocation
Handle smapling for JEP331 is via setTLHSamplingTop(size)

Using fake Heap Top instead of fake Heap Alloc for disabling inline
allocation (realHeapAlloc-->realHeapTop,
set/getRealAlloc()-->set/getRealTop(), getRealSize(), getUsedSize())
Using fake Heap Top to force out of line allocation at sampling thresold
for sampling heap allocation (setTLHSamplingTop()/resetTLHSamplingTop())
setTLHSamplingTop(size) are only called in the below 3 cases
	1, sampling threshold has been changed via GC-VM api
j9gc_set_allocation_sampling_interval()
	2, TLH is refreshed
	3, after sampling is done

Counting trace allocation byte includes allocation bytes inside TLH
Cache before flushing(_stats.bytesAllocated(true),
stats->_tlhAllocatedUsed, )
Handle traceAllocationByte for Health
Center(_oolTraceAllocationBytesForTracepoint,
oolObjectSamplingBytesGranularityForTracepoint) and traceAllocationByte
for JEP331(_traceAllocationBytesForHook,
objectSamplingBytesGranularityForHook) independently


Signed-off-by: Lin Hu <linhu@ca.ibm.com>
@rwy7
Copy link
Contributor

rwy7 commented Jun 3, 2020

@genie-omr build all

@rwy7 rwy7 self-assigned this Jun 3, 2020
@rwy7 rwy7 merged commit c2b3c97 into eclipse-omr:master Jun 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants