-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sampling allocation bytes precisely without compromising the performance #5260
Conversation
@amicic @dmitripivkine please review changes, thanks |
gc/base/EnvironmentBase.hpp
Outdated
@@ -664,8 +693,9 @@ class MM_EnvironmentBase : public MM_BaseVirtual | |||
,_currentTask(NULL) | |||
,_slaveThreadCpuTimeNanos(0) | |||
,_freeEntrySizeClassStats() | |||
,_oolTraceAllocationBytes(0) | |||
,_traceAllocationBytes(0) | |||
,_oolTraceAllocationBytesForTracepoint(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I realized that Hook and Tracepoint reside on OpenJ9 side, I'm wondering if it makes sense to expand these names with ... ForTracepoint/ForHook. They make little sense to someone who looks at OMR code and is not aware of OpenJ9 code. @dmitripivkine what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. "ForTracepoint/ForHook" does not explain what these variables are used for without understanding how "Tracepoint" or "Hook" are used for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCExtensionsBase.hpp:
oolObjectSamplingBytesGranularityForTracepoint --> oolObjectSamplingBytesGranularity
objectSamplingBytesGranularityForHook --> objectSamplingBytesGranularity
EnvironmentBase.hpp:
_oolTraceAllocationBytesForTracepoint --> _oolTraceAllocationBytes
_traceAllocationBytesForHook --> _traceAllocationBytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's restore the naming back. Sorry, I know I proposed the change to start with, since I did not like one just being OOL and the other not. (and I still don't like that separation, since both of them end up doing OOL)
Perhaps one should be coarseSamplingAllocatedBytes/coarseSamplingByteGranularity (what OOL is now) and the other one is fineSamplingAllocatedByte/fineSamplingByteGranularity , but unless Dmitri strongly agrees that OOL differentiation is confusing, leave it as you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok to reuse old names. At least it simplifies merge for this change.
gc/base/TLHAllocationSupport.hpp
Outdated
} | ||
}; | ||
|
||
/* @return the number of bytes, which are available for allocation in the TLH */ | ||
MMINLINE uintptr_t getRealSize() { return (uintptr_t)getRealTop() - (uintptr_t)getAlloc(); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make more sense to call it Unused or Remaining or Available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitripivkine again, please your opinion. 'Real' is still ok, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like Remaining, it explains meaning better. I am not strong on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update this one as per Dmitri's suggestion
gc/base/TLHAllocationInterface.cpp
Outdated
env->_traceAllocationBytes += sizeInBytesAllocated; | ||
|
||
if (!allocDescription->isCompletedFromTlh()) { | ||
if (NULL != result && !allocDescription->isCompletedFromTlh()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (NULL != result && !allocDescription->isCompletedFromTlh()) { | |
if ((NULL != result) && !allocDescription->isCompletedFromTlh()) { |
gc/base/TLHAllocationSupport.cpp
Outdated
|
||
/* Try to cache the current TLH */ | ||
if (NULL != getRealAlloc() && getSize() >= tlhMinimumSize) { | ||
if (NULL != getRealTop() && getRealSize() >= tlhMinimumSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (NULL != getRealTop() && getRealSize() >= tlhMinimumSize) { | |
if ((NULL != getRealTop()) && (getRealSize() >= tlhMinimumSize)) { |
gc/base/TLHAllocationInterface.cpp
Outdated
|
||
uintptr_t sizeInBytesAllocated = (_stats.bytesAllocated(true) - _bytesAllocatedBase); | ||
env->_oolTraceAllocationBytesForTracepoint += sizeInBytesAllocated; /* Increment by bytes allocated for Trace */ | ||
env->_traceAllocationBytesForHook += sizeInBytesAllocated; /* Increment by bytes allocated for Hook */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is critical part to understand. You are saying that
a) if we don't force any extra OOL allocations (due to sampling rate being < TLH size), that bytesAllocated(true) is equal to what bytesAllocated() used to be. Or in other words, that _tlhAllocatedFresh - _tlhDiscardedBytes == _tlhAllocatedUsed? Did you verify that - is it exactly true or just approximately true?
b), if we do force extra OOL allocations (due to sampling rate being < TLH size), that SUM of all bytesAllocated(true) will still be equal to what bytesAllocated() used to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that there might be a difference in behavior since _tlhAllocatedFresh was taken after TLH was refreshed (hence it was for next TLH, what seems like not completely correct), and _tlhDiscardedBytes was taken before TLH was refreshed (for previous TLH). On the other side _tlhAllocatedUsed is correctly for previous TLH.
In a long run, when things are averaged _tlhAllocatedFresh - _tlhDiscardedBytes == _tlhAllocatedUsed might hold true, but strictly speaking, I suspect it does not hold true each time this code is executed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do special handling in case of dual TLH support? It was not relevant before (TLH allocation was disabled essentially so did not matter which one). But looks like it becomes relevant now. Is it correct? The simple solution can be disabling of non-zeroed TLH in case of instrumentation
Correction: Dual TLH is handled in OpenJ9 part, so we can discuss it there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_tlhAllocatedUsed is the amount of memory has been flushed, "_tlhAllocatedFresh - _tlhDiscardedBytes == _tlhAllocatedUsed" is true when the TLH is just right before refreshing, after refreshing _tlhAllocatedFresh would be added with new TLH size.
bytesAllocated(true) try to return the size for all of allocated bytes excludes any bytes in current TLH. naming of _tlhAllocatedUsed might be too confuse.
@@ -59,14 +60,22 @@ class MM_AllocationStats : public MM_Base | |||
|
|||
#if defined(OMR_GC_THREAD_LOCAL_HEAP) | |||
uintptr_t tlhBytesAllocated() { return _tlhAllocatedFresh - _tlhDiscardedBytes; } | |||
uintptr_t tlhBytesAllocatedUsed() { return _tlhAllocatedUsed; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put some comment here. For example (if you think this is true):
For tlhBytesAllocated, it's allocated bytes since last TLH refresh.
For tlhBytesAllocatedUsed, it's allocated bytes since last OOL allocation from this TLH (which could be forced before TLH refresh multiple times due to allocation sampling).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytesAllocated(true) try to return the size for all of allocated bytes excludes any bytes in current TLH.
tlhBytesAllocated() also includes the current TLH size, since tlhBytesAllocated() has been used for multiple places, I don't want to change the behavior except for traceAllocationBytes.
gc/base/TLHAllocationSupport.cpp
Outdated
uintptr_t samplingBytesGranularity = UDATA_MAX; | ||
if (!env->needDisableInlineAllocation() && (UDATA_MAX != (samplingBytesGranularity = env->getExtensions()->objectSamplingBytesGranularityForHook))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uintptr_t samplingBytesGranularity = UDATA_MAX; | |
if (!env->needDisableInlineAllocation() && (UDATA_MAX != (samplingBytesGranularity = env->getExtensions()->objectSamplingBytesGranularityForHook))) { | |
uintptr_t samplingBytesGranularity = extensions->objectSamplingBytesGranularityForHook; | |
if (!env->needDisableInlineAllocation() && (UDATA_MAX != samplingBytesGranularity)) { |
gc/base/EnvironmentBase.hpp
Outdated
*/ | ||
bool needDisableInlineAllocation() { | ||
return (getExtensions()->fvtest_disableInlineAllocation || getExtensions()->instrumentableAllocateHookEnabled || getExtensions()->disableInlineCacheForAllocationThreshold); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this helper to Extensions. From a quick look, at all call sites, Extensions is already available as a local.
6bf16c2
to
7335914
Compare
gc/base/EnvironmentBase.hpp
Outdated
uintptr_t _traceAllocationBytes; /**< Tracks the bytes allocated since the last object trace include ool and allocation is completed from TLH */ | ||
uintptr_t _oolTraceAllocationBytes; /**< Tracks the bytes allocated since the last ool object trace for Tracepoint */ | ||
uintptr_t _traceAllocationBytes; /**< Tracks the bytes allocated since the last ool object trace for Hooks ( only record ool allocation bytes and flushed TLH size) */ | ||
uintptr_t _traceAllocationAdjustmentBytes; /**< keep the bytes of times of sampling thresold for last object trace(include allocation byttes inside TLH) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about _traceAllocationBytesCurrentTLH?
@@ -293,6 +293,8 @@ class MM_GCExtensionsBase : public MM_BaseVirtual { | |||
bool doOutOfLineAllocationTrace; | |||
bool doFrequentObjectAllocationSampling; /**< Whether to track object allocations*/ | |||
uintptr_t oolObjectSamplingBytesGranularity; /**< How often (in bytes) we do an ool allocation trace */ | |||
uintptr_t objectSamplingBytesGranularity; /**< How often (in bytes) we do an allocation trace (for triggering J9HOOK_MM_OBJECT_ALLOCATION_SAMPLING) */ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should not really be referring to J9 specific hook. Perhaps you can say:
How often (in bytes) we do allocation sampling as tracked by per thread's local _traceAllocationBytes.
And similarly adjust the OOL one.
gc/stats/AllocationStats.hpp
Outdated
/* return bytesAllocated includes new refreshed TLH, if used == false(default) | ||
* return bytesAllocated (but does not include new refreshed TLH), if used == true. | ||
*/ | ||
uintptr_t bytesAllocated(bool used = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag name should more resemble to your comment (and invert the logic, if needed). Something like include(Just?)RefreshedTLH ?
52e91cf
to
411fe76
Compare
gc/stats/AllocationStats.hpp
Outdated
@@ -37,6 +37,7 @@ class MM_AllocationStats : public MM_Base | |||
uintptr_t _tlhRefreshCountFresh; /**< Number of refreshes where fresh memory was allocated. */ | |||
uintptr_t _tlhRefreshCountReused; /**< Number of refreshes where TLHs were reused. */ | |||
uintptr_t _tlhAllocatedFresh; /**< The amount of memory allocated fresh out of the heap. */ | |||
uintptr_t _tlhAllocatedUsed; /**< The amount of memory has been flushed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to align comment with the variable name: amount of used memory of already flushed TLHs
gc/base/TLHAllocationInterface.cpp
Outdated
|
||
uintptr_t sizeInBytesAllocated = (_stats.bytesAllocated(false) - _bytesAllocatedBase); | ||
env->_oolTraceAllocationBytes += sizeInBytesAllocated; /* Increment by bytes allocated for Trace */ | ||
env->_traceAllocationBytes += sizeInBytesAllocated; /* Increment by bytes allocated for Hook */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the comments. They reference Trace and Hook, which is out of context for OMR. And they don't even comply with coding standards.
@rwy0717 or @youngar please, proceed with further review |
in order to sampling heap allocation bytes precisely without compromising the performance, we have the below changes. Handle instrumentableAllocateHook and VM_OBJECT_ALLOCATE_WITHIN_THRESHOLD is still via disabling inline allocation Handle smapling for tracepoint is still during out of line allocation Handle smapling for JEP331 is via setTLHSamplingTop(size) Using fake Heap Top instead of fake Heap Alloc for disabling inline allocation (realHeapAlloc-->realHeapTop, set/getRealAlloc()-->set/getRealTop(), getRealSize(), getUsedSize()) Using fake Heap Top to force out of line allocation at sampling thresold for sampling heap allocation (setTLHSamplingTop()/resetTLHSamplingTop()) setTLHSamplingTop(size) are only called in the below 3 cases 1, sampling threshold has been changed via GC-VM api j9gc_set_allocation_sampling_interval() 2, TLH is refreshed 3, after sampling is done Counting trace allocation byte includes allocation bytes inside TLH Cache before flushing(_stats.bytesAllocated(true), stats->_tlhAllocatedUsed, ) Handle traceAllocationByte for Health Center(_oolTraceAllocationBytesForTracepoint, oolObjectSamplingBytesGranularityForTracepoint) and traceAllocationByte for JEP331(_traceAllocationBytesForHook, objectSamplingBytesGranularityForHook) independently Signed-off-by: Lin Hu <linhu@ca.ibm.com>
@genie-omr build all |
in order to sampling heap allocation bytes precisely without
compromising the performance, we have the below changes.
Handle instrumentableAllocateHook and
VM_OBJECT_ALLOCATE_WITHIN_THRESHOLD is still via disabling inline
allocation
Handle smapling for tracepoint is still during out of line allocation
Handle smapling for JEP331 is via setTLHSamplingTop(size)
Using fake Heap Top instead of fake Heap Alloc for disabling inline
allocation (realHeapAlloc-->realHeapTop,
set/getRealAlloc()-->set/getRealTop(), getRealSize(), getUsedSize())
Using fake Heap Top to force out of line allocation at sampling thresold
for sampling heap allocation (setTLHSamplingTop()/resetTLHSamplingTop())
setTLHSamplingTop(size) are only called in the below 3 cases
1, sampling threshold has been changed via GC-VM api
j9gc_set_allocation_sampling_interval()
2, TLH is refreshed
3, after sampling is done
Counting trace allocation byte includes allocation bytes inside TLH
Cache before flushing(_stats.bytesAllocated(true),
stats->_tlhAllocatedUsed, )
Handle traceAllocationByte for Health
Center(_oolTraceAllocationBytesForTracepoint,
oolObjectSamplingBytesGranularityForTracepoint) and traceAllocationByte
for JEP331(_traceAllocationBytesForHook,
objectSamplingBytesGranularityForHook) independently
depends on eclipse-openj9/openj9#9767
Signed-off-by: Lin Hu linhu@ca.ibm.com