-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dvc][server][controller][samza] Heap size estimation improvement #1281
base: main
Are you sure you want to change the base?
Conversation
Have you got a chance to take a look at |
/** | ||
* Utility class to help in implementing {@link Measurable#getSize()}. A couple of important points: | ||
* | ||
* 1. This utility class does not "measure" the heap size, but rather attempts to "predict" it, based on knowledge of | ||
* the internals of the JVM. If any of the assumptions are wrong, then of course the results will be inaccurate. | ||
* 2. This utility class assumes we are using the HotSpot JVM. | ||
*/ | ||
public class HeapSizeEstimator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting a thread to respond to @gaojieliu's general comment:
Have you got a chance to take a look at openjdk/jol implementation?
https://github.com/openjdk/jol
It might carry some issues, but I guess we can analyze the strategy this lib is using to see whether we can borrow some or not.
I posted this work on social media and received a lot of interesting examples from folks about other open source projects with the same or similar purpose. I'll maintain a list of these here:
- https://github.com/openjdk/jol
- https://github.com/ehcache/sizeof
- https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java
- https://github.com/jbellis/jamm
For now, this work looks pretty similar to (at least some parts of) the above, but I think there is still potential to have a good implementation as part of Venice. In particular, I want a solution in which there is no hot path reflection. For now, this tradeoff is not apparent because I have not yet included examples of how we can leverage this utility in the rest of our code, but I will add it soon when the basic functionality is stabilized...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reflection
mentioned here is mainly about performance, right?
Have you tried these alternatives and run it with JMH?
I tried jol
and the initialization is slow, but after that, it is fairly fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a small test against jol
and it seems it can work properly with Oracle OpenJDK8 release, but not MSFT OpenJDK11 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alex-dubrouski
Do you know why MSFT-jdk11 won't work well with attached agent to measure the object size?
I tried to add this jvm arg, but it still doesn't work:
-Djdk.attach.allowAttachSelf=true
And both jol
and sizeof
are suffering from the same issue with JDK11 as they are using the similar technologies..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check a bit later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, Felix, that's why we should probably use JOL :D ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But where is the learning value in that, @alex-dubrouski :D ?
Besides, this memory estimation code needs to run in DVC, which is one of our client libs, meaning we cannot tightly control which JVM it's going to run in. Ideally, it should not depend on agents or finicky Unsafe
code...
That being said, using JOL in unit test to validate the accuracy of this utility might be a good idea 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW... you did not answer my question @alex-dubrouski 😅 but I found out that the hashIsZero
was added 5 years ago: openjdk/jdk@89a267c
According to JDK-8221836, this change got released in Java 13, which is weird because I thought I was looking at the Java 17 source code in my IDE, but apparently not; it was probably hanging on to the source of an older Java version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And indeed, I can get it to show me the proper source and see that it's there now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested on JDK17 so yeah, it is there. Plus it is a field not method.
94ff779
to
626a47d
Compare
|
||
/** Deal with alignment by rounding up to the nearest alignment boundary. */ | ||
private static int roundUpToNearestAlignment(int size) { | ||
int partialAlignmentWindowUsage = size % ALIGNMENT_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alignment could be modified with command line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that it can be configured at start time. But are you saying it can be modified at run time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it can't be changed at runtime, but you defined static alignment which might not be accurate. That's an edge case of course, I was just saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see... you mean that alignment can be overridden by a JVM option, in which case it wouldn't decide just based on whether it's a 32 bits or 64 bits VM?
Is it via -XX:ObjectAlignmentInBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is set statically at bootstrap, but code does not take this into account. Feel free to ignore that, since this is very rare edge case.
* {@link Class#getDeclaredFields()} returns the fields (of all visibility, from private to public and everything | ||
* in between) of the class, but not of its parent class. | ||
*/ | ||
for (Field f: c.getDeclaredFields()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the fields are not visible. For example there are native references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you point me to a class (perhaps from the JDK) that has a native reference in it? I'd like to test that scenario and see what happens...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java.lang.Object :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I'm definitely not taking it into account 😅 ... where can I learn more about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alex Shipilev explained that in one of the videos (video is not in English...)
internal/venice-common/src/main/java/com/linkedin/venice/memory/HeapSizeEstimator.java
Outdated
Show resolved
Hide resolved
16ae593
to
c39e456
Compare
* @throws {@link StackOverflowError} It should be noted that this function is recursive in nature and so any class | ||
* which contains a loop in its class graph will cause a stack overflow. | ||
*/ | ||
public static <T> int getClassOverhead(Class<T> c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function is mainly calculating the class overhead, not the deep size, right?
If the object contains a nested hashmap, this class will only measure the overhead of the class, and the user needs to manually add the class overhead on top of the hashmap
cost, which needs to be calculated manually?
I think it will be useful to have a function to measure both class overhead and the actual data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above requirement is mainly about measuring the deep size of the object.
If we do want to implement such feature, it will be great if we can have some filtering logic to ignore certain types of fields, which can be referring to some common references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this function does not measure the deep size of an object, and that is also the reason why it takes a Class
param and not an Object
. Essentially, my idea is to hand-code the "variable" part of the measurement of the classes of interest, and the function in this utility is just to make it easier to have the "base cost" (the non-variable part). This utility is not intended to be called in the hot path.
I will integrate the usage of this utility into the MemoryBoundBlockingQueue
measurement code. I have not done so yet because I want to be extra sure the utility is as accurate as possible (which it may or may not be, as I am not sure if I have chased all of the bugs properly yet, but at least it should be pretty close...).
b635dff
to
7b9581a
Compare
Introduced a new HeapSizeEstimator utility to make our on-heap memory usage assessment more accurate, and easier to maintain as class hierarchies evolve. For now it is only used in a unit test, not yet in the main code.
… measures the allocated memory.
Also did some refactorings and added tests.
HeapSizeEstimator renamed to ClassSizeEstimator MeasurableUtils renamed to InstanceSizeEstimator HeapSizeEstimatorTest is now abstract, with two subclasses: ClassSizeEstimatorTest and InstanceSizeEstimatorTest.
Simplified the class size measurement by removing the option of recursing into the class graph. It turns out that after integrating into all parts of the main code that need heap size measurement, this was never used, so the utility will now provide shallow measurement exclusively. Miscellaneous: - Minor efficiency improvements to PubSubMessageHeaders, especially when they are empty. Also made them implement Iterable, which is more efficient than creating a temporary List to iterate over. Added unit tests for this.
… fixed a race condition in testMultipleThreads.
70a7ee0
to
d5573b2
Compare
...roller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminConsumptionTask.java
Outdated
Show resolved
Hide resolved
...roller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminConsumptionTask.java
Outdated
Show resolved
Hide resolved
Most of these changes are just IDE hints about potential NPEs, and fields which are unused, can be made final or made into a local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, left some minor comments.
private static int roundUpToNearest(int size, int intervalSize) { | ||
int partialAlignmentWindowUsage = size % intervalSize; | ||
int waste = partialAlignmentWindowUsage == 0 ? 0 : intervalSize - partialAlignmentWindowUsage; | ||
int finalSize = size + waste; | ||
return finalSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function similar to the following?
Math.ceil((double)size / intervalSize) * intervalSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, and that is the logic I'm using in HeapSizeEstimatorTest::roundUpToNearestAlignment
, so they are fully equivalent AFAICT. We could pick either one for the main code. I thought this one was slightly more clear, so I picked it for the main code, but we could do the reverse. LMK if you have a preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the ceil
function is common for roundup, I guess it is good to use the same logic in both main and test code.
This is a very minor concern as the logic is simple.
You can decide on your own.
return value -> getSize((KafkaMessageEnvelope) value); | ||
} else if (ProducerMetadata.class.isAssignableFrom(type)) { | ||
return value -> getSize((ProducerMetadata) value); | ||
} else if (ByteBuffer.class.isAssignableFrom(type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If some class extends ByteBuffer
to add some additional fields, the returned function won't count them, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. For now, that function is only meant to be used with HeapByteBuffer
instances and will throw IllegalArgumentException
if a DirectBB
is passed. If a subclass of HeapBB
is passed, there could be imprecision. I'm not aware of such case though. Are you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So can we just use HeapByteBuffer
here?
|
||
// Iterate from the end to the beginning, so we go from parent to sub | ||
for (int i = classHierarchyFromSubclassToParent.size() - 1; i >= 0; i--) { | ||
int classFieldsOverhead = overheadOfFields(classHierarchyFromSubclassToParent.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to use the cache here? Some parent class might be calculated before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be incorrect, because the cache contains the final size including object header and alignment. Here, we are looking for the "intermediate result" (the fields only), so we can add all these intermediate results together and account for alignment afterwards. We could have a separate cache for these intermediate results but I don't think it's worth it given that the intent is to only call this once per class per runtime, never on the hot path...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, makes sense.
Can you also convert these explanation into Javadoc?
internal/venice-common/src/main/java/com/linkedin/venice/memory/ClassSizeEstimator.java
Show resolved
Hide resolved
} | ||
|
||
private static class LeaderQueueNode extends QueueNode { | ||
private static final int SHALLOW_CLASS_OVERHEAD = ClassSizeEstimator.getClassOverhead(QueueNode.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not LeaderQueueNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! And that is one of the cases I didn't write a unit test for... I think I need to write one now to have a clean conscience 😅
stuck forever if some task for it got scheduled but then got cancelled before it started executing...
[dvc][server][controller][samza] Heap size estimation improvement
Introduced two new utilities to make our on-heap memory usage assessment more
accurate, and easier to maintain as class hierarchies evolve:
memory layout works) the shallow size of a class. This includes the object
header, all primitive fields, and all references to other objects (but it
does not count these other objects, hence the shallowness). Reflection is
used in this class.
of classes. This is not a general-purpose utility, and it requires some
manual effort to onboard a new class. Reflection is not used in this class.
The general design goals are the following:
of this logic should be stored in static constants.
knowledge of the Venice code base to determine which objects are meant to
be counted or not. For example, singleton or otherwise shared instances
should not be counted, since their amortized cost is negligible (besides
the size of the pointer to refer to them).
The above utilities have been integrated in all classes that implement the
Measurable interface, and several new classes have been given this interface
as well. The Measurable::getSize function has been renamed getHeapSize, to
minimize the chance that it could clash with other function names, and to
make it extra clear what kind of size is meant.
Miscellaneous:
so that empty headers (a common case) carry less overhead. Also made the
PubSubMessageHeaders implement Iterable.
shared instance can be leveraged in cases where that object is always the
same (e.g. when producing to the RT topic).
How was this PR tested?
New unit tests.
Does this PR introduce any user-facing changes?