Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dvc][server][controller][samza] Heap size estimation improvement #1281

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

FelixGV
Copy link
Contributor

@FelixGV FelixGV commented Nov 4, 2024

[dvc][server][controller][samza] Heap size estimation improvement

Introduced two new utilities to make our on-heap memory usage assessment more
accurate, and easier to maintain as class hierarchies evolve:

  • ClassSizeEstimator: Predicts (based on assumptions about how the JVM's
    memory layout works) the shallow size of a class. This includes the object
    header, all primitive fields, and all references to other objects (but it
    does not count these other objects, hence the shallowness). Reflection is
    used in this class.
  • InstanceSizeEstimator: Predicts the size of instances of a limited number
    of classes. This is not a general-purpose utility, and it requires some
    manual effort to onboard a new class. Reflection is not used in this class.

The general design goals are the following:

  • Reflection should only be used once per class per runtime, and the result
    of this logic should be stored in static constants.
  • On the hot path, there should be no reflection, and we should leverage our
    knowledge of the Venice code base to determine which objects are meant to
    be counted or not. For example, singleton or otherwise shared instances
    should not be counted, since their amortized cost is negligible (besides
    the size of the pointer to refer to them).

The above utilities have been integrated in all classes that implement the
Measurable interface, and several new classes have been given this interface
as well. The Measurable::getSize function has been renamed getHeapSize, to
minimize the chance that it could clash with other function names, and to
make it extra clear what kind of size is meant.

Miscellaneous:

  • Fixed NPEs in AdminConsumptionTask::executeMessagesAndCollectResults.
  • Minor efficiency improvements to PubSubMessageHeaders and ApacheKafkaUtils
    so that empty headers (a common case) carry less overhead. Also made the
    PubSubMessageHeaders implement Iterable.
  • Created a DefaultLeaderMetadata static class in VeniceWriter, so that a
    shared instance can be leveraged in cases where that object is always the
    same (e.g. when producing to the RT topic).
  • BlobSnapshotManagerTest improvements:
    • Added timeouts to all tests.
    • Fixed a race condition in testMultipleThreads.

How was this PR tested?

New unit tests.

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

@gaojieliu
Copy link
Contributor

Have you got a chance to take a look at openjdk/jol implementation?
https://github.com/openjdk/jol
It might carry some issues, but I guess we can analyze the strategy this lib is using to see whether we can borrow some or not.

Comment on lines 15 to 25
/**
* Utility class to help in implementing {@link Measurable#getSize()}. A couple of important points:
*
* 1. This utility class does not "measure" the heap size, but rather attempts to "predict" it, based on knowledge of
* the internals of the JVM. If any of the assumptions are wrong, then of course the results will be inaccurate.
* 2. This utility class assumes we are using the HotSpot JVM.
*/
public class HeapSizeEstimator {
Copy link
Contributor Author

@FelixGV FelixGV Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting a thread to respond to @gaojieliu's general comment:

Have you got a chance to take a look at openjdk/jol implementation?
https://github.com/openjdk/jol
It might carry some issues, but I guess we can analyze the strategy this lib is using to see whether we can borrow some or not.

I posted this work on social media and received a lot of interesting examples from folks about other open source projects with the same or similar purpose. I'll maintain a list of these here:

  1. https://github.com/openjdk/jol
  2. https://github.com/ehcache/sizeof
  3. https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java
  4. https://github.com/jbellis/jamm

For now, this work looks pretty similar to (at least some parts of) the above, but I think there is still potential to have a good implementation as part of Venice. In particular, I want a solution in which there is no hot path reflection. For now, this tradeoff is not apparent because I have not yet included examples of how we can leverage this utility in the rest of our code, but I will add it soon when the basic functionality is stabilized...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reflection mentioned here is mainly about performance, right?
Have you tried these alternatives and run it with JMH?
I tried jol and the initialization is slow, but after that, it is fairly fast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a small test against jol and it seems it can work properly with Oracle OpenJDK8 release, but not MSFT OpenJDK11 release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-dubrouski
Do you know why MSFT-jdk11 won't work well with attached agent to measure the object size?
I tried to add this jvm arg, but it still doesn't work:

-Djdk.attach.allowAttachSelf=true

And both jol and sizeof are suffering from the same issue with JDK11 as they are using the similar technologies..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check a bit later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, Felix, that's why we should probably use JOL :D ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But where is the learning value in that, @alex-dubrouski :D ?

Besides, this memory estimation code needs to run in DVC, which is one of our client libs, meaning we cannot tightly control which JVM it's going to run in. Ideally, it should not depend on agents or finicky Unsafe code...

That being said, using JOL in unit test to validate the accuracy of this utility might be a good idea 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW... you did not answer my question @alex-dubrouski 😅 but I found out that the hashIsZero was added 5 years ago: openjdk/jdk@89a267c

According to JDK-8221836, this change got released in Java 13, which is weird because I thought I was looking at the Java 17 source code in my IDE, but apparently not; it was probably hanging on to the source of an older Java version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And indeed, I can get it to show me the proper source and see that it's there now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested on JDK17 so yeah, it is there. Plus it is a field not method.


/** Deal with alignment by rounding up to the nearest alignment boundary. */
private static int roundUpToNearestAlignment(int size) {
int partialAlignmentWindowUsage = size % ALIGNMENT_SIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alignment could be modified with command line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that it can be configured at start time. But are you saying it can be modified at run time?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it can't be changed at runtime, but you defined static alignment which might not be accurate. That's an edge case of course, I was just saying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see... you mean that alignment can be overridden by a JVM option, in which case it wouldn't decide just based on whether it's a 32 bits or 64 bits VM?

Is it via -XX:ObjectAlignmentInBytes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is set statically at bootstrap, but code does not take this into account. Feel free to ignore that, since this is very rare edge case.

* {@link Class#getDeclaredFields()} returns the fields (of all visibility, from private to public and everything
* in between) of the class, but not of its parent class.
*/
for (Field f: c.getDeclaredFields()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the fields are not visible. For example there are native references.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you point me to a class (perhaps from the JDK) that has a native reference in it? I'd like to test that scenario and see what happens...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java.lang.Object :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'm definitely not taking it into account 😅 ... where can I learn more about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex Shipilev explained that in one of the videos (video is not in English...)

@FelixGV FelixGV force-pushed the memory_usage_estimation branch 2 times, most recently from 16ae593 to c39e456 Compare November 7, 2024 16:49
* @throws {@link StackOverflowError} It should be noted that this function is recursive in nature and so any class
* which contains a loop in its class graph will cause a stack overflow.
*/
public static <T> int getClassOverhead(Class<T> c) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this function is mainly calculating the class overhead, not the deep size, right?
If the object contains a nested hashmap, this class will only measure the overhead of the class, and the user needs to manually add the class overhead on top of the hashmap cost, which needs to be calculated manually?

I think it will be useful to have a function to measure both class overhead and the actual data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above requirement is mainly about measuring the deep size of the object.

If we do want to implement such feature, it will be great if we can have some filtering logic to ignore certain types of fields, which can be referring to some common references.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this function does not measure the deep size of an object, and that is also the reason why it takes a Class param and not an Object. Essentially, my idea is to hand-code the "variable" part of the measurement of the classes of interest, and the function in this utility is just to make it easier to have the "base cost" (the non-variable part). This utility is not intended to be called in the hot path.

I will integrate the usage of this utility into the MemoryBoundBlockingQueue measurement code. I have not done so yet because I want to be extra sure the utility is as accurate as possible (which it may or may not be, as I am not sure if I have chased all of the bugs properly yet, but at least it should be pretty close...).

@FelixGV FelixGV changed the title [test] HeapSizeEstimator [dvc][server][controller][samza] Heap size estimation improvement Nov 12, 2024
Introduced a new HeapSizeEstimator utility to make our on-heap memory usage
assessment more accurate, and easier to maintain as class hierarchies evolve.

For now it is only used in a unit test, not yet in the main code.
Also did some refactorings and added tests.
HeapSizeEstimator renamed to ClassSizeEstimator
MeasurableUtils renamed to InstanceSizeEstimator

HeapSizeEstimatorTest is now abstract, with two subclasses: ClassSizeEstimatorTest and InstanceSizeEstimatorTest.
Simplified the class size measurement by removing the option of recursing into
the class graph. It turns out that after integrating into all parts of the main
code that need heap size measurement, this was never used, so the utility will
now provide shallow measurement exclusively.

Miscellaneous:

- Minor efficiency improvements to PubSubMessageHeaders, especially when they
  are empty. Also made them implement Iterable, which is more efficient than
  creating a temporary List to iterate over. Added unit tests for this.
… fixed a race

condition in testMultipleThreads.
Most of these changes are just IDE hints about potential NPEs, and fields
which are unused, can be made final or made into a local variable.
Copy link
Contributor

@gaojieliu gaojieliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, left some minor comments.

Comment on lines +191 to +195
private static int roundUpToNearest(int size, int intervalSize) {
int partialAlignmentWindowUsage = size % intervalSize;
int waste = partialAlignmentWindowUsage == 0 ? 0 : intervalSize - partialAlignmentWindowUsage;
int finalSize = size + waste;
return finalSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function similar to the following?

Math.ceil((double)size / intervalSize) * intervalSize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, and that is the logic I'm using in HeapSizeEstimatorTest::roundUpToNearestAlignment, so they are fully equivalent AFAICT. We could pick either one for the main code. I thought this one was slightly more clear, so I picked it for the main code, but we could do the reverse. LMK if you have a preference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the ceil function is common for roundup, I guess it is good to use the same logic in both main and test code.
This is a very minor concern as the logic is simple.
You can decide on your own.

return value -> getSize((KafkaMessageEnvelope) value);
} else if (ProducerMetadata.class.isAssignableFrom(type)) {
return value -> getSize((ProducerMetadata) value);
} else if (ByteBuffer.class.isAssignableFrom(type)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If some class extends ByteBuffer to add some additional fields, the returned function won't count them, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. For now, that function is only meant to be used with HeapByteBuffer instances and will throw IllegalArgumentException if a DirectBB is passed. If a subclass of HeapBB is passed, there could be imprecision. I'm not aware of such case though. Are you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can we just use HeapByteBuffer here?


// Iterate from the end to the beginning, so we go from parent to sub
for (int i = classHierarchyFromSubclassToParent.size() - 1; i >= 0; i--) {
int classFieldsOverhead = overheadOfFields(classHierarchyFromSubclassToParent.get(i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use the cache here? Some parent class might be calculated before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be incorrect, because the cache contains the final size including object header and alignment. Here, we are looking for the "intermediate result" (the fields only), so we can add all these intermediate results together and account for alignment afterwards. We could have a separate cache for these intermediate results but I don't think it's worth it given that the intent is to only call this once per class per runtime, never on the hot path...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, makes sense.
Can you also convert these explanation into Javadoc?

}

private static class LeaderQueueNode extends QueueNode {
private static final int SHALLOW_CLASS_OVERHEAD = ClassSizeEstimator.getClassOverhead(QueueNode.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not LeaderQueueNode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! And that is one of the cases I didn't write a unit test for... I think I need to write one now to have a clean conscience 😅

stuck forever if some task for it got scheduled but then got cancelled
before it started executing...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants