Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] trim size of Reference struct #23853

Merged
merged 11 commits into from
Apr 14, 2022

Conversation

mwtian
Copy link
Member

@mwtian mwtian commented Apr 12, 2022

Why are these changes needed?

During large scale shuffle (number of partitions used >= 1000), driver uses significant amount of memory for storing ObjectRefs. On Intel MacOS, each Reference struct currently takes up 592 bytes. We can reduce per-Reference memory footprint:

  • During shuffle, no ObjectRef borrowing or nesting happens. And in this case fields related to borrowing or nesting should not take up memory. This reduces sizeof(Reference) from 592 to 400.
  • Fields in the Reference struct can be reordered to enhance packing. This reduces sizeof(Reference) from 400 to 368.

On Intel MacOS running the shuffle benchmark with 1000 partitions and 10MB partition size, RSS at the end of shuffle drops from ~5GB to ~4.5GB.

Related issue number

#23604

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mwtian mwtian force-pushed the reference-mem branch 2 times, most recently from d00647a to 251eece Compare April 12, 2022 03:28
@mwtian mwtian changed the title [WIP] trim Reference struct [Core] trim size of Reference struct Apr 12, 2022
@mwtian mwtian marked this pull request as ready for review April 12, 2022 15:16
@mwtian
Copy link
Member Author

mwtian commented Apr 13, 2022

On Ubuntu 20.04 with tcmalloc, I see similar ~10% reduction in final RSS memory usage from ~6.5GB to ~5.9GB. Heap profile from tcmalloc:
image

Copy link
Contributor

@stephanie-wang stephanie-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A couple questions about testing:

  1. Were you able to confirm that the shuffle benchmark doesn't set the new optional fields?
  2. Could we add this to the microbenchmark suite and start tracking memory usage in the perf dashboard?

@@ -494,6 +494,50 @@ class ReferenceCounter : public ReferenceCounterInterface,
void ReleaseAllLocalReferences();

private:
struct ContainingReferences {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to NestedReferenceCount? And document that it is only used for nested refs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct does not seem to contain any count? How about NestedReferences, or any other naming suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fields actually are counts if you take the size of the sets but yeah we can use NestedReferenceInfo or something, maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, renamed to NestedReferenceCount.

/// Returns the default value of the struct if it is not set.
const BorrowInfo &borrow() const {
if (borrow_info == nullptr) {
static auto *default_info = new BorrowInfo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Suggested change
static auto *default_info = new BorrowInfo();
static BorrowInfo default_info;

Then return &default_info? But I think it's fine either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Using static local raw pointers to avoid the objects being destructed was a common idiom in my previous company, so people need not think about the implication of destroying default_info on shutdown. But I can update this if it conflicts with Ray's style.


/// Callback that will be called when this ObjectID no longer has
/// references.
std::function<void(const ObjectID &)> on_delete;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do it in a different PR, but out of curiosity, did you try testing what would happen if we use the same callback for all objects instead of saving them individually?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested that yet and can do this in the next PR.

@stephanie-wang stephanie-wang added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 13, 2022
@mwtian mwtian removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 13, 2022
@mwtian
Copy link
Member Author

mwtian commented Apr 13, 2022

Looks good! A couple questions about testing:

  1. Were you able to confirm that the shuffle benchmark doesn't set the new optional fields?
  2. Could we add this to the microbenchmark suite and start tracking memory usage in the perf dashboard?

Yes the new optional fields do not show up in the profile. I can add some check failure too. I can add a per-object ref memory benchmark separately.

@stephanie-wang stephanie-wang merged commit 1622768 into ray-project:master Apr 14, 2022
@mwtian mwtian deleted the reference-mem branch April 18, 2022 22:38
fishbone pushed a commit that referenced this pull request Apr 20, 2022
…emory cap (#23985)

In a1e06f6, memory bound was added for each subscribed entity in the publisher. It adds two extra `std::deque` per subscribed entity, which turns out to cost a lot more memory when there are a large number of `ObjectRef`s: #23853 (comment)

This PR avoids the extra memory usage for entities in channels unlikely to grow too large, i.e. all channels except those for logs and error info. Subscribed entity memory usage no longer shows up in the memory profile when there are 1M object refs. Raw data: [profile006.pb.gz](https://github.com/ray-project/ray/files/8508547/profile006.pb.gz)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants