Skip to content

Check if an object is in our heap before using VM map during counting live bytes #1289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 30, 2025

Conversation

qinsoon
Copy link
Member

@qinsoon qinsoon commented Mar 24, 2025

This PR fixes a bug in increase_live_bytes. If an object in the VM space is scanned, the method will be called with an object reference into the VM space, which may not be in our heap range. Then we get a space descriptor from the VM map, and we will see panics in the following code.

let index = Self::space_index(address).unwrap();

self.descriptor_map[index]

This PR works around the problem by checking if the object is in our heap range first.

@qinsoon qinsoon requested a review from wks March 24, 2025 04:02
@wks
Copy link
Collaborator

wks commented Mar 24, 2025

Even if the address is in the available range, its space index (Map64) or chunk index (Map32) may still map to a descriptor_map entry that is SpaceDescriptor::UNINITIALIZED. It just happened that increase_live_bytes always calls VMMap::get_descriptor_for_address with the raw address of valid ObjectReference values.

In my opinion, VMMap::get_descriptor_for_address(address: Address) should return Option<SpaceDescriptor> because the argument address is never required to be a valid object address, or an address within a space. And for Map32, some chunks in the heap range may well be un-occupied by any space, and it is natural for get_descriptor_for_address to return None. Then in increase_live_bytes, we just skip objects that map to SpaceDescriptor::UNINITIALIZED.

Note: If the performance of Option<SpaceDescriptor> is an issue, we can redefine SpaceDescriptor as NonZeroUsize because it can never be zero. If we don't want to do this refactoring, we can let get_descriptor_for_address return SpaceDescriptor::UNINITIALIZED if the argument address is not in the heap range. But I strongly recommend using Option<SpaceDescriptor>.

@qinsoon
Copy link
Member Author

qinsoon commented Mar 27, 2025

I changed get_descriptor_for_address. It will return SpaceDescriptor::UNINITIALIZED if the address is not in the heap range. This change should have minimal impact on performance. In the current code, we do unwrap for 64 bits, and we have bounds check for 32 bits. This change just makes those explicit and returns an alternative value.

Using Option<SpaceDescriptor> is fine. But that requires larger changes, such as changing the related tables to Option<SpaceDescriptor>. Otherwise, the function may return None or Some(SpaceDescriptor::UNINITIALIZED) which basically means the same thing.

@k-sareen
Copy link
Collaborator

FYI I wanted to mention that @steveblackburn wanted us to have the same mechanism for address lookup for all our spaces, i.e. VM space should not be special cased. We discussed this in a recent group meeting when I went through the issues I was facing in ART. So arguably this is a bit of a hack still.

Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wks
Copy link
Collaborator

wks commented Mar 28, 2025

FYI I wanted to mention that @steveblackburn wanted us to have the same mechanism for address lookup for all our spaces, i.e. VM space should not be special cased. We discussed this in a recent group meeting when I went through the issues I was facing in ART. So arguably this is a bit of a hack still.

It is nice to have a unified mechanism to get the space index for any given address (or None if not in any space). But the current SpaceDescriptor mechanism has its restrictions. For example, on Map32, spaces occupy whole chunks; on Map64, spaces occupy contiguous regions of log size log_space_extent. It should be possible to extend SpaceDescriptor to VM spaces if VM spaces also follow such restrictions, but I doubt if this is true. Alternatively, we can implement it like this: We first look up the descriptor_map and, if it returns None or UNINITIALIZED, we fall back to looking up VM spaces by searching all registered VM space ranges.

@qinsoon qinsoon added this pull request to the merge queue Mar 30, 2025
Merged via the queue into mmtk:master with commit 5e7d9da Mar 30, 2025
33 checks passed
@qinsoon qinsoon deleted the fix-count-live-bytes-with-vm-space branch March 30, 2025 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants