Description
This is the first attempt to use the MEP process for changing a fundamental part of MMTk.
TL;DR
Currently MMTk assumes ObjectReference can be either a pointer to an object or NULL, which is not general for all VMs, especially the VMs that can store tagged values in slots. Meanwhile, MMTk core never processes NULL references. We propose removing ObjectReference::NULL so that ObjectReference is always a valid object reference.
Goal
- Remove ObjectReference::NULL so that ObjectReference always refers to an object.
Non-Goal
- It is not a goal to enforce Address to be a non-zero address. If needed, we can add a new type NonZeroAddress separately.
- It is not a goal to skip object graph edges pointing to singleton objects that represent NULL-like values (such as nothing and missing in Julia, None in CPython, and null and undefined in V8) during tracing. I have opened a separate issue for it: Skip object graph edges to immortal non-moving objects when tracing #1076
Success Metric
- No observable performance impact.
- Remove all invocations (including assertions) of object.is_null() from mmtk-core.
- Existing MMTk-VM APIs involving ObjectReference::NULL can still work, by using None or using other designs.
Motivation
Status-quo: All ObjectReference instances refer to objects, except ObjectReference::NULL.
Currently, ObjectReference::NULL is defined as ObjectReference(0), and is used to represent NULL pointers. However, it
- is not general enough,
- pollutes the API design,
- is prone to missing & redundant NULL checks, and
- encourages non-idiomatic Rust code.
NULL and 0 are not general enough
Not all languages have NULL references. Haskell, for example, is a functional language and all variables are initialized before use.
For some VMs (such as CRuby, V8 and Lua), a slot may hold non-reference values. Ruby and V8 can put small integers in slots. Ruby can also put special values such as true, false and nil in slots.
Even if a language has NULL references of some sort, they are not always encoded the same way. Some VMs (such as V8 and Julia) even have different flavors of NULL or "missing value" types.
| Language/VM | Thing | Representation | Note |
|---|---|---|---|
| OpenJDK | null | 0 | |
| JikesRVM | null | 0 | |
| CRuby | nil | 4 | false is represented as 0 |
| V8 | null | ptr | Pointer to a singleton object of the Oddball type |
| V8 | undefined | ptr | Pointer to a singleton object of the Oddball type |
| Julia | nothing | ptr (jl_nothing) | Pointer to a singleton object of the Nothing type |
| Julia | missing | ptr | Pointer to a singleton object of the struct Missing type, defined in Julia |
| CPython | None | ptr (Py_None) | Pointer to a singleton object of NoneType |
CRuby encodes nil as 4 instead of 0. Python uses a valid reference to a singleton object None to represent missing values.
Some languages have multiple representations of non-existing values. JavaScript has both null and undefined. Julia has both nothing and missing.
For the reasons listed above, a single constant ObjectReference::NULL with numerical value 0 is not general enough to cover missing references or special non-reference values across languages and VMs.
NULL pollutes the API design.
Previously designed for Java, MMTk assumes that
- a slot may hold a NULL pointer, and
- NULL is represented as 0.
This has various influences on the API design and the internal implementation of MMTk-core.
Processing slots (edges)
This issue is discussed in greater detail in #1031. It has been fixed in #1032. Before it was fixed, the method ProcessEdgesWork::process_edge behaved like this:

    // Outdated code from ProcessEdgesWork::process_edge
    let object = slot.load();
    let new_object = self.trace_object(object);
    slot.store(new_object);

In these three lines,
- slot.load() loads from the slot verbatim, interpreting 0 as ObjectReference::NULL.
- trace_object handles NULL "gracefully" by returning NULL, too.
- slot.store(new_object) may overwrite NULL with NULL, which was supposed to be "benign".
Such assumptions break if (1) the VM does not use 0 to encode NULL, or (2) the VM can hold tagged non-reference values in slots. CRuby is affected by both.
PR #1032 fixes this problem by allowing slot.load() to return ObjectReference::NULL even if nil is encoded as 4, or if the slot holds small integers, and process_edge simply skips such slots. It is now general enough to support V8, Julia and CRuby. However, the use of ObjectReference::NULL to represent skipped fields is not idiomatic in Rust. We should use Option<ObjectReference> instead.
ReferenceProcessor
Note: In the future we may move ReferenceProcessor and ReferenceGlue out of mmtk-core. See: #694
ReferenceProcessor is designed for Java, and a Java Reference (soft/weak/phantom ref) can be cleared by setting the referent to null. The default implementation of ReferenceGlue works this way. ReferenceGlue::clear_referent sets the referent to ObjectReference::NULL, and ReferenceProcessor checks if a Reference is cleared by calling referent.is_null().
It works for Java, but not for Julia, because Julia uses a pointer jl_nothing to represent cleared references. Although ReferenceGlue::clear_referent can be overridden, that was not enough. Commit 9648aed added ReferenceGlue::is_referent_cleared so that ReferenceProcessor can compare the referent against jl_nothing instead of ObjectReference::NULL.
p.s. ReferenceGlue::clear_referent is the only place in mmtk-core (besides tests) that uses the constant ObjectReference::NULL. This means the major part of mmtk-core does not work with NULL references from the VM.
NULL-checking is hard to do right
ObjectReference can be NULL, and the type system cannot tell if a value of type ObjectReference is NULL or not. As a consequence, programmers have to insert NULL-checking statements everywhere. It's very easy to miss necessary checks and add redundant checks.
Missing NULL checks
In the reference processor, the following lines load an ObjectReference from a weak reference object, and try to get its forwarded address.

    // Outdated code from ReferenceProcessor::forward
    let old_referent = <E::VM as VMBinding>::VMReferenceGlue::get_referent(reference); // Is `old_referent` cleared?
    let new_referent = ReferenceProcessor::get_forwarded_referent(trace, old_referent);
    <E::VM as VMBinding>::VMReferenceGlue::set_referent(reference, new_referent);

The code snippet calls get_forwarded_referent regardless of whether old_referent has been cleared. Because get_forwarded_referent calls trace_object, and trace_object used to return NULL if passed NULL, the code used to be benign for Java. However, the code will not work if the VM does not use 0 to encode a null reference, or if the slot can hold tagged non-reference values, for reasons we discussed before. Since the only VM that overrides ReferenceGlue::is_referent_cleared (Julia) does not use MarkCompact, this bug went undetected.
This bug has been fixed in #1032, but it shows how hard it is to manually check for NULL in all possible places.
Unnecessary NULL checks
Inside MMTk core, the most problematic functions are the trace_object methods of various spaces.
- trace_object: Some spaces check object.is_null() in trace_object and return NULL if it is null. But it is unnecessary because after SFT or PlanTraceObject dispatches the trace_object call to a concrete space by the address of ObjectReference, it is guaranteed not to be NULL.
Some API functions check for is_null() because we defined ObjectReference as NULL-able. Those API functions don't make sense for NULL pointers.
- is_in_mmtk_space(object): It checks if the argument is NULL only because the ObjectReference type is NULL-able. Any VMs that use this API function to distinguish references of MMTk objects from pointers from malloc, etc., will certainly check NULL first before doing anything else.
- ObjectReference::is_reachable(): It checks is_null() before using SFT to dispatch the call. If ObjectReference is not NULL-able in the first place, the NULL check will be unnecessary.
NULL encourages non-idiomatic Rust code
In Rust, the idiomatic way to represent the absence of a value is None (of type Option<T>). However, ObjectReference::NULL is sometimes used to represent the absence of ObjectReference.
In MarkCompactSpace: Our current MarkCompact implementation stores a forwarding pointer in front of each object for forwarding. When the forwarding pointer is not set, that slot holds an ObjectReference::NULL (value 0). But what it really means is that "there is no forwarded object reference associated with the object".
In Edge::load(): As we discussed before, since #1032, Edge::load() returns ObjectReference::NULL to mean "the slot is not holding an object reference", even if the slot is holding a tagged non-reference value or a null reference not encoded as numerical 0. In idiomatic Rust, the return type of Edge::load() should be Option<ObjectReference>, and it should return None if the slot is not holding an object reference. We are currently not using Option<ObjectReference> as the return type because ObjectReference is currently backed by usize and can be 0. Consequently, Option<ObjectReference> has to be larger than a word, and will have additional overhead.
Description
We propose removing the constant ObjectReference::NULL, and making ObjectReference non-NULL-able.
Making ObjectReference non-zero
For performance concerns, we shall change the underlying type of ObjectReference from usize to std::num::NonZeroUsize.

    #[repr(transparent)]
    pub struct ObjectReference(NonZeroUsize);

There is another good reason for forbidding 0: no objects can be allocated at or near the address 0. (That assumes ObjectReference is an address. See #1044)
By doing this, Option<ObjectReference> will have the same size as usize due to null pointer optimization. Passing Option<ObjectReference> between functions (including FFI boundary) should have no overhead compared to passing ObjectReference directly.
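The size claim can be checked in isolation. The following is a minimal sketch that uses a stand-in `ObjectReference` newtype (not the real mmtk-core type); the helper `reference_sizes` is a hypothetical name introduced only for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Stand-in for the proposed `ObjectReference` (illustrative, not the mmtk-core type).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

// Compile-time checks: the build fails if either type is not exactly one word.
const _: () = assert!(size_of::<ObjectReference>() == size_of::<usize>());
const _: () = assert!(size_of::<Option<ObjectReference>>() == size_of::<usize>());

/// Returns the sizes in bytes of `usize`, `ObjectReference` and
/// `Option<ObjectReference>`, in that order (hypothetical helper).
pub fn reference_sizes() -> (usize, usize, usize) {
    (
        size_of::<usize>(),
        size_of::<ObjectReference>(),
        size_of::<Option<ObjectReference>>(),
    )
}
```

Because the checks are `const` assertions, a layout regression would show up as a compile error rather than a run-time failure.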
An ObjectReference can be converted from Address in two ways.

    impl ObjectReference {
        // We had this method before, but it now returns `Option<ObjectReference>`.
        pub fn from_raw_address(addr: Address) -> Option<ObjectReference> {
            NonZeroUsize::new(addr.0).map(ObjectReference)
        }

        // This is new. It assumes `addr` cannot be zero, therefore it is `unsafe`.
        pub unsafe fn from_raw_address_unchecked(addr: Address) -> ObjectReference {
            debug_assert!(!addr.is_zero());
            ObjectReference(NonZeroUsize::new_unchecked(addr.0))
        }
    }

Refactoring the Edge trait
The Edge trait will be modified so that
- Edge::load() now returns Option<ObjectReference>. If a slot does not hold an object reference (null, nil, true, false, small integers, etc.), it shall return None.
- Edge::store(object: ObjectReference) still takes an ObjectReference as parameter because we can only forward valid references.
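To make the intended behavior concrete, here is a minimal sketch of a slot for a VM with tagged values. Everything in it is an assumption made for illustration: `TaggedSlot`, `process_slot` and the encoding constants are hypothetical (the encoding is only loosely modelled on the CRuby values mentioned earlier), and `ObjectReference` is a stand-in rather than the mmtk-core type.

```rust
use std::cell::Cell;
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-nullable `ObjectReference`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    pub fn from_raw(addr: usize) -> Option<Self> {
        NonZeroUsize::new(addr).map(ObjectReference)
    }
    pub fn to_raw(self) -> usize {
        self.0.get()
    }
}

// Assumed encoding for this sketch only.
const FALSE: usize = 0;
const NIL: usize = 4;
const SMALL_INT_TAG: usize = 1; // low bit set: the word is a small integer

/// Hypothetical slot type; `load` and `store` mirror the proposed `Edge` methods.
pub struct TaggedSlot(Cell<usize>);

impl TaggedSlot {
    pub fn new(raw: usize) -> Self {
        TaggedSlot(Cell::new(raw))
    }

    /// Returns `None` for `false`, `nil` and small integers, `Some` for object pointers.
    pub fn load(&self) -> Option<ObjectReference> {
        let raw = self.0.get();
        if raw == FALSE || raw == NIL || (raw & SMALL_INT_TAG) != 0 {
            return None;
        }
        ObjectReference::from_raw(raw)
    }

    /// Only valid references can be stored through this method.
    pub fn store(&self, object: ObjectReference) {
        self.0.set(object.to_raw());
    }
}

/// Traces the slot if it holds a reference; returns whether it did.
pub fn process_slot<F>(slot: &TaggedSlot, trace_object: F) -> bool
where
    F: FnOnce(ObjectReference) -> ObjectReference,
{
    match slot.load() {
        Some(object) => {
            slot.store(trace_object(object));
            true
        }
        None => false, // non-reference value: the slot is left untouched
    }
}
```

With this shape, the decision about what counts as "not a reference" lives entirely in the binding's `load`, and the tracing code never sees a NULL-like value.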
Refactoring the reference processor
Note: Ultimately ReferenceGlue and ReferenceProcessor will be moved outside mmtk-core. Here we describe a small-scale refactoring for this MEP.
The ReferenceGlue and ReferenceProcessor will be modified so that
- ReferenceGlue::get_referent now returns Option<ObjectReference>. It returns None if the reference is already cleared.
- ReferenceGlue::is_referent_cleared will be removed.
- ReferenceGlue::clear_referent will no longer have a default implementation because mmtk-core no longer assumes the reference object represents "the referent is cleared" by assigning 0 to the referent field.
- ReferenceProcessor will no longer call is_referent_cleared, but will check if get_referent returns None or Some(referent).
ReferenceProcessor also contains many assertions to ensure references are not NULL. Those can be removed.
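The refactored glue could look roughly like the sketch below. It is a reduced illustration under stated assumptions: `ReferenceGlueSketch`, `SentinelVm`, `WeakRef`, `SENTINEL` and `forward_referent` are hypothetical names, the trait omits most of the real `ReferenceGlue` methods, and the sentinel address stands in for a VM-specific value such as Julia's `jl_nothing`.

```rust
use std::cell::Cell;
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-nullable `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    pub fn from_raw(addr: usize) -> Option<Self> {
        NonZeroUsize::new(addr).map(ObjectReference)
    }
    pub fn to_raw(self) -> usize {
        self.0.get()
    }
}

/// Reduced, hypothetical version of the refactored `ReferenceGlue`.
pub trait ReferenceGlueSketch {
    type WeakRef;
    /// `None` means the reference has already been cleared.
    fn get_referent(reference: &Self::WeakRef) -> Option<ObjectReference>;
    fn set_referent(reference: &Self::WeakRef, referent: ObjectReference);
    /// No default implementation: each VM encodes "cleared" in its own way.
    fn clear_referent(reference: &Self::WeakRef);
}

/// Assumed address of a singleton object that marks cleared references.
pub const SENTINEL: usize = 0x8000;

pub struct WeakRef(Cell<usize>);

impl WeakRef {
    pub fn new(raw: usize) -> Self {
        WeakRef(Cell::new(raw))
    }
    pub fn raw(&self) -> usize {
        self.0.get()
    }
}

pub struct SentinelVm;

impl ReferenceGlueSketch for SentinelVm {
    type WeakRef = WeakRef;
    fn get_referent(reference: &WeakRef) -> Option<ObjectReference> {
        match reference.0.get() {
            SENTINEL => None,
            raw => ObjectReference::from_raw(raw),
        }
    }
    fn set_referent(reference: &WeakRef, referent: ObjectReference) {
        reference.0.set(referent.to_raw());
    }
    fn clear_referent(reference: &WeakRef) {
        reference.0.set(SENTINEL);
    }
}

/// Forwarding step of a reference processor: cleared references are skipped.
pub fn forward_referent<G, F>(reference: &G::WeakRef, forward: F)
where
    G: ReferenceGlueSketch,
    F: FnOnce(ObjectReference) -> ObjectReference,
{
    if let Some(old_referent) = G::get_referent(reference) {
        G::set_referent(reference, forward(old_referent));
    }
}
```

The `if let` in `forward_referent` is the check that was missing from the outdated `ReferenceProcessor::forward` code quoted in the Motivation section; with `Option` the compiler will not let it be skipped.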
Removing unnecessary NULL checks
The PR #1032 already removed the NULL checks related to trace_object.
Public API functions is_in_mmtk_space and ObjectReference::is_reachable will no longer do NULL checks because ObjectReference cannot be NULL in the first place.
The forwarding pointer in MarkCompact
Instead of loading the forwarding pointer as ObjectReference directly, we load the forwarding pointer as an address, and convert it to Option<ObjectReference>. The conversion itself is a no-op.

    fn get_header_forwarding_pointer(object: ObjectReference) -> Option<ObjectReference> {
        let addr = unsafe { Self::header_forwarding_pointer_address(object).load::<Address>() };
        ObjectReference::from_raw_address(addr)
    }

MarkCompactSpace::compact() calls get_header_forwarding_pointer(obj). It always needs to check if obj has a forwarding pointer because obj may be dead, and dead objects don't have forwarding pointers (i.e. get_header_forwarding_pointer(obj) returns None if obj is dead). It used to check with forwarding_pointer.is_null().
Write barrier
Main issue: #1038
The barrier function Barrier::object_reference_write takes ObjectReference values as parameters:

    fn object_reference_write(
        &mut self,
        src: ObjectReference,
        slot: VM::VMEdge,
        target: ObjectReference,
    ) {
        self.object_reference_write_pre(src, slot, target);
        slot.store(target);
        self.object_reference_write_post(src, slot, target);
    }

Here target is NULL-able because a user program may execute src.slot = null. (More generally, a JS program may have src.slot = "str"; src.slot = 42;, overwriting a reference with a number.) The type of target can be changed to Option<ObjectReference>. However, the main problem is that slot.store() no longer accepts NULL pointers. The root problem is the design of Barrier::object_reference_write, and that needs to be addressed separately. See #1038
The object_reference_write_pre and object_reference_write_post methods should still work after changing target to Option<ObjectReference>. The "pre" and "post" functions do not modify the slot.
For now, we may keep Barrier::object_reference_write as is, but it will not be applicable if target is NULL. Currently no officially supported bindings use Barrier::object_reference_write. Until #1038 is properly addressed, other bindings should call object_reference_write_pre and object_reference_write_post separately and manually store the new value to the slot.
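A binding-side field write along those lines might look like the sketch below. The trait `BarrierSketch`, the toy `RememberingBarrier`, the `Slot` type and `write_field` are all hypothetical and heavily reduced; the sketch also assumes that the `_pre` and `_post` functions take `Option<ObjectReference>` as the target and that the VM encodes null as 0.

```rust
use std::cell::Cell;
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-nullable `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    pub fn from_raw(addr: usize) -> Option<Self> {
        NonZeroUsize::new(addr).map(ObjectReference)
    }
}

/// Hypothetical VM slot holding a raw word; 0 is assumed to encode null.
pub struct Slot(Cell<usize>);

impl Slot {
    pub fn new(raw: usize) -> Self {
        Slot(Cell::new(raw))
    }
    pub fn raw(&self) -> usize {
        self.0.get()
    }
}

/// Reduced, hypothetical barrier interface with an optional target.
pub trait BarrierSketch {
    fn object_reference_write_pre(&mut self, _src: ObjectReference, _slot: &Slot, _target: Option<ObjectReference>) {}
    fn object_reference_write_post(&mut self, _src: ObjectReference, _slot: &Slot, _target: Option<ObjectReference>) {}
}

/// Toy barrier that records every source object that was written to.
#[derive(Default)]
pub struct RememberingBarrier {
    pub remembered: Vec<ObjectReference>,
}

impl BarrierSketch for RememberingBarrier {
    fn object_reference_write_post(&mut self, src: ObjectReference, _slot: &Slot, _target: Option<ObjectReference>) {
        self.remembered.push(src);
    }
}

/// Binding-side field write: the VM performs the store itself, so writing
/// null (or any other non-reference word) does not go through `Edge::store`.
pub fn write_field<B: BarrierSketch>(barrier: &mut B, src: ObjectReference, slot: &Slot, raw_value: usize) {
    let target = ObjectReference::from_raw(raw_value); // `None` for null
    barrier.object_reference_write_pre(src, slot, target);
    slot.0.set(raw_value);
    barrier.object_reference_write_post(src, slot, target);
}
```

The point of the design is that the raw store stays in VM-specific code, where the encoding of null and tagged values is known.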
Impact on Performance
This MEP should have no visible impact on performance. Preliminary performance evaluation supports this: #1064
Because of null pointer optimization, Option<ObjectReference>, ObjectReference, Option<NonZeroUsize>, NonZeroUsize and usize all have the same layout.
When converting from Address to ObjectReference, neither ObjectReference::from_raw_address (returns Option<ObjectReference>) nor ObjectReference::from_raw_address_unchecked (returns ObjectReference directly) has overhead. But unwrapping the Option<ObjectReference> involves a run-time check.
The overhead of the None check (pattern matching or opt_objref.unwrap()) should be very small. But if the zero check is a performance bottleneck, we can always use ObjectReference::from_raw_address_unchecked as a fall-back, provided that we know it can't be zero.
There are three known use cases of Option<ObjectReference> in mmtk-core:
- slot.load() returns None if a slot doesn't hold a reference,
- ReferenceGlue::get_referent() returns None if a (weak) Reference is cleared, and
- the forwarding pointers in MarkCompact.
In all those cases, the checks for None are necessary for correctness. Previously, those places checked against ObjectReference::NULL.
Impact on Software Engineering
mmtk-core
With ObjectReference guaranteed to be non-NULL, Option<ObjectReference> can be used to indicate an ObjectReference may not exist. As discussed above, typical use cases of Option<ObjectReference> are (1) slot.load(), (2) ReferenceGlue::get_referent() and (3) the forwarding pointer in MarkCompact. The use of Option<T> forces a check to convert Option<ObjectReference> to ObjectReference. By doing this, we can avoid bugs related to missing or redundant NULL checks.
Bindings
Some code needs to be changed in the OpenJDK binding due to this API change. The OpenJDK binding uses struct OpenJDKEdge (which implements trait Edge) to represent a slot in OpenJDK. Because trait Edge is designed from the perspective of mmtk-core, the Edge trait itself does not support storing NULL into the slot. I have to add an OpenJDK-specific method OpenJDKEdge::store_null() to store null to the slot in an OpenJDK-specific way. This is actually expected because not all VMs have null pointers, nor do they encode null, nil, nothing, etc. in the same way. OpenJDKEdge::store_null() also bypasses some bit operations related to compressed OOPs. This change added complexity to the OpenJDK binding, but I think it is the right way to do it.
Another quirk in software engineering is that we sometimes have to call unsafe { ObjectReference::from_raw_address_unchecked(addr) } to bypass the check against zero because we (as humans) are sure addr is never zero. That happens when:
- When we construct an ObjectReference from the result of alloc or alloc_copy. We know newly allocated objects cannot have zero as their addresses, but the Rust language cannot figure it out unless we add NonZeroAddress, too.
  - Note that when calling alloc, MMTk may find it is out of memory. Currently, MMTk core will call Collection::out_of_memory, and then alloc will return Address(0) to the caller. But the default implementation of Collection::out_of_memory panics, so the binding may assume alloc never returns Address(0) on normal returns. But if the binding overrides Collection::out_of_memory, it will need to actually check if the return value of alloc is 0 instead of using the unsafe from_raw_address_unchecked function.
- In the OpenJDK binding, when we decode a compressed pointer, we now have to check the compressed OOP against zero manually and call unsafe { ObjectReference::from_raw_address_unchecked(BASE.load(Ordering::Relaxed) + ((v as usize) << SHIFT.load(Ordering::Relaxed))) }, too. The Rust language cannot prove that the result can't be zero if v is not zero, but we as humans know the check against zero is unnecessary.
The presence of unsafe { ... } makes the code look unsafe, but it is actually as safe (or as unsafe) as before.
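For a binding that overrides Collection::out_of_memory without panicking, the checked conversion is the natural fit. The following is a minimal sketch under that assumption; `Address` and `ObjectReference` are stand-ins for the mmtk-core types, and `OutOfMemory` and `new_object` are hypothetical binding-side names.

```rust
use std::num::NonZeroUsize;

/// Stand-in for mmtk-core's `Address`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(pub usize);

/// Stand-in for the proposed non-nullable `ObjectReference`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    /// Checked conversion: `None` if `addr` is zero.
    pub fn from_raw_address(addr: Address) -> Option<ObjectReference> {
        NonZeroUsize::new(addr.0).map(ObjectReference)
    }

    /// Unchecked conversion.
    ///
    /// # Safety
    /// The caller must guarantee that `addr` is not zero.
    pub unsafe fn from_raw_address_unchecked(addr: Address) -> ObjectReference {
        debug_assert!(addr.0 != 0);
        ObjectReference(unsafe { NonZeroUsize::new_unchecked(addr.0) })
    }

    pub fn to_raw(self) -> usize {
        self.0.get()
    }
}

/// Hypothetical error type for a binding that reports OOM to the VM.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfMemory;

/// Hypothetical wrapper around the result of `alloc`: a zero address is
/// turned into an error instead of being converted unsafely.
pub fn new_object(alloc_result: Address) -> Result<ObjectReference, OutOfMemory> {
    ObjectReference::from_raw_address(alloc_result).ok_or(OutOfMemory)
}
```

A binding that keeps the default, panicking out_of_memory can still use the unchecked variant, since alloc then never returns Address(0) on a normal return.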
Risks
Long Term Performance Risks
Converting Address to ObjectReference has overhead only if we don't know whether the address can be zero or not. (We can always use unsafe { ObjectReference::from_raw_address_unchecked(addr) } if we know addr cannot be zero.)
This will remain true in the future. If we don't know if it is zero at compile time, then run-time checking will be necessary, and this MEP enforces the check to be done. Such overhead should always exist regardless of whether we allow ObjectReference to be NULL or not (and the overhead may be erroneously omitted if we fail to add a necessary NULL check).
Long Term Software Engineering Risks
Option<ObjectReference> across FFI boundary
One potential problem is the convenience of exposing Option<ObjectReference> to C code via FFI. Ideally, C programs should use uintptr_t for Option<NonZeroUsize>, with 0 representing None. However, Rust currently does not define the layout of Option<NonZeroUsize>. Even though the only possible encoding of None (of type Option<NonZeroUsize>) is 0usize, the Rust reference still states that transmuting None (of type Option<NonZeroUsize>) to usize has undefined behavior. So we have to manually write code to do the conversion, mapping None to 0usize. Despite that, the conversion functions should be easy to implement. We can implement two functions to make the conversion easy:

    // `object` is an `Option<ObjectReference>`; `None` maps to 0.
    let word: usize = ffi_utils::objref_to_usize_zeroable(object);
    let object: Option<ObjectReference> = ffi_utils::usize_to_objref_zeroable(word);

That should be concise enough for most use cases.
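One possible shape for the two helpers is sketched below. The signatures are an assumption (the helpers are only named in this proposal, not specified): both are taken to work on `Option<ObjectReference>`, with `None` mapped to 0. `ObjectReference` is a stand-in type and `example_identity` is a hypothetical C-facing function added only to show the helpers in use.

```rust
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-nullable `ObjectReference`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Maps `None` to 0 and `Some(object)` to its raw address.
pub fn objref_to_usize_zeroable(object: Option<ObjectReference>) -> usize {
    object.map_or(0, |o| o.0.get())
}

/// Maps 0 to `None` and any other word to `Some(object)`.
pub fn usize_to_objref_zeroable(word: usize) -> Option<ObjectReference> {
    NonZeroUsize::new(word).map(ObjectReference)
}

/// Hypothetical C-facing function: the C side only ever sees a plain word
/// (`uintptr_t`), and the conversion happens at the boundary.
pub extern "C" fn example_identity(word: usize) -> usize {
    let object: Option<ObjectReference> = usize_to_objref_zeroable(word);
    objref_to_usize_zeroable(object)
}
```

Neither helper uses `transmute`, so the sketch does not rely on any layout guarantee beyond what safe Rust provides.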
Currently, very few public API functions expose the Option<ObjectReference> type. They are:
- ObjectReference::get_forwarded_referent(self) -> Option<Self>
- vo_bit::is_vo_bit_set_for_addr(address: Address) -> Option<ObjectReference>: Although public, VM bindings tend to use is_mmtk_object instead.
With this MEP implemented,
- Edge::load() -> Option<ObjectReference> will be a new use case.
The software engineering burden should be reasonable for those three API functions. Specifically, the OpenJDK binding currently does not use get_forwarded_referent nor is_vo_bit_set_for_addr, and Edge::load() is trivial to refactor.
If, in the future, the mmtk-core introduces more API functions that involve Option<ObjectReference> (which I don't think is likely to happen), we (or the VM bindings) may introduce macros to automate the conversion.
VM Binding considerations
VM bindings can no longer use the ObjectReference type from mmtk-core to represent their reference types if the VM allows NULL references. Binding writers may find it inconvenient because they need to define their own null-able reference types. But existing VMs already have related types. The OpenJDK binding already has the oop type, and we know it may be encoded as u32 or usize depending on whether CompressedOOP is enabled. The Ruby binding has the VALUE type which is backed by unsigned long and can encode tagged unions.
I don't worry about new bindings because if the developer knows an ObjectReference must refer to an object and cannot be NULL or small integers, they will roll their own nullable or tagged reference type and get things right from the start. The problem may be with existing bindings (OpenJDK, Ruby, Julia and V8). If they assumed ObjectReference may be NULL or may hold tagged references, they need to be refactored.
Impact on Public API
The most obvious change is the Edge trait. Edge::load() will return Option<ObjectReference>, and Edge::store(object) will ensure the argument is not NULL. As stated above, OpenJDKEdge::load() has been trivially refactored to adapt to this change.
Other public API functions will no longer accept NULL ObjectReference, but most public API functions never accepted NULL as argument before.
The main problem is object_reference_write and its _pre, _post and _slow variants. As we discussed in the Write barrier section, object_reference_write will stop working for VMs that support null pointers or tagged pointers because we can no longer store NULL to an Edge. However, VMs are still able to use write barriers by calling the _pre and _post functions separately, or inlining the fast path and calling the _slow function separately.
Currently,
- mmtk-openjdk always calls _pre and _post functions separately.
- mmtk-jikesrvm and mmtk-ruby do not support any generational GC, yet.
- mmtk-julia only calls the _post and the _slow functions.
Since currently no officially supported VM bindings use object_reference_write directly, there is no immediate impact.
But in the long term, we should redesign the write barrier functions to make them more general. See: #1038
Testing
We may add unit tests to ensure
- Option<ObjectReference>, ObjectReference, NonZeroUsize and usize all have the same size.
- The conversion between Address and Option<ObjectReference> properly handles Address(0) and None.
And we should add micro benchmarks to ensure
- Conversion between Address and Option<ObjectReference> should have no performance penalty.
- Unsafe conversion from Address to ObjectReference should have no performance penalty.
- Converting Some(ObjectReference) to ObjectReference (via matching) should be efficient.
- Unwrapping an Option<ObjectReference> should be efficient.
It is better if we can verify the generated assembly code of the "no penalty" cases to make sure they are no-op.
No tests need to be added around trace_object implementations because the Rust language will ensure the underlying NonZeroUsize will never hold the value 0.
Currently one test involves ObjectReference::NULL, that is, the test for is_in_mmtk_space. It tests if the function returns false when the argument is ObjectReference::NULL. We may remove that test case because we removed ObjectReference::NULL.
Alternatives
We may do nothing, keeping ObjectReference::NULL and using it to represent a missing ObjectReference. MMTk is still capable of performing GC and supporting our currently supported VMs without this refactoring. But the problems of this approach have been listed in the Motivation section, namely not being general enough, polluting the API, making NULL checks hard to get right, and being non-idiomatic in Rust.
We may do the opposite, i.e. allowing ObjectReference to represent not only NULL encoded as 0, but also language-specific NULL variants such as nil, nothing, missing, undefined, etc., and allow the binding to define the possible NULL-like values. But if we take this approach, MMTk core will not only have to check for NULL everywhere, but also need to check for other special NULL-like values everywhere, too, making software engineering more difficult.
Assumptions
Currently ObjectReference is backed by usize, and all existing VM bindings implement ObjectReference as a pointer to an object, or to some offset from the start of an object. While this design (implementing ObjectReference as a pointer to object, possibly with an offset) is able to support fat pointers, offsetted pointers, and handles, we acknowledge that it may not be the only possible design. For example, we currently assume that ObjectReference can only represent references, but not non-reference values such as NULL, small integers, true, false, nil, undefined, etc.
If, in the future, we change the definition so that ObjectReference can also hold NULL, nil, true, false, small integers, etc., we will need to think about this MEP again. I (Kunshan) personally strongly disagree with the idea of letting ObjectReference hold a tagged non-reference value, such as small integer. If ObjectReference can be nil, true, false, and small integers, then mmtk-core will need to check whether a given ObjectReference is such special non-ref values everywhere, which is even worse than adding NULL checks everywhere.
MMTk core makes no assumption about how an object reference is stored in a slot. The VM (such as OpenJDK) may store compressed pointers in some slots. That is abstracted out by the Edge::load() method which decompresses the pointer and returns a Some(ObjectReference) or None. If the VM finds the slot is holding a NULL reference after decoding (or before decoding if 0u32 also represents NULL, as in OpenJDK), it still returns None.
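As an illustration of that abstraction, the sketch below decodes a 32-bit compressed slot value into `Option<ObjectReference>`. It is a simplified, hypothetical model: `Compression` and its fields are invented for this example, the decoding formula follows the `base + (v << shift)` form quoted earlier, and the checked conversion is used instead of the unsafe one.

```rust
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-nullable `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    pub fn to_raw(self) -> usize {
        self.0.get()
    }
}

/// Hypothetical decoding parameters for compressed pointers.
pub struct Compression {
    pub base: usize,
    pub shift: u32,
}

impl Compression {
    /// Decodes a 32-bit slot value. A compressed value of 0 is assumed to
    /// encode null and is rejected before decoding.
    pub fn load(&self, compressed: u32) -> Option<ObjectReference> {
        if compressed == 0 {
            return None;
        }
        let addr = self.base + ((compressed as usize) << self.shift);
        NonZeroUsize::new(addr).map(ObjectReference)
    }
}
```

mmtk-core would only ever see the `Option<ObjectReference>` result; the base, the shift and the null encoding stay private to the binding.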
Related Issues
Preliminary implementation PRs:
- mmtk-core: Remove NULL ObjectReference #1064
- mmtk-openjdk: Remove NULL ObjectReference mmtk-openjdk#265
Other related issues and PRs:
- Non-null slots (Edge) that don't contain object references, either. #1031: Supporting VMs where slots may hold tagged values. It is the main motivation of this MEP.
- Generalize the substituting write barriers #1038: Problems with the subsuming write barrier function.
- NULL and movement check in process_edge #1032: (merged) Edge::load() can now return NULL for tagged values so that the slot can be skipped. It can be improved by this MEP if we use None instead of NULL. Also fixes a missing NULL check in the ReferenceProcessor.
- Axioms that good definitions of ObjectReference must satisfy #1044: Archived discussions about the definition of ObjectReference. Can ObjectReference be addresses, handles, tagged pointers, etc.?
- Skip object graph edges to immortal non-moving objects when tracing #1076: A related but orthogonal topic about skipping some edges to singleton NULL-like objects during tracing.