Rename is_reachable and is_live #1271

Closed
@wks

TL;DR:

  • "Is reachable" and "is live" are the wrong questions to ask during tracing. What ObjectReference::is_reachable() really does is query whether an object has been reached during tracing. We need better questions and better function names.
  • In StickyImmix, object.is_reachable() returns true when object is mature even if we haven't reached it in the current GC. This is wrong and should be fixed.

Problem

As I was writing the finalization and weak references chapter of the porting guide, I realized that the word "reachable" may not be meaningful during tracing.

We say an object is "reachable" if there is a path from roots to that object. We can even define "strongly/softly/weakly/phantomly reachable" if there are soft/weak/phantom references. But all those definitions of reachability are about the state of an object in an object graph. If an object is strongly reachable, it is always strongly reachable before, during or after a GC. The reachability only changes if the mutator mutates the object graph.

But during tracing, what we really care about is whether an object has been reached. MMTk-core processes weak references and finalizers in two ways:

  1. Using the old API, we run the strong Closure, then SoftRefClosure, then WeakRefClosure, then FinalRefClosure, then PhantomRefClosure.
  2. Using the new API, we run the strong Closure, then we run Scanning::process_weak_ref multiple times, each time processing a different weak reference strength.

Either way, after a previous closure has finished, we ask "Has this object already been reached in previous closures?", and we ask this by calling ObjectReference::is_reachable(). Note that this question is different from "Is this object softly/weakly/phantomly reachable?".

  • During tracing, the state of object.is_reachable() changes as we gradually trace through one reference after another, but
  • whether an object is softly/weakly/phantomly reachable only depends on the shape of the object graph, and MMTk only knows this after each level of transitive closure is complete.
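The difference between the two questions can be illustrated with a toy tracer. This is only a sketch: Graph, Trace and the free function is_reachable below are hypothetical names invented for illustration and are not part of MMTk's API. The point is that is_reached depends on how far the trace has progressed, while reachability depends only on the graph and the roots.

```rust
use std::collections::HashSet;

/// A toy object graph: `edges[i]` lists the objects that object `i` references.
/// Hypothetical type for illustration only; not MMTk's API.
struct Graph {
    edges: Vec<Vec<usize>>,
}

/// Tracing state, which changes as the closure proceeds.
struct Trace {
    reached: HashSet<usize>,
    worklist: Vec<usize>,
}

impl Trace {
    /// Start a trace by reaching every root.
    fn new(roots: &[usize]) -> Self {
        let mut trace = Trace { reached: HashSet::new(), worklist: Vec::new() };
        for &root in roots {
            trace.trace_object(root);
        }
        trace
    }

    /// Record `obj` as reached and queue it for scanning on the first visit.
    fn trace_object(&mut self, obj: usize) {
        if self.reached.insert(obj) {
            self.worklist.push(obj);
        }
    }

    /// Scan one queued object. Returns false once the closure is complete.
    fn step(&mut self, graph: &Graph) -> bool {
        match self.worklist.pop() {
            Some(obj) => {
                for &child in &graph.edges[obj] {
                    self.trace_object(child);
                }
                true
            }
            None => false,
        }
    }

    /// "Has this object been reached so far in this trace?"
    fn is_reached(&self, obj: usize) -> bool {
        self.reached.contains(&obj)
    }
}

/// "Is there a path from the roots to `target`?" This depends only on the
/// graph, so it is answered by running a complete closure.
fn is_reachable(graph: &Graph, roots: &[usize], target: usize) -> bool {
    let mut trace = Trace::new(roots);
    while trace.step(graph) {}
    trace.is_reached(target)
}
```

With a chain 0 -> 1 -> 2 rooted at 0, is_reachable(&g, &[0], 2) is true at all times, whereas a freshly started Trace reports is_reached(2) as false until enough calls to step have been made.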

The current method ObjectReference::is_reachable() actually returns whether an object is reached. It answers the question we want to ask (i.e. whether the object is reached), but the method name is_reachable() is wrong.

Similarly, ObjectReference::is_live() is also wrong. The life and death of objects is not determined until the end of all tracing stages.

New names

ObjectReference::is_reachable() should be renamed ObjectReference::is_reached().

  • Scope: It can be called during tracing, i.e. from the TPinningClosure to the VMRefClosure work buckets.
  • Semantics: It returns true if an object has been traced (i.e. reached by trace_object) in this GC since tracing started.

ObjectReference::is_live() should be renamed ObjectReference::is_retained().

  • Scope: Same as is_reached.
  • Semantics: It returns true if either is_reached() returns true, or the GC will retain this object for other reasons, for example,
    • The object is in the immortal space. (No. It is not safe to keep a weak reference to an un-reached object even if it is in the immortal space. See below)
    • The object is in the mature space during a nursery GC.

Which to call when handling weak references, is_reached or is_retained?

It should be is_retained(). Suppose there is an old object o1, and a young WeakReference named y1 which weakly references o1. During a nursery GC, we will not trace o1 because it is in the mature space. This means o1 will not be reached during this nursery GC. However, because nursery GCs never reclaim old objects (i.e. they assume old objects are all alive), the weak reference y1 must not be cleared. In fact, a path from root to o1 may exist, going through multiple mature objects before reaching o1. So o1 can be alive according to the object graph even though we don't trace it.

On the contrary, is_reached() shall return false if we didn't trace the old object.
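The o1/y1 scenario can be sketched as follows. NurseryGc, ObjId and process_weak_ref are hypothetical names made up for this illustration, and the model assumes the proposed semantics (is_retained is is_reached or "mature during a nursery GC"); it is not how MMTk-core is implemented.

```rust
use std::collections::HashSet;

/// Hypothetical object identifier for this sketch; not MMTk's ObjectReference.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ObjId(usize);

/// State of a toy nursery GC after the strong closure has finished.
struct NurseryGc {
    /// Objects in the mature space, which a nursery GC never reclaims.
    mature: HashSet<ObjId>,
    /// Objects reached by tracing during this GC.
    reached: HashSet<ObjId>,
}

impl NurseryGc {
    /// True only if the object was traced in this GC.
    fn is_reached(&self, obj: ObjId) -> bool {
        self.reached.contains(&obj)
    }

    /// True if the object was reached, or if this GC keeps it for another
    /// reason (here: it is mature and this is a nursery GC).
    fn is_retained(&self, obj: ObjId) -> bool {
        self.is_reached(obj) || self.mature.contains(&obj)
    }
}

/// Decide the fate of a weak reference's referent using the given predicate.
/// Returns the referent to keep, or None if the reference must be cleared.
fn process_weak_ref(referent: ObjId, keep: impl Fn(ObjId) -> bool) -> Option<ObjId> {
    if keep(referent) {
        Some(referent)
    } else {
        None
    }
}
```

If o1 is in mature but not in reached, process_weak_ref(o1, |o| gc.is_retained(o)) keeps the referent, while using is_reached as the predicate would clear y1 even though o1 survives the nursery GC.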

However, the current implementation of ObjectReference::is_reachable() simply calls SFT::is_reachable(), which in turn calls SFT::is_live(). For StickyImmix, SFT::is_live() checks the mark bit, which is always set for mature objects (hence the name "sticky"). So object.is_reachable() will still return true if object is a mature object, which is technically wrong! We should make the semantics clear and advise against using is_reachable (or the renamed is_reached) for weak reference processing.

How do we implement is_reached for StickyImmix?

With sticky mark bits, StickyImmix cannot distinguish between (1) old objects retained in previous nursery GCs and (2) young objects just marked during the current GC. If we have to distinguish between them, we may need a different metadata bit. But that seems unnecessary.
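A toy model makes the ambiguity concrete. StickyHeap and Header are hypothetical types for illustration, assuming a single mark bit per object that nursery GCs set but never clear; real StickyImmix metadata is more involved.

```rust
/// Per-object metadata in this toy model: a single sticky mark bit.
/// Hypothetical type for illustration only; not MMTk's API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Header {
    marked: bool,
}

struct StickyHeap {
    headers: Vec<Header>,
}

impl StickyHeap {
    fn new() -> Self {
        StickyHeap { headers: Vec::new() }
    }

    /// Allocate a young object. Young objects start unmarked.
    fn alloc(&mut self) -> usize {
        self.headers.push(Header { marked: false });
        self.headers.len() - 1
    }

    /// A nursery GC marks the young objects it reaches and never clears
    /// existing mark bits, so objects marked earlier stay marked.
    fn nursery_gc(&mut self, reached_young: &[usize]) {
        for &obj in reached_young {
            self.headers[obj].marked = true;
        }
    }

    /// The only per-object state an is_live-style query can consult here.
    fn is_marked(&self, obj: usize) -> bool {
        self.headers[obj].marked
    }
}
```

After one nursery GC that reaches an object old, and a second nursery GC that reaches only a newly allocated object young, both headers hold the same value. Nothing in the mark bit records which GC set it, so answering "was this reached in the current GC?" would need extra metadata.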

If it is unnecessary to ask "whether an object is reached during the current GC", we may delete the is_reachable() method, and keep is_live() (and rename it to is_retained()).

Are objects in the immortal space always live (or retained)?

Not really. Although we never reclaim any space from the ImmortalSpace, we can't assume all objects in the ImmortalSpace are live.

Suppose the plan is SemiSpace and each GC moves every single object in the two copy spaces. If an object o1 in the ImmortalSpace has a reference to an object o2 in the from-space, then the field needs to be traced and forwarded so that it will continue to reference the same object in the to-space. This will only happen if o1 is reachable from root.

On the contrary, if o1 is not reachable from root, it will not be traced, and it will not be scanned, and its fields will not be forwarded. Then o1 will contain dangling pointers. If we keep a weak reference to o1, mutators may convert it to a strong reference and see dangling pointers. This will crash!
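The dangling-pointer hazard can be sketched with a toy copying collector. Everything below (SemiSpaceGc, ImmortalObj, the address constants) is hypothetical and chosen only for illustration; it assumes a field is forwarded only when its holder is scanned.

```rust
use std::collections::HashMap;

/// Address in a toy copy space.
type Addr = usize;

/// Hypothetical base addresses of the two copy spaces.
const FROM_BASE: Addr = 0x1000;
const TO_BASE: Addr = 0x2000;

/// An object in the immortal space with one field pointing into a copy space.
struct ImmortalObj {
    field: Addr,
}

/// A toy semispace GC that moves every object it traces into the to-space.
struct SemiSpaceGc {
    forwarding: HashMap<Addr, Addr>,
    next_to_space: Addr,
}

impl SemiSpaceGc {
    fn new() -> Self {
        SemiSpaceGc { forwarding: HashMap::new(), next_to_space: TO_BASE }
    }

    /// Copy the object at `from` (at most once) and return its new address.
    fn forward(&mut self, from: Addr) -> Addr {
        if let Some(&to) = self.forwarding.get(&from) {
            return to;
        }
        let to = self.next_to_space;
        self.next_to_space += 8;
        self.forwarding.insert(from, to);
        to
    }

    /// Scanning happens only for objects that were reached. It updates the
    /// field so that it refers to the copy in the to-space.
    fn scan_immortal(&mut self, obj: &mut ImmortalObj) {
        obj.field = self.forward(obj.field);
    }
}

/// After the GC, the from-space is released, so only to-space addresses
/// are valid.
fn is_valid_after_gc(addr: Addr) -> bool {
    addr >= TO_BASE
}
```

An immortal object that is reached gets scanned and ends up with a valid field. One that is never reached keeps its from-space address, which is_valid_after_gc rejects, so a weak reference that survives to such an object would expose the stale pointer to mutators.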

So the current is_live() for ImmortalSpace that always returns true is not suitable for weak reference processing. The current implementation of is_reachable() for ImmortalSpace, on the other hand, returns true only if the object is reached. That's correct w.r.t. weak reference processing.

But then is_retained() will not be a perfect name because we never reclaim any objects in the immortal space. In this sense, all objects in the ImmortalSpace are "retained" (although not all objects in the ImmortalSpace are in good shape).
