Special topic chapter for finalizers and weak references #1265

Merged
merged 25 commits into from
Feb 20, 2025
4 changes: 4 additions & 0 deletions docs/userguide/src/SUMMARY.md
@@ -2,6 +2,8 @@

[Introduction](README.md)

[Glossary](glossary.md)

# For GC Developers

- [Tutorial: Add a new GC plan to MMTk](tutorial/prefix.md)
@@ -36,6 +38,8 @@
- [Performance Tuning](portingguide/perf_tuning/prefix.md)
- [Link Time Optimization](portingguide/perf_tuning/lto.md)
- [Optimizing Allocation](portingguide/perf_tuning/alloc.md)
- [VM-specific Concerns](portingguide/concerns/prefix.md)
- [Finalizers and Weak References](portingguide/concerns/weakref.md)
- [API Migration Guide](migration/prefix.md)
- [Template (for mmtk-core developers)](migration/template.md)

52 changes: 52 additions & 0 deletions docs/userguide/src/glossary.md
@@ -0,0 +1,52 @@
# Glossary

This document explains basic concepts of garbage collection. MMTk uses these terms as described in
this document. Different VMs may define some terms differently. Should there be any confusion,
this document will help disambiguate them. We use the book [*The Garbage Collection Handbook: The
Art of Automatic Memory Management*][GCHandbook] as the primary reference.

[GCHandbook]: https://gchandbook.org/

## Object graph

The object graph is a graph-theoretic view of the garbage-collected heap. An **object graph** is a
directed graph that contains *nodes* and *edges*. An edge always points to a node, but unlike in
conventional graphs, an edge may originate from either another node or a *root*.

Each *node* represents an object in the heap.

Each *edge* represents an object reference from an object or a root. A *root* is a reference held
in a slot directly accessible from [mutators][mutator], including local variables, global variables,
thread-local variables, and so on. An object can have many fields, and some fields may hold
references to objects, while others hold non-reference values.

An object is *reachable* if there is a path in the object graph from any root to the node of the
object. Unreachable objects cannot be accessed by [mutators][mutator]. They are considered
garbage, and can be reclaimed by the garbage collector.
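
The definition of reachability above can be illustrated with a small, self-contained sketch. It is
not MMTk code: `ObjectGraph`, `edges`, and `reachable_from` are hypothetical names invented for this
example, and objects are identified by plain indices rather than real object references.

```rust
use std::collections::HashSet;

/// A toy heap (hypothetical): `edges[i]` lists the objects referenced by the fields of object `i`.
struct ObjectGraph {
    edges: Vec<Vec<usize>>,
}

impl ObjectGraph {
    /// Compute the set of objects reachable from `roots`, i.e. the transitive closure.
    fn reachable_from(&self, roots: &[usize]) -> HashSet<usize> {
        let mut reached = HashSet::new();
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(object) = worklist.pop() {
            // `insert` returns false if the object has already been reached.
            if reached.insert(object) {
                worklist.extend(self.edges[object].iter().copied());
            }
        }
        reached
    }
}

fn main() {
    // Objects 0 -> 1 -> 2 hang off a root; objects 3 and 4 reference each other only.
    let graph = ObjectGraph {
        edges: vec![vec![1], vec![2], vec![], vec![4], vec![3]],
    };
    let reached = graph.reachable_from(&[0]);
    for object in 0..graph.edges.len() {
        let status = if reached.contains(&object) { "reachable" } else { "garbage" };
        println!("object {}: {}", object, status);
    }
}
```

Objects 3 and 4 form a cycle, yet neither is reachable from a root, so a tracing collector may
reclaim both.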

[mutator]: #mutator

## Mutator

TODO

## Emergency Collection

Also known as: *emergency GC*

In MMTk, an emergency collection happens when a normal collection cannot reclaim enough memory to
satisfy allocation requests. Plans may do full-heap GC, defragmentation, etc. during emergency
collections in order to free up more memory.

VM bindings can call `MMTK::is_emergency_collection` to query if the current GC is an emergency GC.
During an emergency GC, it is recommended that the VM binding retain fewer objects than in a normal
GC, to the extent allowed by the specification of the VM or the language. For example, the VM
binding may choose not to retain objects used for caching. Specifically, for Java virtual machines,
that means not retaining referents of [`SoftReference`][java-soft-ref], which is primarily designed
for implementing memory-sensitive caches.

[java-soft-ref]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html
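
As a rough illustration of the recommendation above, the sketch below shows one simple retention
policy a binding for a Java-like VM might adopt. `RefKind` and `retain_unreached_referent` are
hypothetical names, not part of MMTk. In a real binding the `is_emergency_gc` flag would come from
`MMTK::is_emergency_collection`, and the actual policy is up to the VM.

```rust
/// Hypothetical reference strengths of a Java-like VM.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RefKind {
    Soft,
    Weak,
}

/// Decide whether a referent that was not reached through strong references
/// should be kept alive because of the reference pointing to it.
fn retain_unreached_referent(kind: RefKind, is_emergency_gc: bool) -> bool {
    match kind {
        // Soft referents act as a cache: keep them normally, drop them when memory is tight.
        RefKind::Soft => !is_emergency_gc,
        // A weak reference never keeps its referent alive.
        RefKind::Weak => false,
    }
}

fn main() {
    for is_emergency_gc in [false, true] {
        for kind in [RefKind::Soft, RefKind::Weak] {
            println!(
                "emergency={} kind={:?} retain={}",
                is_emergency_gc,
                kind,
                retain_unreached_referent(kind, is_emergency_gc)
            );
        }
    }
}
```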

<!--
vim: tw=100 ts=4 sw=4 sts=4 et
-->
5 changes: 5 additions & 0 deletions docs/userguide/src/portingguide/concerns/prefix.md
@@ -0,0 +1,5 @@
# VM-specific Concerns

Every VM is special in some way. Because of this, some VM bindings may use MMTk features not
usually used by most VMs, and may even deviate from the usual steps of integrating MMTk into the VM.
Here we provide special guides to cover such cases.
484 changes: 484 additions & 0 deletions docs/userguide/src/portingguide/concerns/weakref.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions src/mmtk.rs
@@ -382,13 +382,13 @@ impl<VM: VMBinding> MMTK<VM> {
/// Return true if the current GC is an emergency GC.
///
/// An emergency GC happens when a normal GC cannot reclaim enough memory to satisfy allocation
/// requests. Plans may do full-heap GC, defragmentation, etc. during emergency in order to
/// requests. Plans may do full-heap GC, defragmentation, etc. during emergency GCs in order to
/// free up more memory.
///
/// VM bindings can call this function during GC to check if the current GC is an emergency GC.
/// If it is, the VM binding is recommended to retain fewer objects than normal GCs, to the
/// extent allowed by the specification of the VM or langauge. For example, the VM binding may
/// choose not to retain objects used for caching. Specifically, for Java virtual machines,
/// extent allowed by the specification of the VM or the language. For example, the VM binding
/// may choose not to retain objects used for caching. Specifically, for Java virtual machines,
/// that means not retaining referents of [`SoftReference`][java-soft-ref] which is primarily
/// designed for implementing memory-sensitive caches.
///
31 changes: 31 additions & 0 deletions src/util/address.rs
@@ -638,6 +638,37 @@ impl ObjectReference {
}

/// Is the object reachable, determined by the policy?
///
/// # Scope
///
/// This method is primarily used during weak reference processing. It can check if an object
/// (particularly finalizable objects and objects pointed to by weak references) has been reached
/// by following strong references or weak references of higher strength.
///
/// This method can also be used during tracing for debug purposes.
///
/// When called at other times, particularly during mutator time, the behavior is specific to
/// the implementation of the plan and policy due to their strategies of metadata clean-up. If
/// the VM needs to know if any given reference is still valid, it should instead use the valid
/// object bit (VO-bit) metadata which is enabled by the Cargo feature "vo_bit".
///
/// # Return value
///
/// It returns `true` if one of the following is true:
///
/// 1. The object has been traced (i.e. reached) since tracing started.
/// 2. The policy conservatively considers the object reachable even though it has not been
/// traced.
/// - Particularly, if the plan is generational, this method will return `true` if the
/// object is mature during nursery GC.
///
/// Due to the conservativeness, if this method returns `true`, it does not necessarily mean the
/// object must be reachable from roots. In generational GC, mature objects can be unreachable
/// from roots while the GC chooses not to reclaim their memory during nursery GC. Conversely,
/// all young objects reachable from the remembered set are retained even though some mature
/// objects in the remembered set can be unreachable in the first place. (This is known as
/// *nepotism* in GC literature.)
///
/// Note: Objects in ImmortalSpace may have `is_live = true` but are actually unreachable.
pub fn is_reachable(self) -> bool {
unsafe { SFT_MAP.get_unchecked(self.to_raw_address()) }.is_reachable(self)
121 changes: 76 additions & 45 deletions src/vm/scanning.rs
@@ -282,64 +282,95 @@ pub trait Scanning<VM: VMBinding> {

/// Process weak references.
///
/// This function is called after a transitive closure is completed.
/// This function is called in a GC after the transitive closure from roots is computed, that
/// is, all reachable objects from roots are reached. This function gives the VM binding an
/// opportunity to process finalizers and weak references.
///
/// MMTk core enables the VM binding to do the following in this function:
///
/// 1. Query if an object is already reached in this transitive closure.
/// 1. Query if an object is already reached.
/// - by calling `ObjectReference::is_reachable()`
/// 2. Get the new address of an object if it is already reached.
/// - by calling `ObjectReference::get_forwarded_object()`
/// 3. Keep an object and its descendants alive if not yet reached.
/// - using `tracer_context`
/// 4. Request this function to be called again after transitive closure is finished again.
///
/// The VM binding can query if an object is currently reached by calling
/// `ObjectReference::is_reachable()`.
///
/// If an object is already reached, the VM binding can get its new address by calling
/// `ObjectReference::get_forwarded_object()` as the object may have been moved.
///
/// If an object is not yet reached, the VM binding can keep that object and its descendents
/// alive. To do this, the VM binding should use `tracer_context.with_tracer` to get access to
/// an `ObjectTracer`, and then call its `trace_object(object)` method. The `trace_object`
/// method will return the new address of the `object` if it moved the object, or its original
/// address if not moved. Implementation-wise, the `ObjectTracer` may contain an internal
/// queue for newly traced objects, and will flush the queue when `tracer_context.with_tracer`
/// returns. Therefore, it is recommended to reuse the `ObjectTracer` instance to trace
/// multiple objects.
///
/// *Note that if `trace_object` is called on an already reached object, the behavior will be
/// equivalent to `ObjectReference::get_forwarded_object()`. It will return the new address if
/// the GC already moved the object when tracing that object, or the original address if the GC
/// did not move the object when tracing it. In theory, the VM binding can use `trace_object`
/// wherever `ObjectReference::get_forwarded_object()` is needed. However, if a VM never
/// resurrects objects, it should completely avoid touching `tracer_context`, and exclusively
/// use `ObjectReference::get_forwarded_object()` to get new addresses of objects. By doing
/// so, the VM binding can avoid accidentally resurrecting objects.*
///
/// The VM binding can return `true` from `process_weak_refs` to request `process_weak_refs`
/// to be called again after the MMTk core finishes transitive closure again from the objects
/// newly visited by `ObjectTracer::trace_object`. This is useful if a VM supports multiple
/// levels of reachabilities (such as Java) or ephemerons.
///
/// Implementation-wise, this function is called as the "sentinel" of the `VMRefClosure` work
/// bucket, which means it is called when all work packets in that bucket have finished. The
/// `tracer_context` expands the transitive closure by adding more work packets in the same
/// bucket. This means if `process_weak_refs` returns true, those work packets will have
/// finished (completing the transitive closure) by the time `process_weak_refs` is called
/// again. The VM binding can make use of this by adding custom work packets into the
/// `VMRefClosure` bucket. The bucket will be `VMRefForwarding`, instead, when forwarding.
/// See below.
/// - by returning `true`
///
/// The `tracer_context` parameter provides the VM binding the mechanism for retaining
/// unreachable objects (i.e. keeping them alive in this GC). The following snippet shows a
/// typical use case of handling finalizable objects for a Java-like language.
///
/// ```rust
/// let finalizable_objects: Vec<ObjectReference> = my_vm::get_finalizable_object();
/// let mut new_finalizable_objects = vec![];
///
/// tracer_context.with_tracer(worker, |tracer| {
/// for object in finalizable_objects {
/// if object.is_reachable() {
/// // `object` is still reachable.
/// // It may have been moved if it is a copying GC.
/// let new_object = object.get_forwarded_object().unwrap_or(object);
/// new_finalizable_objects.push(new_object);
/// } else {
/// // `object` is unreachable.
/// // Retain it, and enqueue it for postponed finalization.
/// let new_object = tracer.trace_object(object);
/// my_vm::enqueue_finalizable_object_to_be_executed_later(new_object);
/// }
/// }
/// });
/// ```
///
/// Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)`
/// to retain `object` and get its new address if moved. After `with_tracer` returns, it will
/// create work packets in the `VMRefClosure` work bucket to compute the transitive closure from
/// the objects retained in the closure.
///
/// The `memory_manager::is_mmtk_object` function can be used in this function if
/// - the "is_mmtk_object" feature is enabled, and
/// - `VM::VMObjectModel::NEED_VO_BITS_DURING_TRACING` is true.
///
/// Arguments:
/// * `worker`: The current GC worker.
/// * `tracer_context`: Use this to get access an `ObjectTracer` and use it to retain and
/// update weak references.
///
/// This function shall return true if this function needs to be called again after the GC
/// finishes expanding the transitive closure from the objects kept alive.
/// * `tracer_context`: Use this to get access to an `ObjectTracer` and use it to retain and
/// update weak references.
///
/// If `process_weak_refs` returns `true`, then `process_weak_refs` will be called again after
/// all work packets in the `VMRefClosure` work bucket have been executed, by which time all
/// objects reachable from the objects retained in this function will have been reached.
///
/// # Performance notes
///
/// **Retain as many objects as needed in one invocation of `tracer_context.with_tracer`, and
/// avoid calling `with_tracer` again and again** for each object. The `tracer` provided by
/// `ObjectTracerFactory::with_tracer` enqueues retained objects in an internal list specific to
/// this invocation of `with_tracer`, and will create reasonably sized work packets to compute
/// the transitive closure. This means the invocation of `with_tracer` has a non-trivial
/// overhead, but each invocation of `tracer.trace_object` is cheap.
///
/// *Don't do this*:
///
/// ```rust
/// for object in objects {
/// tracer_context.with_tracer(worker, |tracer| { // This is expensive! DON'T DO THIS!
/// tracer.trace_object(object);
/// });
/// }
/// ```
///
/// **Use `ObjectReference::get_forwarded_object()` to get the forwarded address of reachable
/// objects. Only use `tracer.trace_object` for retaining unreachable objects.** If
/// `trace_object` is called on an already reached object, it will also return its new address
/// if moved. However, `tracer_context.with_tracer` has a cost, and the VM binding may
/// accidentally "resurrect" dead objects if it fails to check `object.is_reachable()` first. If
/// the VM binding does not intend to retain any objects, it should completely avoid touching
/// `tracer_context`.
///
/// **Clone the `tracer_context` for parallelism.** The `ObjectTracerContext` has `Clone` as
/// its supertrait. The VM binding can clone it and distribute each clone into a work packet.
/// By doing so, the VM binding can parallelize the processing of finalizers and weak references
/// by creating multiple work packets.
fn process_weak_refs(
_worker: &mut GCWorker<VM>,
_tracer_context: impl ObjectTracerContext<VM>,
101 changes: 101 additions & 0 deletions src/vm/tests/mock_tests/mock_test_doc_weakref_code_example.rs
@@ -0,0 +1,101 @@
//! This module tests the example code in `Scanning::process_weak_refs` and `weakref.md` in the
//! Porting Guide. We only check if the example code compiles. We cannot actually run it because
//! we can't construct a `GCWorker`.

use crate::{
scheduler::GCWorker,
util::ObjectReference,
vm::{ObjectTracer, ObjectTracerContext, Scanning, VMBinding},
};

use super::mock_test_prelude::MockVM;

#[allow(dead_code)] // We don't construct this struct as we can't run it.
struct VMScanning;

// Just to make the code example look better.
use MockVM as MyVM;

// Placeholders for functions supposed to be implemented by the VM.
mod my_vm {
use crate::util::ObjectReference;

pub fn get_finalizable_object() -> Vec<ObjectReference> {
unimplemented!()
}

pub fn set_new_finalizable_objects(_objects: Vec<ObjectReference>) {}

pub fn enqueue_finalizable_object_to_be_executed_later(_object: ObjectReference) {}
}

// ANCHOR: process_weak_refs_finalization
impl Scanning<MyVM> for VMScanning {
fn process_weak_refs(
worker: &mut GCWorker<MyVM>,
tracer_context: impl ObjectTracerContext<MyVM>,
) -> bool {
let finalizable_objects: Vec<ObjectReference> = my_vm::get_finalizable_object();
let mut new_finalizable_objects = vec![];

tracer_context.with_tracer(worker, |tracer| {
for object in finalizable_objects {
if object.is_reachable() {
// `object` is still reachable.
// It may have been moved if it is a copying GC.
let new_object = object.get_forwarded_object().unwrap_or(object);
new_finalizable_objects.push(new_object);
} else {
// `object` is unreachable.
// Retain it, and enqueue it for postponed finalization.
let new_object = tracer.trace_object(object);
my_vm::enqueue_finalizable_object_to_be_executed_later(new_object);
}
}
});

my_vm::set_new_finalizable_objects(new_finalizable_objects);

false
}

// ...
// ANCHOR_END: process_weak_refs_finalization

// Methods after this are placeholders. We only ensure they compile.

fn scan_object<SV: crate::vm::SlotVisitor<<MockVM as VMBinding>::VMSlot>>(
_tls: crate::util::VMWorkerThread,
_object: ObjectReference,
_slot_visitor: &mut SV,
) {
unimplemented!()
}

fn notify_initial_thread_scan_complete(_partial_scan: bool, _tls: crate::util::VMWorkerThread) {
unimplemented!()
}

fn scan_roots_in_mutator_thread(
_tls: crate::util::VMWorkerThread,
_mutator: &'static mut crate::Mutator<MockVM>,
_factory: impl crate::vm::RootsWorkFactory<<MockVM as VMBinding>::VMSlot>,
) {
unimplemented!()
}

fn scan_vm_specific_roots(
_tls: crate::util::VMWorkerThread,
_factory: impl crate::vm::RootsWorkFactory<<MockVM as VMBinding>::VMSlot>,
) {
unimplemented!()
}

fn supports_return_barrier() -> bool {
unimplemented!()
}

fn prepare_for_roots_re_scanning() {
unimplemented!()
}
}
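
The mock test above only checks that the example compiles. To see the same finalization logic run,
the following sketch models it on a toy heap with no dependency on MMTk. Everything in it is
hypothetical: `ToyHeap`, `finish_closure`, and `process_finalizable` are invented for illustration.
`is_reachable` and `trace_object` are simplified stand-ins for `ObjectReference::is_reachable()` and
`ObjectTracer::trace_object`, and `finish_closure` stands in for the work packets that expand the
transitive closure after `with_tracer` returns.

```rust
use std::collections::HashSet;

/// Toy heap (hypothetical): `edges[i]` lists the objects referenced by object `i`.
struct ToyHeap {
    edges: Vec<Vec<usize>>,
    reached: HashSet<usize>,
    pending: Vec<usize>,
}

impl ToyHeap {
    /// Build the heap and compute the initial transitive closure from `roots`.
    fn new(edges: Vec<Vec<usize>>, roots: &[usize]) -> Self {
        let mut heap = ToyHeap { edges, reached: HashSet::new(), pending: Vec::new() };
        for &root in roots {
            heap.trace_object(root);
        }
        heap.finish_closure();
        heap
    }

    /// Simplified stand-in for `ObjectReference::is_reachable()`.
    fn is_reachable(&self, object: usize) -> bool {
        self.reached.contains(&object)
    }

    /// Simplified stand-in for `tracer.trace_object(object)`: retain the object itself
    /// and queue it so that its children are visited later.
    fn trace_object(&mut self, object: usize) {
        if self.reached.insert(object) {
            self.pending.push(object);
        }
    }

    /// Stand-in for the work packets that expand the transitive closure.
    fn finish_closure(&mut self) {
        while let Some(object) = self.pending.pop() {
            for &child in &self.edges[object] {
                if self.reached.insert(child) {
                    self.pending.push(child);
                }
            }
        }
    }
}

/// Split finalizable objects into those still registered and those ready to be finalized.
fn process_finalizable(heap: &mut ToyHeap, finalizable: Vec<usize>) -> (Vec<usize>, Vec<usize>) {
    let mut still_registered = Vec::new();
    let mut ready_to_finalize = Vec::new();
    for object in finalizable {
        if heap.is_reachable(object) {
            still_registered.push(object);
        } else {
            // Unreachable: retain it so that its finalizer can still access it.
            heap.trace_object(object);
            ready_to_finalize.push(object);
        }
    }
    // Everything reachable from the retained objects is kept alive as well.
    heap.finish_closure();
    (still_registered, ready_to_finalize)
}

fn main() {
    // Root 0 -> 1; object 2 -> 3 is unreachable from the root.
    let mut heap = ToyHeap::new(vec![vec![1], vec![], vec![3], vec![]], &[0]);
    let (still_registered, ready) = process_finalizable(&mut heap, vec![1, 2]);
    println!("still registered: {:?}", still_registered);
    println!("ready to finalize: {:?}", ready);
    println!("object 3 retained: {}", heap.is_reachable(3));
}
```

The toy model processes objects one at a time on a single thread and never moves them, so it has no
counterpart to `get_forwarded_object()`.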
1 change: 1 addition & 0 deletions src/vm/tests/mock_tests/mod.rs
@@ -67,3 +67,4 @@ mod mock_test_vm_layout_log_address_space;

mod mock_test_doc_avoid_resolving_allocator;
mod mock_test_doc_mutator_storage;
mod mock_test_doc_weakref_code_example;