Use internal iteration to enumerate mutators

*Note: Neither the current API nor the new API in https://github.com/mmtk/mmtk-core/pull/817 prevent us from implementing any language bindings for now.  This Issue takes note of a potential problem that may make interfacing with some VMs inconvenient.*

An iterator is *external* if the user drives the iteration, while an iterator is *internal* if the container drives the iteration.  The Rust-style `trait Iterator`, the Java-style `Iterator<T>` and the C++ style `std::iterator` are all external iterators, while the Ruby-style `.each {|obj| ...}` method and C++'s `std::for_each(begin, end, callback)` are examples of internal iteration.

Currently, our mutator enumeration API uses external iteration.  The binding is supposed to hold a global iterator instance.  `ActivePlan::reset_mutator_iterator` resets the iterator, and `ActivePlan::get_next_mutator` gets the next mutator from the global iterator.

The PR https://github.com/mmtk/mmtk-core/pull/817 changed the API so that the VM binding shall implement an idiomatic Rust iterator type `Iterator<Item = Mutator>`.  It makes the API cleaner, but the API remains to be external, because the user (mmtk-core) still drives the iteration.

# Problem with external iteration as the VM binding API

It is easy to convert an external iterator into internal iteration.

```java
void forEach<T>(Collection<T> collection, Consumer<T> consumer) {
    Iterator<T> iter = collection.iterator(); // Use external iterator
    while (iter.hasNext()) {
        T item = iter.next();
        consumer.accept(item); // Call back for internal iteration
    }
}
```

But the opposite (converting internal iteration to external iterator) is difficult.

```java
void main() {
    forEach(someCollection, (T item) -> {
        // visit item here
    });
}
```

In the snippet above, the `item` should be visited inside the lambda.  When we are visiting the item, the functions `main`, `forEach` and the lambda are all being executed, and their execution contexts are on the stack.  It is not easy (for Java) to save the execution context.  It is not easy for Rust, either.

*Note: It is easy, however, for languages that have stackful coroutines, such as Lua, Ruby and (with third-party libraries) C++.  For example, Ruby has the `Enumerator` class that can transparently convert a callback-style iterating function to a pause-able iterator object where the user can call the `.next` method to go to the next item.  It is implemented with Ruby's Fiber and which is implemented with `swapcontext`.*

Therefore, if a VM uses internal iteration to enumerate threads, it will be hard for that VM to implement mutator enumeration using external iterator.

# Affected VMs

## CRuby

CRuby enumerates threads using `ccan_list_for_each`

```c
void rb_mmtk_get_mutators(void (*visit_mutator)(RubyMutator *mutator)) {
    rb_thread_t *th = NULL;
    ccan_list_for_each(&main_ractor->threads.set, th, lt_node) {
        visit_mutator(th->mutator);
    }
}
```

In the code snippet above, the `ccan_list_for_each` macro enumerates the threads in `main_ractor->threads.set`.  When visiting each thread, it assigns the thread pointer to the local variable `th`, and the user can visit the `th` inside the block.  `ccan_list_for_each` is macro-expanded into a for loop.

If we want to implement external iteration for CRuby, it will require saving the execution context at the `visit_mutator` call site, which is difficult.  Currently, the mmtk-ruby binding saves all the mutator pointers into an array in the `visit_mutator` callback.  After `ccan_list_for_each` returned, it creates an external iterator that iterates through the saved array.  It works for now, but I am not sure if there will be other subtle interactions with the concurrent modification of threads (currently the users are assumed not to create threads during GC).

## OpenJDK

OpenJDK provides an external iterator `JavaThreadIteratorWithHandle` to iterate over threads.  However, it is a `StackObj` which means it can only be allocated on the stack.  Attempting to allocate it with `new` will result in compilation error.  This basically forces us to do one of the following:

1.  Embed the C++ class `JavaThreadIteratorWithHandle` inside the `OpenJDKMutatorIterator` struct defined in Rust.
2.  Find a way to circumvent the restriction of `StackObj` and allocate `JavaThreadIteratorWithHandle` on the heap.

Method (1) requires the Rust code to duplicate the class layout of the corresponding C++ classes, blurring the Rust-C++ boundary and make the code less maintainable.

Method (2) is violating the semantics of `StackObj`.  From [the comments in threadSMR.hpp in OpenJDK](https://github.com/mmtk/openjdk/blob/master/src/hotspot/share/runtime/threadSMR.hpp#L81) , the class uses stack-allocated objects to ensure that as long as a `ThreadsListHandle` is in scope, any `JavaThread*` it holds will not be freed even if the corresponding thread logically exits.  ~~This indicates it should be safe to allocate `JavaThreadIteratorWithHandle` in the heap as long as we use it in a scoped fashion.~~ For synchronisation, `Thread` instances also have pointers (in `Thread::_threads_list_pointer`, `Thread::_threads_hazard_ptr`, etc.) that point back to the data structures (such as `SafeThreadsListPtr` and `ThreadsList`) in the `JavaThreadIteratorWithHandle`.  This means the correctness is also sensitive to the movement of the `JavaThreadIteratorWithHandle` instance.  So it is not safe to violate the restrictions of `StackObj` because other code may depend on them.

The safest way is switching to internal iteration.

And we can of course do it like Ruby, i.e. saving all the `Mutator*` in an array before making a Rust iterator to iterate over the array.  This, however, doesn't prevent threads (and its embedded `Mutator` fields) from being destroyed.  It should be safe for STW GC, but may surprise us in concurrent GC.

# Proposed new API

We can change `ActivePlan::mutators` to call a call-back function to visit each mutator instead of returning an `Iterator<Mutator>`.

```rs
trait ActivePlan<VM> {
    fn mutators<F>(visit_mutator: F)
    where
        F: FnMut(&mut MMTkMutator<VM>);
}
```

This new API should be easier to implement for VMs like CRuby.

However, one drawback of this API is that it may feel less "idiomatic" for Rust, and it is difficult to composite it with methods of `Iterator`, such as `.map`, `.filter`, `.collect`, etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use internal iteration to enumerate mutators #821

Problem with external iteration as the VM binding API

Affected VMs

CRuby

OpenJDK

Proposed new API

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Use internal iteration to enumerate mutators #821

Description

Problem with external iteration as the VM binding API

Affected VMs

CRuby

OpenJDK

Proposed new API

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions