Description
Note: Neither the current API nor the new API in #817 prevent us from implementing any language bindings for now. This Issue takes note of a potential problem that may make interfacing with some VMs inconvenient.
An iterator is external if the user drives the iteration, while an iterator is internal if the container drives the iteration. The Rust-style trait Iterator
, the Java-style Iterator<T>
and the C++ style std::iterator
are all external iterators, while the Ruby-style .each {|obj| ...}
method and C++'s std::for_each(begin, end, callback)
are examples of internal iteration.
Currently, our mutator enumeration API uses external iteration. The binding is supposed to hold a global iterator instance. ActivePlan::reset_mutator_iterator
resets the iterator, and ActivePlan::get_next_mutator
gets the next mutator from the global iterator.
The PR #817 changed the API so that the VM binding shall implement an idiomatic Rust iterator type Iterator<Item = Mutator>
. It makes the API cleaner, but the API remains to be external, because the user (mmtk-core) still drives the iteration.
Problem with external iteration as the VM binding API
It is easy to convert an external iterator into internal iteration.
void forEach<T>(Collection<T> collection, Consumer<T> consumer) {
Iterator<T> iter = collection.iterator(); // Use external iterator
while (iter.hasNext()) {
T item = iter.next();
consumer.accept(item); // Call back for internal iteration
}
}
But the opposite (converting internal iteration to external iterator) is difficult.
void main() {
forEach(someCollection, (T item) -> {
// visit item here
});
}
In the snippet above, the item
should be visited inside the lambda. When we are visiting the item, the functions main
, forEach
and the lambda are all being executed, and their execution contexts are on the stack. It is not easy (for Java) to save the execution context. It is not easy for Rust, either.
Note: It is easy, however, for languages that have stackful coroutines, such as Lua, Ruby and (with third-party libraries) C++. For example, Ruby has the Enumerator
class that can transparently convert a callback-style iterating function to a pause-able iterator object where the user can call the .next
method to go to the next item. It is implemented with Ruby's Fiber and which is implemented with swapcontext
.
Therefore, if a VM uses internal iteration to enumerate threads, it will be hard for that VM to implement mutator enumeration using external iterator.
Affected VMs
CRuby
CRuby enumerates threads using ccan_list_for_each
void rb_mmtk_get_mutators(void (*visit_mutator)(RubyMutator *mutator)) {
rb_thread_t *th = NULL;
ccan_list_for_each(&main_ractor->threads.set, th, lt_node) {
visit_mutator(th->mutator);
}
}
In the code snippet above, the ccan_list_for_each
macro enumerates the threads in main_ractor->threads.set
. When visiting each thread, it assigns the thread pointer to the local variable th
, and the user can visit the th
inside the block. ccan_list_for_each
is macro-expanded into a for loop.
If we want to implement external iteration for CRuby, it will require saving the execution context at the visit_mutator
call site, which is difficult. Currently, the mmtk-ruby binding saves all the mutator pointers into an array in the visit_mutator
callback. After ccan_list_for_each
returned, it creates an external iterator that iterates through the saved array. It works for now, but I am not sure if there will be other subtle interactions with the concurrent modification of threads (currently the users are assumed not to create threads during GC).
OpenJDK
OpenJDK provides an external iterator JavaThreadIteratorWithHandle
to iterate over threads. However, it is a StackObj
which means it can only be allocated on the stack. Attempting to allocate it with new
will result in compilation error. This basically forces us to do one of the following:
- Embed the C++ class
JavaThreadIteratorWithHandle
inside theOpenJDKMutatorIterator
struct defined in Rust. - Find a way to circumvent the restriction of
StackObj
and allocateJavaThreadIteratorWithHandle
on the heap.
Method (1) requires the Rust code to duplicate the class layout of the corresponding C++ classes, blurring the Rust-C++ boundary and make the code less maintainable.
Method (2) is violating the semantics of StackObj
. From the comments in threadSMR.hpp in OpenJDK , the class uses stack-allocated objects to ensure that as long as a ThreadsListHandle
is in scope, any JavaThread*
it holds will not be freed even if the corresponding thread logically exits. This indicates it should be safe to allocate For synchronisation, JavaThreadIteratorWithHandle
in the heap as long as we use it in a scoped fashion.Thread
instances also have pointers (in Thread::_threads_list_pointer
, Thread::_threads_hazard_ptr
, etc.) that point back to the data structures (such as SafeThreadsListPtr
and ThreadsList
) in the JavaThreadIteratorWithHandle
. This means the correctness is also sensitive to the movement of the JavaThreadIteratorWithHandle
instance. So it is not safe to violate the restrictions of StackObj
because other code may depend on them.
The safest way is switching to internal iteration.
And we can of course do it like Ruby, i.e. saving all the Mutator*
in an array before making a Rust iterator to iterate over the array. This, however, doesn't prevent threads (and its embedded Mutator
fields) from being destroyed. It should be safe for STW GC, but may surprise us in concurrent GC.
Proposed new API
We can change ActivePlan::mutators
to call a call-back function to visit each mutator instead of returning an Iterator<Mutator>
.
trait ActivePlan<VM> {
fn mutators<F>(visit_mutator: F)
where
F: FnMut(&mut MMTkMutator<VM>);
}
This new API should be easier to implement for VMs like CRuby.
However, one drawback of this API is that it may feel less "idiomatic" for Rust, and it is difficult to composite it with methods of Iterator
, such as .map
, .filter
, .collect
, etc.