Skip to content

Use internal iteration to enumerate mutators #821

Open
@wks

Description

@wks

Note: Neither the current API nor the new API in #817 prevent us from implementing any language bindings for now. This Issue takes note of a potential problem that may make interfacing with some VMs inconvenient.

An iterator is external if the user drives the iteration, while an iterator is internal if the container drives the iteration. The Rust-style trait Iterator, the Java-style Iterator<T> and the C++ style std::iterator are all external iterators, while the Ruby-style .each {|obj| ...} method and C++'s std::for_each(begin, end, callback) are examples of internal iteration.

Currently, our mutator enumeration API uses external iteration. The binding is supposed to hold a global iterator instance. ActivePlan::reset_mutator_iterator resets the iterator, and ActivePlan::get_next_mutator gets the next mutator from the global iterator.

The PR #817 changed the API so that the VM binding shall implement an idiomatic Rust iterator type Iterator<Item = Mutator>. It makes the API cleaner, but the API remains to be external, because the user (mmtk-core) still drives the iteration.

Problem with external iteration as the VM binding API

It is easy to convert an external iterator into internal iteration.

void forEach<T>(Collection<T> collection, Consumer<T> consumer) {
    Iterator<T> iter = collection.iterator(); // Use external iterator
    while (iter.hasNext()) {
        T item = iter.next();
        consumer.accept(item); // Call back for internal iteration
    }
}

But the opposite (converting internal iteration to external iterator) is difficult.

void main() {
    forEach(someCollection, (T item) -> {
        // visit item here
    });
}

In the snippet above, the item should be visited inside the lambda. When we are visiting the item, the functions main, forEach and the lambda are all being executed, and their execution contexts are on the stack. It is not easy (for Java) to save the execution context. It is not easy for Rust, either.

Note: It is easy, however, for languages that have stackful coroutines, such as Lua, Ruby and (with third-party libraries) C++. For example, Ruby has the Enumerator class that can transparently convert a callback-style iterating function to a pause-able iterator object where the user can call the .next method to go to the next item. It is implemented with Ruby's Fiber and which is implemented with swapcontext.

Therefore, if a VM uses internal iteration to enumerate threads, it will be hard for that VM to implement mutator enumeration using external iterator.

Affected VMs

CRuby

CRuby enumerates threads using ccan_list_for_each

void rb_mmtk_get_mutators(void (*visit_mutator)(RubyMutator *mutator)) {
    rb_thread_t *th = NULL;
    ccan_list_for_each(&main_ractor->threads.set, th, lt_node) {
        visit_mutator(th->mutator);
    }
}

In the code snippet above, the ccan_list_for_each macro enumerates the threads in main_ractor->threads.set. When visiting each thread, it assigns the thread pointer to the local variable th, and the user can visit the th inside the block. ccan_list_for_each is macro-expanded into a for loop.

If we want to implement external iteration for CRuby, it will require saving the execution context at the visit_mutator call site, which is difficult. Currently, the mmtk-ruby binding saves all the mutator pointers into an array in the visit_mutator callback. After ccan_list_for_each returned, it creates an external iterator that iterates through the saved array. It works for now, but I am not sure if there will be other subtle interactions with the concurrent modification of threads (currently the users are assumed not to create threads during GC).

OpenJDK

OpenJDK provides an external iterator JavaThreadIteratorWithHandle to iterate over threads. However, it is a StackObj which means it can only be allocated on the stack. Attempting to allocate it with new will result in compilation error. This basically forces us to do one of the following:

  1. Embed the C++ class JavaThreadIteratorWithHandle inside the OpenJDKMutatorIterator struct defined in Rust.
  2. Find a way to circumvent the restriction of StackObj and allocate JavaThreadIteratorWithHandle on the heap.

Method (1) requires the Rust code to duplicate the class layout of the corresponding C++ classes, blurring the Rust-C++ boundary and make the code less maintainable.

Method (2) is violating the semantics of StackObj. From the comments in threadSMR.hpp in OpenJDK , the class uses stack-allocated objects to ensure that as long as a ThreadsListHandle is in scope, any JavaThread* it holds will not be freed even if the corresponding thread logically exits. This indicates it should be safe to allocate JavaThreadIteratorWithHandle in the heap as long as we use it in a scoped fashion. For synchronisation, Thread instances also have pointers (in Thread::_threads_list_pointer, Thread::_threads_hazard_ptr, etc.) that point back to the data structures (such as SafeThreadsListPtr and ThreadsList) in the JavaThreadIteratorWithHandle. This means the correctness is also sensitive to the movement of the JavaThreadIteratorWithHandle instance. So it is not safe to violate the restrictions of StackObj because other code may depend on them.

The safest way is switching to internal iteration.

And we can of course do it like Ruby, i.e. saving all the Mutator* in an array before making a Rust iterator to iterate over the array. This, however, doesn't prevent threads (and its embedded Mutator fields) from being destroyed. It should be safe for STW GC, but may surprise us in concurrent GC.

Proposed new API

We can change ActivePlan::mutators to call a call-back function to visit each mutator instead of returning an Iterator<Mutator>.

trait ActivePlan<VM> {
    fn mutators<F>(visit_mutator: F)
    where
        F: FnMut(&mut MMTkMutator<VM>);
}

This new API should be easier to implement for VMs like CRuby.

However, one drawback of this API is that it may feel less "idiomatic" for Rust, and it is difficult to composite it with methods of Iterator, such as .map, .filter, .collect, etc.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-interfaceArea: Interface/APIC-refactoringCategory: RefactoringF-investigateCall For Participation: Investigate the issue and provide more detailed directionP-lowPriority: Low. A low-priority issue won't be scheduled and assigned. Any help is welcome.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions