Skip to content

Add fetch_xxx and fetch_update API for metadata. #651

Closed
@wks

Description

@wks

Proposed API

  • Atomic bit operations:
    • fetch_and_metadata(metadata_spec, object, val, order): atomically and
      • Maybe also provide fetch_clear_metadata(m, o, v, order) as an alias of fetch_and(m, o, !v, order)
    • fetch_or_metadata(metadata_spec, object, val, order): atomically or
      • Maybe also provide fetch_set_metadata(m, o, v, order) as a synonym
    • fetch_xor_metadata(metadata_spec, object, val, order): atomically xor
  • fetch_update_metadata(metadata_spec, object, |old| { compute(old) }, order1, order2): Update the update atomically using a closure.

Problem with compare_exchange

It needs a loop, and is somewhat complex when modifying only some bits of a byte.

Compare-exchange is usually used in a loop:

  1. Load the old value
  2. Compute the new value
  3. Use compare_exchange to try to update it from the old value to the new value
    • If we only update some bits of a byte, we need another load here.
  4. Retry if failed

Some details can be seen in the discussion in #644

Alternatives

On the architecture level,

  1. Some platforms have instructions that does atomic add/sub/and/or/nor, but never fails (i.e. no need to use it in a loop).
  2. Some platforms (almost all platforms except x86) use LL-SC as the primitive instead of compare-exchange. On such platforms, compare-exchange itself is usually implemented with LL-SC.

In Rust,

  1. AtomicU{8,16,32,64,size} provide fetch_{add,sub,and,nand,or,xor,min,max}. (Obviously they are wrappers of LLVM's atomicrmw instruction.)
  2. AtomicU{8,16,32,64,size} provide fetch_update that takes a call-back function that computes the new value from the old.

Using non-failing fetch_xxx operations for bit operations

There are two advantages of using fetch_and, fetch_or and fetch_xor instead of compare_exchange.

  1. They never fail, so we don't need to run in a loop.
  2. Unlike add and sub, those bit operations do not affect unrelated bits, so we don't need to implement them with compare-exchange.

For example, if a metadata is one bit per object, and we want to set the n-th bit of an AtomicU8, we just call x.fetch_or(1 << n, order).

If a metadata is two bits per object, and we want to AND bit n and bit n+1 with a user-supplied value v, we do:

assert_eq!(v & !0b11, 0, "v has more than 2 bits");
x.fetch_and(v << n, order);

As long as the input v has all 0 bits above the lowest two bits, we are sure x.fetch_and won't affect any other bits.

Using fetch_and, the compare-exchange in the object remembering barrier

compare_exchange_metadata::<E::VM>(&self.meta, object, 1, 0, None, Ordering::SeqCst, Ordering::SeqCst)

can be re-written as

fetch_and_metadata::<E::VM>(&self.meta, object, 0, Ordering::SeqCst)

and it doesn't need a loop.

Using fetch_update for other atomic read-modify-write operations.

We can design the API similar to Rust's fetch_update of AtomicU8 types.

One advantage is that the compiler can implement AtomicU8::fetch_update as an LL-SC loop, without needing a separate CAS instruction or instruction sequence. I have seen the Rust compiler doing this on RISC-V.

The other is that we can omit a load. Because fetch_update surrounds the closure, it can load the old value before running the closure, and reuse the loaded value after running the closure.

For example, with this API, even if we still use compare-exchange internally, we can have one less load.

fn fetch_update_bits(&x: AtomicU8, bits: u8, shift: u8, update: Fn(u8) -> u8)
    loop {
        let old_byte = x.load();  // We are going to reuse this.
        let mask = (1 << bits) - 1;
        let old_interesting_bits = (old_bytes >> shift) & mask;
        let new_interesting_bits = update(old_interesting_bits);
        assert_eq!(new_interesting_bits & !mask, 0, "new value has two many bits");
        let new_byte = (old_byte & !(mask << shift)) | (new_interesting << shift); // Reusing old_byte here.
        if x.compare_exchange(old_byte, new_byte) { break; }
    }        
}

We can also just use AtomicU8.fetch_update.

fn fetch_update_bits(&x: AtomicU8, bits: u8, shift: u8, update: Fn(u8) -> u8)
    x.fetch_update(|old_byte| {
        let mask = (1 << bits) - 1;
        let old_interesting_bits = (old_bytes >> shift) & mask;
        let new_interesting_bits = update(old_interesting_bits);
        assert_eq!(new_interesting_bits & !mask, 0, "new value has two many bits");
        let new_byte = (old_byte & !(mask << shift)) | (new_interesting << shift);  // Not loading again.
        Some(new_byte)
    }).unwrap();
}

Using this API, the object-remembering barrier can be implemented as

fetch_update_metadata::<E::VM>(&self.meta, object, |_old| { 0 }, Ordering::SeqCst, Ordering::SeqCst)

But fetch_and should be preferred for better efficiency.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions