
Typed access to metadata #647


Merged: 38 commits merged on Sep 13, 2022

Conversation


@qinsoon qinsoon commented Aug 23, 2022

This PR refactors metadata so that all metadata access is typed with a number type. It also greatly reduces code duplication in the metadata module, and adds tests for both header metadata and side metadata access.

Metadata in the following context refers to both side metadata and header metadata.

  • All metadata access functions are moved to HeaderMetadata/SideMetadata as methods.
  • Introduce the MetadataValue trait so we can handle u8/u16/u32/u64/usize uniformly.
  • All metadata access methods need a number type for the get/set value. The type needs to match what is defined in the spec; otherwise the access will panic (see the sketch after this list).
  • Fix the issue for u64 metadata on 32-bit targets. It should work properly now.
  • Add unit tests.
  • Header metadata methods in ObjectModel now have a default implementation that uses our header metadata module.
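
As a rough illustration of the typed-access idea, here is a minimal sketch in Rust. The names and method set are simplified: the actual MetadataValue trait in mmtk-core has more methods and is implemented for all of u8/u16/u32/u64/usize.

use std::sync::atomic::{AtomicU8, Ordering};

// Sketch only: one trait unifies the supported integer widths, so the
// metadata accessors can be written once and instantiated per type.
trait MetadataValue: Copy {
    fn load_atomic(addr: *const Self, order: Ordering) -> Self;
    fn store_atomic(addr: *mut Self, val: Self, order: Ordering);
}

impl MetadataValue for u8 {
    fn load_atomic(addr: *const Self, order: Ordering) -> Self {
        // reinterpret the metadata location as an atomic cell
        unsafe { &*(addr as *const AtomicU8) }.load(order)
    }
    fn store_atomic(addr: *mut Self, val: Self, order: Ordering) {
        unsafe { &*(addr as *const AtomicU8) }.store(val, order)
    }
}

// Accessors can then be written once, generically; a mismatch between
// the requested type and the width declared in the spec is where the
// panic described above would be raised.
fn load_typed<T: MetadataValue>(addr: *const T, order: Ordering) -> T {
    T::load_atomic(addr, order)
}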

Related PRs:

@qinsoon qinsoon force-pushed the fix-side-metadata-usize branch from 5763187 to 6b8d05f on August 23, 2022 02:05
@qinsoon qinsoon force-pushed the fix-side-metadata-usize branch from 6b8d05f to 4a0d8ff on August 23, 2022 02:09
@qinsoon qinsoon added the PR-testing label (Run binding tests for the pull request; deprecated: use PR-extended-testing instead) on Aug 23, 2022
@qinsoon qinsoon marked this pull request as ready for review August 23, 2022 23:20
@qinsoon qinsoon requested a review from wks August 23, 2022 23:20

qinsoon commented Aug 31, 2022

I pushed some draft code here. It is not yet ready for review.

@qinsoon qinsoon removed the PR-testing label (Run binding tests for the pull request; deprecated: use PR-extended-testing instead) on Sep 6, 2022
@qinsoon qinsoon added the PR-testing label (Run binding tests for the pull request; deprecated: use PR-extended-testing instead) on Sep 8, 2022

qinsoon commented Sep 8, 2022

@wks The PR is ready for review again. The focus of the PR has changed, and there have been some major changes since it was last reviewed.

@wks wks (Collaborator) left a comment

The advantage of atomic bit-and and bit-or is that they can be implemented without a cmpxchg loop, so they can be more efficient than store or cmpxchg.

Comment on lines 513 to 519
FromPrimitive::from_u8(self.fetch_ops_on_bits(
data_addr,
meta_addr,
order,
order,
|x: u8| x & val.to_u8().unwrap(),
))
@wks wks (Collaborator) Sep 8, 2022

fetch_and doesn't need cmpxchg. Just set irrelevant bits to 1 and they will not be affected.

Suggested change
FromPrimitive::from_u8(self.fetch_ops_on_bits(
data_addr,
meta_addr,
order,
order,
|x: u8| x & val.to_u8().unwrap(),
))
let mask = meta_byte_mask(self) << lshift;
let opnd = val | !mask;
unsafe { T::fetch_and(meta_addr, opnd, order) };

qinsoon (Member Author) replied:

Sounds good. In case you are interested, the following is the assembly code of fetch_and() for 1-bit side metadata, first when using fetch_ops_on_bits (which internally uses fetch_update), and then when directly using fetch_and:

with fetch_ops_on_bits

mmtk::util::alloc_bit::unset_alloc_bit_unsafe:
 mov     rcx, rdi
 shr     rdi, 6
 shr     cl, 3
 movabs  rsi, 13194139533312
 mov     al, byte ptr [rdi + rsi]
 mov     dl, -2
 rol     dl, cl
.LBB225_1:
 mov     ecx, eax
 and     cl, dl
 lock    cmpxchg byte ptr [rdi + rsi], cl
 jne     .LBB225_1
 ret

with fetch_and

mmtk::util::alloc_bit::unset_alloc_bit_unsafe:
 mov     rcx, rdi
 mov     rax, rdi
 shr     rax, 6
 shr     cl, 3
 mov     dl, -2
 rol     dl, cl
 movabs  rcx, 13194139533312
 lock    and byte ptr [rax + rcx], dl

@wks wks (Collaborator) left a comment

With the introduction of fetch_and and fetch_or, some bit operations can be done more efficiently using them, especially set bits and clear bits.

We may change all call sites of store_atomic into fetch_and_atomic or fetch_or_atomic.

Alternatively, we can add a special case for store_atomic if bits_num_log == 1. In this case, we can turn store_atomic internally into fetch_and or fetch_or, depending on whether it is storing 1 or 0.
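
A minimal sketch of that special case, using std atomics and illustrative names (the real code would operate on the raw metadata address and derive the mask and shift from the spec):

use std::sync::atomic::{AtomicU8, Ordering};

// For one-bit metadata, an atomic store of 0 or 1 can be lowered to a
// single fetch_and/fetch_or instead of a cmpxchg loop.
fn store_one_bit(byte: &AtomicU8, bit_shift: u8, value: u8, order: Ordering) {
    debug_assert!(value <= 1);
    if value == 1 {
        // setting the bit: OR with a mask that has only this bit set
        byte.fetch_or(1 << bit_shift, order);
    } else {
        // clearing the bit: AND with a mask that has only this bit clear
        byte.fetch_and(!(1 << bit_shift), order);
    }
}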

@@ -7,6 +6,6 @@ use std::sync::atomic::Ordering;
impl VMGlobalLogBitSpec {
/// Mark the log bit as unlogged (1 means unlogged)
pub fn mark_as_unlogged<VM: VMBinding>(&self, object: ObjectReference, order: Ordering) {
-store_metadata::<VM>(self, object, 1, None, Some(order))
+self.store_atomic::<VM, u8>(object, 1, None, order)
@wks wks (Collaborator) commented:

One-bit store_atomic has to use fetch_update or cmpxchg. fetch_or_atomic is more efficient for this.

);
if old_value == value {
return false;
}

-if compare_exchange_metadata::<VM>(
-&VM::VMObjectModel::LOCAL_MARK_BIT_SPEC,
+if VM::VMObjectModel::LOCAL_MARK_BIT_SPEC.compare_exchange_metadata::<VM, u8>(
@wks wks (Collaborator) commented:

This may be refactored to use fetch_update.
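
A sketch of that refactoring with std's AtomicU8 and illustrative names; fetch_update wraps the compare-exchange retry loop and returns Err(current) when the closure declines to update:

use std::sync::atomic::{AtomicU8, Ordering};

// Try to set the mark state; return false if the object was already marked.
fn attempt_mark(mark_byte: &AtomicU8, mark_state: u8) -> bool {
    mark_byte
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |old| {
            if old == mark_state {
                None // already marked: decline to write, yielding Err
            } else {
                Some(mark_state)
            }
        })
        .is_ok()
}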

);
let mark_bit = old_value & mask;
if mark_bit == value {
return false;
}
-if compare_exchange_metadata::<VM>(
-&VM::VMObjectModel::LOCAL_LOS_MARK_NURSERY_SPEC,
+if VM::VMObjectModel::LOCAL_LOS_MARK_NURSERY_SPEC.compare_exchange_metadata::<VM, u8>(
@wks wks (Collaborator) commented:

This can be refactored to use fetch_update.

);
let new_val = old_val & !NURSERY_BIT;
-if compare_exchange_metadata::<VM>(
-&VM::VMObjectModel::LOCAL_LOS_MARK_NURSERY_SPEC,
+if VM::VMObjectModel::LOCAL_LOS_MARK_NURSERY_SPEC.compare_exchange_metadata::<VM, u8>(
@wks wks (Collaborator) commented:

Consider refactoring to fetch_update.

@@ -316,9 +316,10 @@ impl<VM: VMBinding> MallocSpace<VM> {
address,
);

-if !is_marked::<VM>(object, None) {
+// TODO: Why do we use non-atomic load here?
+if !unsafe { is_marked_unsafe::<VM>(object) } {
@wks wks (Collaborator) commented:

I think it is a bug. If it is not atomic, it is guaranteed to cause a data race, because tracing is parallel.

qinsoon (Member Author) replied:

I assume it is a benign race, and it should not cause any actual bug. There is some discussion here: #313

@wks wks (Collaborator) Sep 12, 2022

When @steveblackburn said "atomic operation" in #313 (comment), I think he meant atomic read-modify-write (RMW) operations (such as cmpxchg, swap, fetch_xxx, etc.), and atomic RMW operations do prevent duplicate edges. In Java, all writes to 32-bit variables and reference variables (even non-volatile ones) are atomic, in the sense that word tearing is disallowed, and if a read has no happens-before relationship with a write, the read still sees some benign value (written by an actual store). So we seldom use the phrases "atomic read" or "atomic write" in Java.

It is not true for C++ and Rust. In C++ and Rust, if a non-atomic load has no happens-before relationship with a store, the behaviour is undefined. The Rust counterparts of "ordinary" loads/stores in Java are load_atomic and store_atomic with the Relaxed order (the Java memory order is actually even weaker than Relaxed, but it must prevent word tearing).
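
A small sketch of the distinction with std types (illustrative names):

use std::sync::atomic::{AtomicU8, Ordering};

// Reading a plain u8 that another thread writes concurrently is
// undefined behaviour in Rust. Reading through an atomic with
// Ordering::Relaxed is well-defined, and typically compiles to an
// ordinary load anyway.
fn is_marked_relaxed(mark_byte: &AtomicU8) -> bool {
    mark_byte.load(Ordering::Relaxed) != 0
}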

qinsoon (Member Author) replied:

We can change this to a relaxed atomic load, which is more reasonable.

But just for the sake of argument, I don't think this causes any actual bug. Issue #313 makes it clear that the if block may be executed multiple times for the same object. So:

  • if is_marked() returns false, but the object is actually marked: we will just execute the block again, and it is fine.
  • if is_marked() returns true, but the object is not actually marked: I don't think this can happen, as we monotonically set the bit from 0 to 1 at this stage (see the sketch below).
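
A sketch of this benign-race pattern with std types (illustrative names): the bit only goes from 0 to 1 during this phase, so a stale read can cause redundant work but never a missed mark, as long as the actual update is an atomic RMW.

use std::sync::atomic::{AtomicU8, Ordering};

fn mark_if_unmarked(mark_byte: &AtomicU8) -> bool {
    // this check may read a stale value; re-running the block is safe
    if mark_byte.load(Ordering::Relaxed) != 0 {
        return false;
    }
    // the atomic RMW decides the real winner
    mark_byte.fetch_or(1, Ordering::SeqCst) == 0
}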

qinsoon (Member Author) replied:

I have changed this to a relaxed atomic load.

ordering,
);
pub fn set_mark_bit<VM: VMBinding>(object: ObjectReference, ordering: Ordering) {
VM::VMObjectModel::LOCAL_MARK_BIT_SPEC.store_atomic::<VM, u8>(object, 1, None, ordering);
@wks wks (Collaborator) commented:

This is one-bit metadata. It may use fetch_or_atomic.

@wks wks (Collaborator) left a comment

About the return value of cmpxchg...

/// * `success_order`: is the atomic ordering used if the operation is successful.
/// * `failure_order`: is the atomic ordering used if the operation fails.
///
/// # Returns `true` if the operation is successful, and `false` otherwise.
@wks wks (Collaborator) commented:

Cmpxchg should return the old value as well as whether it succeeded. The AtomicU8::compare_exchange method returns Result<u8, u8>. We should do it similarly here. Some algorithms can benefit from it and eliminate a subsequent load.

qinsoon (Member Author) replied:

I will change this to Result<T, T>. For its existing use cases, I will leave them as compare_exchange(...).is_ok().
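
A sketch of the agreed-upon shape, mirroring std's convention, with illustrative names: Ok(old) on success and Err(current) on failure, so callers that need the old value avoid a separate load, while callers that only need the flag can still write .is_ok().

use std::sync::atomic::{AtomicU8, Ordering};

fn compare_exchange_mark(
    mark_byte: &AtomicU8,
    old: u8,
    new: u8,
    success: Ordering,
    failure: Ordering,
) -> Result<u8, u8> {
    // Ok(previous value) if the exchange happened, Err(actual value) otherwise
    mark_byte.compare_exchange(old, new, success, failure)
}

// Callers that only need the success flag:
// compare_exchange_mark(&b, 0, 1, Ordering::SeqCst, Ordering::SeqCst).is_ok()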


qinsoon commented Sep 9, 2022

@wks My plan is to fix whatever relates to the implementation of the metadata. The usage of the metadata, and any possible optimizations to it, I will leave to another PR.


qinsoon commented Sep 12, 2022

This pull request is ready for another review. @wks

@wks wks (Collaborator) left a comment

LGTM
