Skip to content

Refactor iterate_meta_bits #1181

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 6, 2024
Merged

Conversation

wks
Copy link
Collaborator

@wks wks commented Jul 29, 2024

This PR refactors the mechanism for visiting (reading and/or updating) side metadata in bulk, specifically the function SideMetadataSpec::iterate_meta_bits.

The function now uses a single FnMut callback instead of two Fn callbacks. It uses the enum type BitByteRange to distinguish whole byte ranges from bit ranges in a byte. This allows the user to capture variables mutably in the callback. This also removes the Cell used in find_prev_non_zero_value_fast.

The function is made non-recursive to improve the performance. Some test cases are added to test for corner cases.

The function is moved to a dedicated ranges module and renamed to break_bit_range, for several reasons:

  • It was a method of SideMetadataSpec, but it does not access any member of SideMetadataSpec.
  • It needs a non-trivial amount of testing to get corner cases correct, especially after refactoring into a non-recursive function.
  • Related types and functions can be added to the ranges module in the future.
    • Breaking a range of bytes into a range of aligned words and unaligned bytes in the beginning and the end. It will be used by finding VO bits from internal pointers and finding all VO bits in a region (for heap traversal).
    • Breaking a range of bytes at chunk boundaries. It will be used by bulk_update_metadata.

@wks
Copy link
Collaborator Author

wks commented Jul 31, 2024

Here are some benchmark results.

size data bytes meta bytes master master no recursion this PR memset
line 256 4 6.1574 ns 4.1684 ns 4.0134 ns 1.1416 ns
block 32768 512 6.5757 ns 4.5108 ns 4.5009 ns 1.7800 ns

The benchmarks can be found in benches/bulk_meta/bzero_bset.rs, and can be invoked by

taskset -c 0 cargo bench bzero_bset --features bench

Note: You need taskset only if your CPU has heterogeneous cores (so-called "performance-cores" and "efficient-cores").

This PR makes iterate_meta_bits faster than the master branch. That's partly because the existing iterate_meta_bits function is recursive, although it is guaranteed to terminate at the second level of recursion. This PR replaced the recursion with base cases and reduced the cost. I modified the master branch (see 24326a2) to make iterate_meta_bits non-recursive using the same algorithm as this PR, and it's almost as fast as this PR (as seen in the "master no recursion" column in the table above). There may be some subtle difference in the generated code after changing two closures into one closure, but the difference between "master no recusrion" and "this PR" is small.

But neither master nor this PR is as fast as calling memset directly to set the meta bits, and the overhead is huge.

I'll try to rebase #1174 on top of this PR and mark both PRs for review if the refactored code is pleasant for implementing heap traversal with.

@wks wks marked this pull request as ready for review August 6, 2024 06:20
@wks wks requested a review from qinsoon August 6, 2024 06:21
Copy link
Member

@qinsoon qinsoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring looks good to me. I have some comments about exposing private functions for testing/benchmarking.

Cargo.toml Outdated
@@ -75,6 +75,10 @@ perf_counter = ["pfm"]
# CI scripts run those tests with this feature.
mock_test = []

# This feature is only used for benchmarks.
# It will expose some private functions for benchmarking.
bench = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming this to something like test_private or test_private_items. This would be clear for both the function definitions in mmtk-core, and for the bench/test modules outside mmtk-core.

I found the following code confusing: we are in the bench crate, why do we have the bench feature for some tests but not other tests? A better feature name would make it more clear.

#[cfg(feature = "bench")]
mod bulk_meta;

#[cfg(feature = "mock_test")]
mod mock_bench;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also posssible to have mock_test = ["test_private_item"]. In that case, you don't have to change bench_main and you can simply add your benchmark.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with renaming it to test_private.

The reason for not requiring the "bench" feature for some test cases is because the current mock tests don't require any of the items exposed from the "bench" feature. But it will probably make things clearer to unconditionally require the "bench" or "test_private" feature when doing benchmarks.

benches/main.rs Outdated
// Some benchmarks rely on the "bench" feature to expose some private functions.
// Run them with `cargo bench --features bench`.
#[cfg(feature = "bench")]
bulk_meta::bzero_bset::bench(_c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments about the features. However, if you do want to separate mock_test and bench, you should create a pattern here so others can create new benchmarks with the bench feature easily.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Criterion benchmarks usually don't use the match statement to select benchmarks. Instead it is similar to running tests. It can be selected on the command line. For example,

cargo bench --features test_private -- bzero

It will select all benchmarks that have "bzero" in the name.

The mock tests are special. Because they rely on a global singleton MockVM, we can only run one test at a time. Previously, we rely on the environment variable MMTK_BENCH to ensure only one mock test benchmark is selected at a time.

At the moment, I just isolated the match statement into the bench_mock function, and leave everything else in bench_main. To install more benchmarks, we simply call c.bench_function, or call functions in other modules (such as bulk_meta::bzero_bset::bench) to install more benchmarks. In the future, the bench_main function will simply call more functions, like these lines: https://github.com/mmtk/mmtk-core/pull/1174/files#diff-9ad1cee13898f4e5ac553eb567cd47a8de7d8852bf292c3d8d5991e23677db32R50-R51

In the future, we can refactor the mock_bench so that we can use the command line to select benchmarks, and as long as we select only one benchmark at a time, they will still be able to run. But if that's impossible, we can still rely on the environment variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now completely isolated mock benches from other benchmarks. Now regular benchmarks are in a separate module regular_bench, and each module inside it has its fn bench method for registering benchmarks. When others add more regular benchmarks, they change the fn bench in the right module; when adding a mock benchmark, they edit mock_bench::bench which now holds the match statement.

pub(super) fn zero_meta_bits(
/// Expose `zero_meta_bits` when running `cargo bench`.
#[cfg(feature = "bench")]
pub fn bench_zero_meta_bits(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may move these functions to a different module, like util::test_util, so we don't need to worry about duplicate function names. Also they won't appear as a disruption when we read the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to just add a top-level module crate::testing because they are not really "utilities for testing". They are just the functions to be tested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am slightly concerned that it will appear at the top-level of our docs. But maybe it is okay.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to util::test_private. It does not appear on the first page of the cargo docs.

Copy link
Member

@qinsoon qinsoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wks wks added this pull request to the merge queue Aug 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 6, 2024
@wks wks added this pull request to the merge queue Aug 6, 2024
Merged via the queue into mmtk:master with commit 8623b77 Aug 6, 2024
24 checks passed
@wks wks deleted the fix/bulk-metadata-visit-simple branch August 6, 2024 15:12
@wks wks mentioned this pull request Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants