test(parser): track number of allocations #12555
Conversation
CodSpeed Instrumentation Performance Report: merging #12555 will not alter performance.
This is great.
In addition to the comments below:
There are also many arena allocations that occur without going through the `Allocator` APIs. Notably, `Vec` performs allocations and reallocations directly via `Bump`, by going through the `Alloc` trait (`alloc.alloc()` and `alloc.grow()` calls):
oxc/crates/oxc_allocator/src/vec2/raw_vec.rs, lines 93 to 115 in 5f50bc3:

```rust
pub fn with_capacity_in(cap: usize, alloc: &'a A) -> Self {
    unsafe {
        let elem_size = mem::size_of::<T>();
        let alloc_size = cap.checked_mul(elem_size).unwrap_or_else(|| capacity_overflow());
        alloc_guard(alloc_size).unwrap_or_else(|_| capacity_overflow());
        // handles ZSTs and `cap = 0` alike
        let ptr = if alloc_size == 0 {
            NonNull::<T>::dangling()
        } else {
            let align = mem::align_of::<T>();
            let layout = Layout::from_size_align(alloc_size, align).unwrap();
            alloc.alloc(layout).cast::<T>()
        };
        // `cap as u32` is safe because `alloc_guard` ensures that `cap`
        // cannot exceed `u32::MAX`.
        #[expect(clippy::cast_possible_truncation)]
        let cap = cap as u32;
        RawVec { ptr, alloc, cap, len: 0 }
    }
}
```
oxc/crates/oxc_allocator/src/vec2/raw_vec.rs, lines 742 to 766 in 5f50bc3:

```rust
fn finish_grow(&self, new_layout: Layout) -> Result<NonNull<u8>, AllocError> {
    alloc_guard(new_layout.size())?;
    let new_ptr = match self.current_layout() {
        Some(layout) => unsafe {
            // Marking this function as `#[cold]` and `#[inline(never)]` because grow method is
            // relatively expensive and we want to avoid inlining it into the caller.
            #[cold]
            #[inline(never)]
            unsafe fn grow<T, A: Alloc>(
                alloc: &A,
                ptr: NonNull<T>,
                old_layout: Layout,
                new_layout: Layout,
            ) -> NonNull<u8> {
                alloc.grow(ptr.cast(), old_layout, new_layout)
            }
            debug_assert!(new_layout.align() == layout.align());
            grow(self.alloc, self.ptr, layout, new_layout)
        },
        None => self.alloc.alloc(new_layout),
    };
    Ok(new_ptr)
}
```
These allocations / reallocations aren't getting counted, and they're a significant proportion of arena allocations.

So the ideal would be to increment `num_alloc` in `Alloc::alloc` + `Alloc::grow` as well. But the problem is that in those methods you only have access to the `Bump`, not the `Allocator`.
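For reference, a rough sketch of the layout in question (a guess at the shape based on names used in this thread, not `oxc_allocator`'s actual definition):

```rust
use std::sync::atomic::AtomicUsize;

use bumpalo::Bump;

// Illustrative layout only: the counter lives on `Allocator`, but the
// `Alloc` trait (used by `Vec`/`RawVec`) is implemented on the inner
// `Bump`, which has no way to reach `num_alloc`.
pub struct Allocator {
    bump: Bump,
    #[cfg(feature = "track_allocations")]
    num_alloc: AtomicUsize,
}
```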
A terrible hack would be:

```rust
impl Alloc for Bump {
    #[inline(always)]
    fn alloc(&self, layout: Layout) -> NonNull<u8> {
        #[cfg(feature = "track_allocations")]
        unsafe {
            use crate::Allocator;
            use std::{
                mem::offset_of,
                ptr,
                sync::atomic::{AtomicUsize, Ordering},
            };
            #[expect(clippy::cast_possible_wrap)]
            const OFFSET: isize = (offset_of!(Allocator, num_alloc) as isize)
                - (offset_of!(Allocator, bump) as isize);
            let num_alloc_ptr = ptr::from_ref(self).byte_offset(OFFSET).cast::<AtomicUsize>();
            let num_alloc = num_alloc_ptr.as_ref().unwrap_unchecked();
            num_alloc.fetch_add(1, Ordering::SeqCst);
        }
        self.alloc_layout(layout)
    }
}
```

This is completely unsound! It relies on `Bump` always being wrapped inside an `Allocator`, which can't be statically proven. But in practice we know that's always the case in Oxc - we never use `Bump` on its own. This unsoundness would be unacceptable in production code, but as it's behind a cargo feature which is only used in this internal tool, I think it'd be OK.
When I finally finish replacing `Bump` with our own custom allocator (WIP), we could build allocation counting into it and remove this hack.
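As a sketch of what that could look like (illustrative names, assuming the `bumpalo` crate; not oxc's actual API), the counter would live alongside the arena itself, so no unsound offset arithmetic is needed:

```rust
use std::alloc::Layout;
use std::cell::Cell;
use std::ptr::NonNull;

use bumpalo::Bump;

// Sketch only: an arena that owns its own counter, so allocation
// tracking needs no pointer arithmetic against an outer wrapper.
struct CountingArena {
    bump: Bump,
    num_alloc: Cell<usize>,
}

impl CountingArena {
    fn alloc_layout(&self, layout: Layout) -> NonNull<u8> {
        self.num_alloc.set(self.num_alloc.get() + 1);
        self.bump.alloc_layout(layout)
    }
}
```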
Note: Counting the number and total byte size of arena reallocations separately would be a really good measure. Allocation in the arena is very cheap, but reallocation is far more expensive (see comment below). Of the arena allocation stats, that's one of the measures we should prioritize reducing.
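A hedged sketch of what that split might look like (the `AllocationStats` name appears in follow-ups later in this thread; the realloc fields are assumptions):

```rust
use std::cell::Cell;

// Sketch only: separate counters for allocations vs reallocations, plus
// total bytes moved by reallocs. `Cell` assumes single-threaded use.
#[derive(Default)]
struct AllocationStats {
    num_alloc: Cell<usize>,
    num_realloc: Cell<usize>,
    realloc_bytes: Cell<usize>,
}
```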
Did you check that this is still the case once you added the warmup? If it still differs across platforms, I'd be interested to know by how much, and on which platforms. And is there also variance on the same platform across multiple runs? I'd have broadly expected this measure to be deterministic. If it's not, the reason why might also shed some light on some of the variance we have in our benchmarks. It seems to relate to non-deterministic allocation patterns in …
Yes, it would vary between platforms, even if I switched to …

I also expected it to be the same, given that it's the same input and the same parsing code, but evidently there are other factors I'm not considering. The number of allocations appears to be affected by the platform it runs on, the order the files are parsed in, and the allocator being used as well.
I think I can live with this, if you're okay with a little bit of extra feature-gated code in these methods. I think the effort will be worth it, as this will help us unlock a new avenue of performance optimization as well as prevent memory/performance regressions.
This sounds like a great path forward to me.

I can live with monstrous unsoundness, if you can! 😄 I agree it's a metric we can likely leverage to make improvements on perf.
Oh, just one more thing. There's an annoyance with cargo features. We run tests with all features enabled, which would turn this tracking code on everywhere.

The crappy "solution" we've found for feature-gating certain code that's only used for oxlint JS plugins is to have 2 features, `oxlint2` and `disable_oxlint2`:

```rust
#[cfg(all(feature = "oxlint2", not(feature = "disable_oxlint2")))]
fn gated() { /* ... */ }
```

So when all features are enabled, `disable_oxlint2` cancels out `oxlint2`, and the gated code stays disabled.

This is crap. But it's the best we've come up with so far. So unfortunately I suggest using the same hack here, with a `disable_track_allocations` feature.

To double-protect against hack code being compiled in production builds, I'd recommend adding …
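Applied to this PR, the double-gate would look like the following sketch (the exact `cfg` combination appears in follow-up #12937 below; the helper function and its body are illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Compiled only when `track_allocations` is on AND the
// `disable_track_allocations` override is off, so `--all-features`
// builds don't include the unsound tracking code.
#[cfg(all(feature = "track_allocations", not(feature = "disable_track_allocations")))]
fn record_arena_alloc(num_alloc: &AtomicUsize) {
    num_alloc.fetch_add(1, Ordering::SeqCst);
}
```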
Done. However, I wasn't able to add …
Follow-on after #12555 (#12936). Avoid making the `bump` field of `Allocator` public when the `track_allocations` feature is enabled, by moving the field offset calculations next to `Allocator`'s definition.

Follow-on after #12555 (#12937). Pure refactor. Move code related to allocation tracking into its own module, and introduce an `AllocationStats` struct. This reduces noise in `allocator.rs` and `alloc.rs`, with fewer repetitions of `#[cfg(all(feature = "track_allocations", not(feature = "disable_track_allocations")))]` etc. Also bulk out the doc comments explaining the unsoundness, and how we must take care to avoid allocation tracking code being compiled into production code.
Follow-on (#13043). `AllocationStats` (introduced in #12555 and #12937) previously had to contain `AtomicUsize`s because `Allocator` was `Sync`. #13033 removed the `Sync` impl for `Allocator`, so now there's no need for synchronization in `AllocationStats`, and these fields can be `Cell<usize>` instead.

- related: oxc-project/backlog#5

This PR adds a reliable method of tracking the number of allocations made while running the parser. This includes both arena allocations (via a new feature) and system allocations (via a new global allocator). Reductions in these numbers should correlate with real-world performance improvements that aren't currently quantified in CodSpeed. This will also help prevent regressions where we accidentally allocate far more memory than we expect.

Originally I had included total system bytes allocated, but it was tricky to get this number to match exactly between platforms. I opted not to try to make this perfect, but instead to use the total number of allocations, which is a good proxy for bytes anyway.
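For illustration, the "new global allocator" approach can be sketched like this (a minimal standalone example of the technique, not the PR's actual implementation):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Count every system allocation, delegating the actual work to the
// default system allocator.
static NUM_ALLOC: AtomicUsize = AtomicUsize::new(0);

struct CountingSystemAlloc;

unsafe impl GlobalAlloc for CountingSystemAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        NUM_ALLOC.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingSystemAlloc = CountingSystemAlloc;

fn main() {
    let v = vec![1u8, 2, 3];
    drop(v);
    println!("system allocations: {}", NUM_ALLOC.load(Ordering::Relaxed));
}
```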