Conversation
@camchenry camchenry commented Jul 28, 2025

This PR adds a reliable method of tracking the number of allocations made while running the parser. This includes both arena allocations (via a new feature), as well as system allocations (by creating a new global allocator). Reductions in these numbers should correlate to real-world performance improvements that aren't quantified in CodSpeed currently. This will also help prevent regressions where we accidentally allocate lots more memory than we expect.

Originally I had included the total system bytes allocated, but it was tricky to get this number to match exactly between platforms. I opted not to try to make it perfect and instead used the total number of allocations, which is a good proxy for bytes anyway.
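The system-allocation counting described above can be sketched as a wrapper around the global allocator. This is a minimal illustration, not the PR's actual implementation; the type and field names are made up:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts how many allocations are made.
struct CountingAllocator {
    count: AtomicUsize,
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Count events rather than bytes: the event count is stable across
        // platforms, while byte totals proved to vary slightly.
        self.count.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: CountingAllocator = CountingAllocator { count: AtomicUsize::new(0) };

fn main() {
    let before = ALLOC.count.load(Ordering::Relaxed);
    let v: Vec<u8> = Vec::with_capacity(1024); // forces one heap allocation
    let after = ALLOC.count.load(Ordering::Relaxed);
    assert!(after > before);
    drop(v);
    println!("system allocations observed: {}", after - before);
}
```

A benchmark harness can read the counter before and after parsing a file and report the difference.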

@github-actions github-actions bot added A-parser Area - Parser C-test Category - Testing. Code is missing test cases, or a PR is adding them labels Jul 28, 2025

@codspeed-hq

codspeed-hq bot commented Jul 28, 2025

CodSpeed Instrumentation Performance Report

Merging #12555 will not alter performance

Comparing 07-28-test_parser_track_number_of_allocations (d79f4ec) with main (d93e373)¹

Summary

✅ 34 untouched benchmarks

Footnotes

  1. No successful run was found on main (d79f4ec) during the generation of this report, so d93e373 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 3d4ff45 to 9066ffa on July 28, 2025 04:42
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch 12 times, most recently from 6b10fc7 to 04a3ca5 on August 1, 2025 03:28
@camchenry camchenry requested a review from overlookmotel August 1, 2025 03:29
@camchenry camchenry marked this pull request as ready for review August 1, 2025 03:29
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 04a3ca5 to 1a4d231 on August 1, 2025 03:45
Member

@overlookmotel overlookmotel left a comment


This is great.

In addition to the comments below:

There are also many arena allocations that occur without going through the Allocator APIs. Notably, Vec performs allocations and reallocations directly via Bump, by going through the Alloc trait (alloc.alloc() and alloc.grow() calls).

pub fn with_capacity_in(cap: usize, alloc: &'a A) -> Self {
    unsafe {
        let elem_size = mem::size_of::<T>();
        let alloc_size = cap.checked_mul(elem_size).unwrap_or_else(|| capacity_overflow());
        alloc_guard(alloc_size).unwrap_or_else(|_| capacity_overflow());
        // handles ZSTs and `cap = 0` alike
        let ptr = if alloc_size == 0 {
            NonNull::<T>::dangling()
        } else {
            let align = mem::align_of::<T>();
            let layout = Layout::from_size_align(alloc_size, align).unwrap();
            alloc.alloc(layout).cast::<T>()
        };
        // `cap as u32` is safe because `alloc_guard` ensures that `cap`
        // cannot exceed `u32::MAX`.
        #[expect(clippy::cast_possible_truncation)]
        let cap = cap as u32;
        RawVec { ptr, alloc, cap, len: 0 }
    }
}

fn finish_grow(&self, new_layout: Layout) -> Result<NonNull<u8>, AllocError> {
    alloc_guard(new_layout.size())?;
    let new_ptr = match self.current_layout() {
        Some(layout) => unsafe {
            // Marking this function as `#[cold]` and `#[inline(never)]` because grow method is
            // relatively expensive and we want to avoid inlining it into the caller.
            #[cold]
            #[inline(never)]
            unsafe fn grow<T, A: Alloc>(
                alloc: &A,
                ptr: NonNull<T>,
                old_layout: Layout,
                new_layout: Layout,
            ) -> NonNull<u8> {
                alloc.grow(ptr.cast(), old_layout, new_layout)
            }
            debug_assert!(new_layout.align() == layout.align());
            grow(self.alloc, self.ptr, layout, new_layout)
        },
        None => self.alloc.alloc(new_layout),
    };
    Ok(new_ptr)
}

These allocations / reallocations aren't getting counted, and they're a significant proportion of arena allocations.

So the ideal would be to increment num_alloc in Alloc::alloc + Alloc::grow as well. But the problem is that in those methods you only have access to the Bump, not the Allocator.

A terrible hack would be:

impl Alloc for Bump {
    #[inline(always)]
    fn alloc(&self, layout: Layout) -> NonNull<u8> {
        #[cfg(feature = "track_allocations")]
        unsafe {
            use crate::Allocator;
            use std::{
                mem::offset_of,
                ptr,
                sync::atomic::{AtomicUsize, Ordering},
            };
            #[expect(clippy::cast_possible_wrap)]
            const OFFSET: isize = (offset_of!(Allocator, num_alloc) as isize)
                - (offset_of!(Allocator, bump) as isize);
            let num_alloc_ptr = ptr::from_ref(self).byte_offset(OFFSET).cast::<AtomicUsize>();
            let num_alloc = num_alloc_ptr.as_ref().unwrap_unchecked();
            num_alloc.fetch_add(1, Ordering::SeqCst);
        }

        self.alloc_layout(layout)
    }
}

This is completely unsound! It relies on Bump always being wrapped inside an Allocator, which can't be statically proven. But in practice we know that's always the case in Oxc - we never use Bump on its own. This unsoundness would be unacceptable in production code, but as it's behind a cargo feature which is only used in this internal tool, I think it'd be OK.

When I finally finish up replacing Bump with our own custom allocator (WIP) we could build allocation counting into it, and remove this hack.


Note: Counting the number and total byte size of arena reallocations separately would be a really good measure. Allocation in arena is very cheap, but reallocation is way more expensive (see comment below). Of the stats for arena allocations, that's one of the measures which should be a priority to reduce.
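A minimal sketch of what such separate counters might look like (names are illustrative, not the actual API in the PR); atomics are used here because `Allocator` was `Sync` at the time:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative stats: arena allocations and reallocations tracked separately,
/// since reallocation is far more expensive in an arena than allocation.
#[derive(Default)]
struct ArenaStats {
    allocs: AtomicUsize,
    reallocs: AtomicUsize,
    realloc_bytes: AtomicUsize,
}

impl ArenaStats {
    /// Would be called from `Alloc::alloc`.
    fn record_alloc(&self) {
        self.allocs.fetch_add(1, Ordering::Relaxed);
    }

    /// Would be called from `Alloc::grow`, recording both the event and the
    /// new size, so total reallocated bytes can be reported too.
    fn record_grow(&self, new_size: usize) {
        self.reallocs.fetch_add(1, Ordering::Relaxed);
        self.realloc_bytes.fetch_add(new_size, Ordering::Relaxed);
    }
}

fn main() {
    let stats = ArenaStats::default();
    stats.record_alloc();
    stats.record_grow(256);
    stats.record_grow(512);
    assert_eq!(stats.reallocs.load(Ordering::Relaxed), 2);
    assert_eq!(stats.realloc_bytes.load(Ordering::Relaxed), 768);
    println!("allocs: {}", stats.allocs.load(Ordering::Relaxed));
}
```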

@overlookmotel
Member

Originally I had included total system bytes allocated, but it was tricky to get this number to match exactly between platforms.

Did you check that this is still the case once you added the warmup? If it still differs across platforms, I'd be interested to know by how much, and on what platforms. And is there also variance on the same platform across multiple runs?

I'd have broadly expected this measure to be deterministic. If it's not, the reason why might also shed some light on some of the variance we have in our benchmarks. It seems to relate to non-deterministic allocation patterns in HashMaps, but I've never understood where that's coming from.
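One plausible source of that non-determinism, assuming std's HashMap (or any per-process-seeded hasher): the hash seed differs between runs and platforms, which changes bucket occupancy and hence when a map decides to grow. A small illustration:

```rust
use std::collections::HashMap;
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

fn main() {
    // Each `RandomState` carries a fresh random seed, so the same key hashes
    // differently even within one process; across processes and platforms the
    // seed differs too. Different hashes mean different bucket occupancy,
    // which can shift the point at which a `HashMap` reallocates.
    let s1 = RandomState::new();
    let s2 = RandomState::new();
    println!("{} vs {}", s1.hash_one("key"), s2.hash_one("key"));

    // The same state is internally consistent, though:
    assert_eq!(s1.hash_one("key"), s1.hash_one("key"));

    // A map built with a fixed-seed hasher would make allocation counts
    // reproducible; with the default hasher they need not be.
    let m: HashMap<&str, u32> = HashMap::with_hasher(RandomState::new());
    assert!(m.is_empty());
}
```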

@camchenry
Member Author

Originally I had included total system bytes allocated, but it was tricky to get this number to match exactly between platforms.

Did you check that this is still the case once you added the warmup? If it still differs across platforms, I'd be interested to know by how much, and on what platforms. And is there also variance on the same platform across multiple runs?

Yes, it would vary between platforms, even after I switched to mimalloc and added the warmup. Not by much, and not on all files, but it still varied enough to cause CI to fail. The variance was on the order of ~0.05% of bytes allocated. For example, RadixUIAdoptionSection.jsx allocated 724 bytes in the system allocator on my local machine, but parsing that same file in CI allocated 756 bytes (an extra 32 bytes). Interestingly, not all files exhibited variance; some were exactly the same in CI. I wonder if a particular language feature is causing the issue.

I'd have broadly expected this measure to be deterministic. If it's not, the reason why might also shed some light on some of the variance we have in our benchmarks. It seems to relate to non-deterministic allocation patterns in HashMaps, but I've never understood where that's coming from.

I also expected it to be the same, given that it's the same input, and the same parsing code, but evidently there are other factors I'm not considering. The number of allocations appears to be affected by the platform it runs on, the order that files are parsed in, and the allocator being used as well.

@camchenry
Member Author

So the ideal would be to increment num_alloc in Alloc::alloc + Alloc::grow as well. But problem is that in those methods you only have access to the Bump, not the Allocator.

A terrible hack would be: [...]
This is completely unsound! It relies on Bump always being wrapped inside an Allocator, which can't be statically proven. But in practice we know that's always the case in Oxc - we never use Bump on its own. This unsoundness would be unacceptable in production code, but as it's behind a cargo feature which is only used in this internal tool, I think it'd be OK.

I think I can live with this, if you're okay with a little bit of extra feature-gated code in these methods. I think the effort will be worth it, as this will help us unlock a new avenue of performance optimization as well as preventing memory/performance regressions.

When I finally finish up replacing Bump with our own custom allocator (WIP) we could build allocation counting into it, and remove this hack.

This sounds like a great path forward to me.

@overlookmotel
Member

I think I can live with this, if you're okay with a little bit of extra feature-gated code in these methods. I think the effort will be worth it, as this will help us unlock a new avenue of performance optimization as well as preventing memory/performance regressions.

I can live with monstrous unsoundness, if you can! 😄 I agree it's a metric we can likely leverage to make improvements on perf.

@overlookmotel
Member

Oh, just one more thing. There's an annoyance with cargo features.

We run tests with --all-features, and we don't really want this code active in tests. If nothing else, it'll slow them down due to all the atomic ops in Allocator, and if we're adding the terrible unsound hack to Alloc trait, then we definitely don't want that code running in any circumstances other than oxc_allocs.

The crappy "solution" we've found for feature-gating certain code that's only used for oxlint JS plugins is to have 2 features called oxlint2 and disable_oxlint2. All the code which we want compiled for napi/oxlint2 is gated with:

#[cfg(all(feature = "oxlint2", not(feature = "disable_oxlint2")))]
fn gated() { /* ... */ }

So gated is not compiled with --all-features; you have to enable just the oxlint2 feature by itself.

This is crap. But it's the best we've come up with so far. So unfortunately I suggest using the same hack here in oxc_allocator crate - track_allocations and disable_track_allocations features.

To double-protect against hack code being compiled in production builds, I'd recommend adding disable_track_allocations to default features in oxc_allocator and then disabling default features for oxc_allocator in oxc_allocs.
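The two-feature trick could look like this in oxc_allocator's Cargo.toml (a sketch; the exact feature wiring is an assumption, not copied from the repo):

```toml
# oxc_allocator/Cargo.toml (illustrative)
[features]
# Enables the allocation-counting code. Only oxc_allocs should turn this on.
track_allocations = []
# Escape hatch: enabled by --all-features alongside track_allocations,
# in which case it wins and the tracking code is compiled out.
disable_track_allocations = []
```

Code is then gated with `#[cfg(all(feature = "track_allocations", not(feature = "disable_track_allocations")))]`, so `cargo test --all-features` never compiles the unsound hack.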

@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 1a4d231 to e15fd34 on August 1, 2025 18:59
@camchenry
Member Author

This is crap. But it's the best we've come up with so far. So unfortunately I suggest using the same hack here in oxc_allocator crate - track_allocations and disable_track_allocations features.

To double-protect against hack code being compiled in production builds, I'd recommend adding disable_track_allocations to default features in oxc_allocator and then disabling default features for oxc_allocator in oxc_allocs.

Done. However, I wasn't able to add disable_track_allocations as a default feature, because apparently you can't then override the default features in oxc_allocs:

error: failed to load manifest for workspace member `oxc/tasks/allocs`
Caused by:
  error inheriting `oxc_allocator` from workspace root manifest's `workspace.dependencies.oxc_allocator`

Caused by:
  `default-features = false` cannot override workspace's `default-features`

@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from e3ab989 to 3e8c580 on August 1, 2025 19:49
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch 2 times, most recently from d505527 to 5a84c8b on August 1, 2025 20:08
@camchenry camchenry requested a review from overlookmotel August 2, 2025 00:03
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 5a84c8b to a2cf46a on August 6, 2025 03:32
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from a2cf46a to 1a29424 on August 6, 2025 04:13
@camchenry camchenry requested a review from Boshen August 6, 2025 04:16
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 1a29424 to 21db624 on August 6, 2025 04:31
@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Aug 6, 2025
Member

Boshen commented Aug 6, 2025

Merge activity

- related: oxc-project/backlog#5

@graphite-app graphite-app bot force-pushed the 07-28-test_parser_track_number_of_allocations branch from 21db624 to d79f4ec on August 6, 2025 05:38
@graphite-app graphite-app bot merged commit d79f4ec into main Aug 6, 2025
32 checks passed
@graphite-app graphite-app bot deleted the 07-28-test_parser_track_number_of_allocations branch August 6, 2025 05:43
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Aug 6, 2025
graphite-app bot pushed a commit that referenced this pull request Aug 9, 2025
Follow-on after #12555.

Avoid making `bump` field of `Allocator` public when `track_allocations` feature is enabled, by moving the field offset calculations to next to `Allocator`'s definition.
graphite-app bot pushed a commit that referenced this pull request Aug 9, 2025
…12937)

Follow-on after #12555. Pure refactor.

Move code related to allocation tracking into its own module, and introduce an `AllocationStats` struct.

This reduces noise in `allocator.rs` and `alloc.rs`, with less repetitions of `#[cfg(all(feature = "track_allocations", not(feature = "disable_track_allocations")))]` etc.

Also bulk out the doc comments explaining the unsoundness, and how we must take care to avoid allocation tracking code being compiled in production code.
graphite-app bot pushed a commit that referenced this pull request Aug 19, 2025
…ocationStats` (#13043)

`AllocationStats` (introduced in #12555 and #12937) previously had to contain `AtomicUsize`s because `Allocator` was `Sync`. #13033 removed the `Sync` impl for `Allocator`, so now there's no need for synchronization in `AllocationStats`, and these fields can be `Cell<usize>` instead.
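A sketch of that change, with illustrative field names (the real struct lives in oxc_allocator): with the `Sync` impl gone, a plain `Cell<usize>` replaces the `AtomicUsize` and avoids atomic-op overhead:

```rust
use std::cell::Cell;

// Before #13033, `Allocator` was `Sync`, so counters had to be atomic:
//     num_alloc: std::sync::atomic::AtomicUsize,
// Without `Sync`, single-threaded interior mutability via `Cell` suffices.
#[derive(Default, Debug)]
struct AllocationStats {
    num_alloc: Cell<usize>,
}

impl AllocationStats {
    fn record(&self) {
        // No atomic RMW needed: plain get/set on a `Cell`.
        self.num_alloc.set(self.num_alloc.get() + 1);
    }
}

fn main() {
    let stats = AllocationStats::default();
    stats.record();
    stats.record();
    assert_eq!(stats.num_alloc.get(), 2);
    println!("{stats:?}");
}
```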