Conversation
@camchenry camchenry commented Jul 28, 2025

This PR adds a reliable method of tracking the number of allocations made while running the parser. This includes both arena allocations (via a new feature), as well as system allocations (by creating a new global allocator). Reductions in these numbers should correlate to real-world performance improvements that aren't quantified in CodSpeed currently. This will also help prevent regressions where we accidentally allocate lots more memory than we expect.

Originally I had included the total system bytes allocated, but it was tricky to get this number to match exactly between platforms. I opted not to try to make it perfect and instead used the total number of allocations, which is a good proxy for bytes anyway.
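The system-allocation counting described above can be sketched as a wrapper around the global allocator. This is a minimal illustration, not the PR's actual implementation; the type and field names are made up:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts how many allocations are made.
struct CountingAllocator {
    count: AtomicUsize,
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Count events rather than bytes: the event count is stable across
        // platforms, while byte totals proved to vary slightly.
        self.count.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: CountingAllocator = CountingAllocator { count: AtomicUsize::new(0) };

fn main() {
    let before = ALLOC.count.load(Ordering::Relaxed);
    let v: Vec<u8> = Vec::with_capacity(1024); // forces one heap allocation
    let after = ALLOC.count.load(Ordering::Relaxed);
    assert!(after > before);
    drop(v);
    println!("system allocations observed: {}", after - before);
}
```

A benchmark harness can read the counter before and after parsing a file and report the difference.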

@github-actions github-actions bot added A-parser Area - Parser C-test Category - Testing. Code is missing test cases, or a PR is adding them labels Jul 28, 2025

@codspeed-hq

codspeed-hq bot commented Jul 28, 2025

CodSpeed Instrumentation Performance Report

Merging #12555 will not alter performance

Comparing 07-28-test_parser_track_number_of_allocations (d79f4ec) with main (d93e373)¹

Summary

✅ 34 untouched benchmarks

Footnotes

  1. No successful run was found on main (d79f4ec) during the generation of this report, so d93e373 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 3d4ff45 to 9066ffa on July 28, 2025 04:42
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch 12 times, most recently from 6b10fc7 to 04a3ca5 on August 1, 2025 03:28
@camchenry camchenry requested a review from overlookmotel August 1, 2025 03:29
@camchenry camchenry marked this pull request as ready for review August 1, 2025 03:29
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 04a3ca5 to 1a4d231 on August 1, 2025 03:45
Member

@overlookmotel overlookmotel left a comment


This is great.

In addition to the comments below:

There are also many arena allocations that occur without going through the Allocator APIs. Notably, Vec performs allocations and reallocations directly via Bump, by going through the Alloc trait (alloc.alloc() and alloc.grow() calls).

pub fn with_capacity_in(cap: usize, alloc: &'a A) -> Self {
    unsafe {
        let elem_size = mem::size_of::<T>();
        let alloc_size = cap.checked_mul(elem_size).unwrap_or_else(|| capacity_overflow());
        alloc_guard(alloc_size).unwrap_or_else(|_| capacity_overflow());
        // handles ZSTs and `cap = 0` alike
        let ptr = if alloc_size == 0 {
            NonNull::<T>::dangling()
        } else {
            let align = mem::align_of::<T>();
            let layout = Layout::from_size_align(alloc_size, align).unwrap();
            alloc.alloc(layout).cast::<T>()
        };
        // `cap as u32` is safe because `alloc_guard` ensures that `cap`
        // cannot exceed `u32::MAX`.
        #[expect(clippy::cast_possible_truncation)]
        let cap = cap as u32;
        RawVec { ptr, alloc, cap, len: 0 }
    }
}

fn finish_grow(&self, new_layout: Layout) -> Result<NonNull<u8>, AllocError> {
    alloc_guard(new_layout.size())?;
    let new_ptr = match self.current_layout() {
        Some(layout) => unsafe {
            // Marking this function as `#[cold]` and `#[inline(never)]` because grow method is
            // relatively expensive and we want to avoid inlining it into the caller.
            #[cold]
            #[inline(never)]
            unsafe fn grow<T, A: Alloc>(
                alloc: &A,
                ptr: NonNull<T>,
                old_layout: Layout,
                new_layout: Layout,
            ) -> NonNull<u8> {
                alloc.grow(ptr.cast(), old_layout, new_layout)
            }
            debug_assert!(new_layout.align() == layout.align());
            grow(self.alloc, self.ptr, layout, new_layout)
        },
        None => self.alloc.alloc(new_layout),
    };
    Ok(new_ptr)
}

These allocations / reallocations aren't getting counted, and they're a significant proportion of arena allocations.

So the ideal would be to increment num_alloc in Alloc::alloc + Alloc::grow as well. But the problem is that in those methods you only have access to the Bump, not the Allocator.

A terrible hack would be:

impl Alloc for Bump {
    #[inline(always)]
    fn alloc(&self, layout: Layout) -> NonNull<u8> {
        #[cfg(feature = "track_allocations")]
        unsafe {
            use crate::Allocator;
            use std::{
                mem::offset_of,
                ptr,
                sync::atomic::{AtomicUsize, Ordering},
            };
            #[expect(clippy::cast_possible_wrap)]
            const OFFSET: isize = (offset_of!(Allocator, num_alloc) as isize)
                - (offset_of!(Allocator, bump) as isize);
            let num_alloc_ptr = ptr::from_ref(self).byte_offset(OFFSET).cast::<AtomicUsize>();
            let num_alloc = num_alloc_ptr.as_ref().unwrap_unchecked();
            num_alloc.fetch_add(1, Ordering::SeqCst);
        }

        self.alloc_layout(layout)
    }
}

This is completely unsound! It relies on Bump always being wrapped inside an Allocator, which can't be statically proven. But in practice we know that's always the case in Oxc - we never use Bump on its own. This unsoundness would be unacceptable in production code, but as it's behind a cargo feature which is only used in this internal tool, I think it'd be OK.

When I finally finish up replacing Bump with our own custom allocator (WIP) we could build allocation counting into it, and remove this hack.


Note: Counting the number and total byte size of arena reallocations separately would be a really good measure. Allocation in arena is very cheap, but reallocation is way more expensive (see comment below). Of the stats for arena allocations, that's one of the measures which should be a priority to reduce.
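A minimal sketch of what such separate counters might look like (names are illustrative, not the actual API in the PR); atomics are used here because `Allocator` was `Sync` at the time:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative stats: arena allocations and reallocations tracked separately,
/// since reallocation is far more expensive in an arena than allocation.
#[derive(Default)]
struct ArenaStats {
    allocs: AtomicUsize,
    reallocs: AtomicUsize,
    realloc_bytes: AtomicUsize,
}

impl ArenaStats {
    /// Would be called from `Alloc::alloc`.
    fn record_alloc(&self) {
        self.allocs.fetch_add(1, Ordering::Relaxed);
    }

    /// Would be called from `Alloc::grow`, recording both the event and the
    /// new size, so total reallocated bytes can be reported too.
    fn record_grow(&self, new_size: usize) {
        self.reallocs.fetch_add(1, Ordering::Relaxed);
        self.realloc_bytes.fetch_add(new_size, Ordering::Relaxed);
    }
}

fn main() {
    let stats = ArenaStats::default();
    stats.record_alloc();
    stats.record_grow(256);
    stats.record_grow(512);
    assert_eq!(stats.reallocs.load(Ordering::Relaxed), 2);
    assert_eq!(stats.realloc_bytes.load(Ordering::Relaxed), 768);
    println!("allocs: {}", stats.allocs.load(Ordering::Relaxed));
}
```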

@overlookmotel
Member

Originally I had included total system bytes allocated, but it was tricky to get this number to match exactly between platforms.

Did you check that this is still the case once you added the warmup? If it still differs across platforms, I'd be interested to know by how much, and on what platforms. And is there also variance on the same platform across multiple runs?

I'd have broadly expected this measure to be deterministic. If it's not, the reason why might also shed some light on some of the variance we have in our benchmarks. It seems to relate to non-deterministic allocation patterns in HashMaps, but I've never understood where that's coming from.
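One plausible source of that non-determinism, assuming std's HashMap (or any per-process-seeded hasher): the hash seed differs between runs and platforms, which changes bucket occupancy and hence when a map decides to grow. A small illustration:

```rust
use std::collections::HashMap;
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

fn main() {
    // Each `RandomState` carries a fresh random seed, so the same key hashes
    // differently even within one process; across processes and platforms the
    // seed differs too. Different hashes mean different bucket occupancy,
    // which can shift the point at which a `HashMap` reallocates.
    let s1 = RandomState::new();
    let s2 = RandomState::new();
    println!("{} vs {}", s1.hash_one("key"), s2.hash_one("key"));

    // The same state is internally consistent, though:
    assert_eq!(s1.hash_one("key"), s1.hash_one("key"));

    // A map built with a fixed-seed hasher would make allocation counts
    // reproducible; with the default hasher they need not be.
    let m: HashMap<&str, u32> = HashMap::with_hasher(RandomState::new());
    assert!(m.is_empty());
}
```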

@camchenry
Member Author

Originally I had included total system bytes allocated, but it was tricky to get this number to match exactly between platforms.

Did you check that this is still the case once you added the warmup? If it still differs across platforms, I'd be interested to know by how much, and on what platforms. And is there also variance on the same platform across multiple runs?

Yes, it would vary between platforms, even after I switched to mimalloc and added the warmup. Not by much, and not on all files, but it still varied enough to cause CI to fail. The variance was on the order of ~0.05% of bytes allocated. For example, RadixUIAdoptionSection.jsx allocated 724 bytes in the system allocator on my local machine, but parsing that same file in CI allocated 756 bytes (an extra 32 bytes). Interestingly, not all files exhibited variance; some were exactly the same in CI. I wonder if a particular language feature is causing the issue.

I'd have broadly expected this measure to be deterministic. If it's not, the reason why might also shed some light on some of the variance we have in our benchmarks. It seems to relate to non-deterministic allocation patterns in HashMaps, but I've never understood where that's coming from.

I also expected it to be the same, given that it's the same input, and the same parsing code, but evidently there are other factors I'm not considering. The number of allocations appears to be affected by the platform it runs on, the order that files are parsed in, and the allocator being used as well.

@camchenry
Member Author

So the ideal would be to increment num_alloc in Alloc::alloc + Alloc::grow as well. But problem is that in those methods you only have access to the Bump, not the Allocator.

A terrible hack would be: [...]
This is completely unsound! It relies on Bump always being wrapped inside an Allocator, which can't be statically proven. But in practice we know that's always the case in Oxc - we never use Bump on its own. This unsoundness would be unacceptable in production code, but as it's behind a cargo feature which is only used in this internal tool, I think it'd be OK.

I think I can live with this, if you're okay with a little bit of extra feature-gated code in these methods. I think the effort will be worth it, as this will help us unlock a new avenue of performance optimization as well as preventing memory/performance regressions.

When I finally finish up replacing Bump with our own custom allocator (WIP) we could build allocation counting into it, and remove this hack.

This sounds like a great path forward to me.

@overlookmotel
Member

I think I can live with this, if you're okay with a little bit of extra feature-gated code in these methods. I think the effort will be worth it, as this will help us unlock a new avenue of performance optimization as well as preventing memory/performance regressions.

I can live with monstrous unsoundness, if you can! 😄 I agree it's a metric we can likely leverage to make improvements on perf.

@overlookmotel
Member

Oh, just one more thing. There's an annoyance with cargo features.

We run tests with --all-features, and we don't really want this code active in tests. If nothing else, it'll slow them down due to all the atomic ops in Allocator, and if we're adding the terrible unsound hack to Alloc trait, then we definitely don't want that code running in any circumstances other than oxc_allocs.

The crappy "solution" we've found for feature-gating certain code that's only used for oxlint JS plugins is to have 2 features called oxlint2 and disable_oxlint2. All the code which we want compiled for napi/oxlint2 is gated with:

#[cfg(all(feature = "oxlint2", not(feature = "disable_oxlint2")))]
fn gated() { /* ... */ }

So gated is not compiled with --all-features; you have to enable just the oxlint2 feature by itself.

This is crap. But it's the best we've come up with so far. So unfortunately I suggest using the same hack here in oxc_allocator crate - track_allocations and disable_track_allocations features.

To double-protect against hack code being compiled in production builds, I'd recommend adding disable_track_allocations to default features in oxc_allocator and then disabling default features for oxc_allocator in oxc_allocs.
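The two-feature trick could look like this in oxc_allocator's Cargo.toml (a sketch; the exact feature wiring is an assumption, not copied from the repo):

```toml
# oxc_allocator/Cargo.toml (illustrative)
[features]
# Enables the allocation-counting code. Only oxc_allocs should turn this on.
track_allocations = []
# Escape hatch: enabled by --all-features alongside track_allocations,
# in which case it wins and the tracking code is compiled out.
disable_track_allocations = []
```

Code is then gated with `#[cfg(all(feature = "track_allocations", not(feature = "disable_track_allocations")))]`, so `cargo test --all-features` never compiles the unsound hack.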

@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 1a4d231 to e15fd34 on August 1, 2025 18:59
@camchenry
Member Author

This is crap. But it's the best we've come up with so far. So unfortunately I suggest using the same hack here in oxc_allocator crate - track_allocations and disable_track_allocations features.

To double-protect against hack code being compiled in production builds, I'd recommend adding disable_track_allocations to default features in oxc_allocator and then disabling default features for oxc_allocator in oxc_allocs.

Done. However, I wasn't able to add disable_track_allocations as a default feature, because apparently you can't then override the default features in oxc_allocs:

error: failed to load manifest for workspace member `oxc/tasks/allocs`
Caused by:
  error inheriting `oxc_allocator` from workspace root manifest's `workspace.dependencies.oxc_allocator`

Caused by:
  `default-features = false` cannot override workspace's `default-features`

@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from e3ab989 to 3e8c580 on August 1, 2025 19:49
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch 2 times, most recently from d505527 to 5a84c8b on August 1, 2025 20:08
@camchenry camchenry requested a review from overlookmotel August 2, 2025 00:03
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 5a84c8b to a2cf46a on August 6, 2025 03:32
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from a2cf46a to 1a29424 on August 6, 2025 04:13
@camchenry camchenry requested a review from Boshen August 6, 2025 04:16
@camchenry camchenry force-pushed the 07-28-test_parser_track_number_of_allocations branch from 1a29424 to 21db624 on August 6, 2025 04:31
@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Aug 6, 2025
Member

Boshen commented Aug 6, 2025

Merge activity

- related: oxc-project/backlog#5

@graphite-app graphite-app bot force-pushed the 07-28-test_parser_track_number_of_allocations branch from 21db624 to d79f4ec on August 6, 2025 05:38
@graphite-app graphite-app bot merged commit d79f4ec into main Aug 6, 2025
32 checks passed
@graphite-app graphite-app bot deleted the 07-28-test_parser_track_number_of_allocations branch August 6, 2025 05:43
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Aug 6, 2025
graphite-app bot pushed a commit that referenced this pull request Aug 9, 2025
Follow-on after #12555.

Avoid making `bump` field of `Allocator` public when `track_allocations` feature is enabled, by moving the field offset calculations to next to `Allocator`'s definition.
graphite-app bot pushed a commit that referenced this pull request Aug 9, 2025
…12937)

Follow-on after #12555. Pure refactor.

Move code related to allocation tracking into its own module, and introduce an `AllocationStats` struct.

This reduces noise in `allocator.rs` and `alloc.rs`, with less repetitions of `#[cfg(all(feature = "track_allocations", not(feature = "disable_track_allocations")))]` etc.

Also bulk out the doc comments explaining the unsoundness, and how we must take care to avoid allocation tracking code being compiled in production code.
graphite-app bot pushed a commit that referenced this pull request Aug 19, 2025
…ocationStats` (#13043)

`AllocationStats` (introduced in #12555 and #12937) previously had to contain `AtomicUsize`s because `Allocator` was `Sync`. #13033 removed the `Sync` impl for `Allocator`, so now there's no need for synchronization in `AllocationStats`, and these fields can be `Cell<usize>` instead.
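A sketch of that change, with illustrative field names (the real struct lives in oxc_allocator): with the `Sync` impl gone, a plain `Cell<usize>` replaces the `AtomicUsize` and avoids atomic-op overhead:

```rust
use std::cell::Cell;

// Before #13033, `Allocator` was `Sync`, so counters had to be atomic:
//     num_alloc: std::sync::atomic::AtomicUsize,
// Without `Sync`, single-threaded interior mutability via `Cell` suffices.
#[derive(Default, Debug)]
struct AllocationStats {
    num_alloc: Cell<usize>,
}

impl AllocationStats {
    fn record(&self) {
        // No atomic RMW needed: plain get/set on a `Cell`.
        self.num_alloc.set(self.num_alloc.get() + 1);
    }
}

fn main() {
    let stats = AllocationStats::default();
    stats.record();
    stats.record();
    assert_eq!(stats.num_alloc.get(), 2);
    println!("{stats:?}");
}
```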