LibJS: Encode Cell Values as conventional bottom-tagged pointers #2633

Draft: wants to merge 1 commit into master

@yyny (Contributor) commented Nov 29, 2024

Cell values were previously encoded as NaN-boxed pointers,
with the Cell type encoded in the top bits of the NaN-payload.

However, since Cell values are so common, this meant that we had to
do a lot of bit masking and bit shifting to retrieve the pointer
every time we wanted to dereference it.
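
To make that cost concrete, here is a simplified sketch (not LibJS's actual code) of a top-tagged NaN-box. It assumes a 16-bit tag in the upper bits and a 48-bit pointer payload that has to be sign-extended back into a canonical address; `box_cell`, `tag_of` and `unbox_cell_address` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical NaN-box layout: the top 16 bits hold a NaN pattern plus the
// cell type, the low 48 bits hold the pointer.
constexpr unsigned TAG_SHIFT = 48;
constexpr std::uint64_t PAYLOAD_MASK = (std::uint64_t { 1 } << TAG_SHIFT) - 1;

constexpr std::uint64_t box_cell(std::uint16_t tag, std::uint64_t address)
{
    return (static_cast<std::uint64_t>(tag) << TAG_SHIFT) | (address & PAYLOAD_MASK);
}

constexpr std::uint16_t tag_of(std::uint64_t bits)
{
    return static_cast<std::uint16_t>(bits >> TAG_SHIFT);
}

// Every dereference has to undo the boxing first: shifting left by 16 drops
// the tag, and the arithmetic right shift copies bit 47 back into the top
// 16 bits so upper-half (kernel) addresses come out in canonical form.
constexpr std::uint64_t unbox_cell_address(std::uint64_t bits)
{
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(bits << 16) >> 16);
}
```

The shift pair in `unbox_cell_address` sits on the critical path of every access to a cell.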

Additionally, on all systems we support, user-space pointers always
start with leading 0 bits.

The new encoding therefore takes advantage of this by storing pointers
without any NaN-boxing, and NaN-boxing the subnormal numbers
instead: their exponent field is all 0 bits, so positive subnormals
also start with leading 0 bits and could otherwise be confused with pointers.
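
A rough way to see which doubles actually clash with pointers, assuming user-space pointers fit in the low 48 bits: a raw 64-bit word is ambiguous only when a double's top 16 bits are also 0, which requires a 0 sign bit and a 0 exponent field. `collides_with_user_pointer` is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: user-space pointers fit in 48 bits, so their top 16 bits are 0.
constexpr std::uint64_t TOP_16_BITS = 0xFFFF'0000'0000'0000;

inline std::uint64_t bits_of(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits); // well-defined type punning
    return bits;
}

// True if the double's raw bits look like a 48-bit user-space pointer.
// That needs sign bit 0, exponent 0 and the top 4 mantissa bits 0, which
// rules out every normal number and every negative value.
inline bool collides_with_user_pointer(double value)
{
    return (bits_of(value) & TOP_16_BITS) == 0;
}
```

The `0.0` case in the checks shows that +0.0 shares the all-zero bit pattern too, so an encoding along these lines has to handle it as well.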

Since subnormals are rarely seen in practice, it makes sense to do
the masking and shifting on these values instead, and to store the
far more common Cell pointers as plain pointers, thereby improving
performance on average.

We can then take further advantage of the fact that Cell pointers are
8-byte aligned, and use the three unused bottom bits to store the cell type.
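
A minimal sketch of bottom-tagging on raw 64-bit words. Only `CellTag::Object == 0` comes from this description; the other tag values and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Only Object == 0 is taken from the PR description; the other tags are made up.
enum class CellTag : std::uint64_t {
    Object = 0,
    String = 1,
    Symbol = 2,
    BigInt = 3,
};

// 8-byte alignment guarantees the low 3 bits of a cell address are 0.
constexpr std::uint64_t TAG_MASK = 0b111;

inline std::uint64_t tag_pointer(std::uint64_t address, CellTag tag)
{
    assert((address & TAG_MASK) == 0 && "cell addresses must be 8-byte aligned");
    return address | static_cast<std::uint64_t>(tag);
}

inline CellTag tag_of(std::uint64_t bits)
{
    return static_cast<CellTag>(bits & TAG_MASK);
}

// Clearing the low 3 bits recovers the original, dereferenceable address.
inline std::uint64_t address_of(std::uint64_t bits)
{
    return bits & ~TAG_MASK;
}
```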

This technique is called "bottom-tagging". It is friendlier to modern
CPUs because a tagged pointer still lies in the same cache line as the
untagged one: cache lines are larger than 8 bytes (usually 64 bytes) and
aligned to their size, so the bottom bits never change which line gets
prefetched, and the CPU has no reason to inspect them.
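
The cache-line argument can be checked with plain arithmetic. This sketch assumes 64-byte lines aligned to their size; a tag in bits 0 to 2 stays inside the line offset (bits 0 to 5), whereas a tag in the top 16 bits produces an unrelated address until it is masked off. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-byte cache lines, aligned to their own size.
constexpr std::uint64_t CACHE_LINE_SIZE = 64;

constexpr std::uint64_t cache_line_of(std::uint64_t address)
{
    return address / CACHE_LINE_SIZE;
}

// For an 8-byte-aligned address, OR-ing in a tag in [0, 8) only touches bits
// 0..2, which lie inside the cache-line offset, so the line is unchanged.
constexpr bool bottom_tag_keeps_line(std::uint64_t aligned_address, std::uint64_t tag)
{
    return cache_line_of(aligned_address | tag) == cache_line_of(aligned_address);
}

// By contrast, a tag stored in the top 16 bits yields a completely different
// address until it is stripped again.
constexpr bool top_tag_keeps_line(std::uint64_t address, std::uint64_t tag)
{
    return cache_line_of(address | (tag << 48)) == cache_line_of(address);
}
```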

Bottom-tagging is also a more conventional approach, so alternative
garbage collector implementations are much more likely to support it
than NaN-boxing.

Furthermore, since `CellTag::Object == 0`, `Value::as_object()` can
be optimized to a no-op in most cases, which makes the compiler and
the CPU very happy :^)
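
A sketch of why a zero `Object` tag makes extraction free, assuming 64-bit pointers and the 3-bit tag layout described above; `Object`, `as_cell` and this `as_object` are simplified stand-ins, not LibJS's real types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GC-managed object; alignas(8) mirrors the
// 8-byte alignment the encoding relies on.
struct alignas(8) Object {
    int value;
};

constexpr std::uint64_t TAG_MASK = 0b111;
constexpr std::uint64_t OBJECT_TAG = 0; // CellTag::Object == 0

// Generic path: the tag bits must be cleared before dereferencing.
inline void* as_cell(std::uint64_t bits)
{
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits & ~TAG_MASK));
}

// Object path: with a tag of 0 the bits already are the pointer,
// so no masking is needed once the tag has been checked.
inline Object* as_object(std::uint64_t bits)
{
    assert((bits & TAG_MASK) == OBJECT_TAG);
    return reinterpret_cast<Object*>(static_cast<std::uintptr_t>(bits));
}
```

With assertions disabled, an optimizing compiler can typically reduce `as_object` to a plain register move, while `as_cell` still needs an AND.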

Note that since encoding non-user-space pointers is no longer supported,
LibJS will no longer work correctly in kernel space or on some
embedded devices.

The capability to encode kernel-space pointers could be added back in
the future through a special case that still NaN-boxes them as before,
at a performance cost, of course.
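
A sketch of the kind of check such a special case would need, assuming the un-boxed encoding only accepts addresses whose top 16 bits are 0. `Encoding` and `encoding_for` are hypothetical; on x86-64, upper-half (kernel) addresses such as `0xFFFF800000001000` have those bits set.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: un-boxed cell pointers must have their top 16 bits clear,
// i.e. they are 48-bit user-space addresses.
constexpr std::uint64_t TOP_16_BITS = 0xFFFF'0000'0000'0000;

enum class Encoding {
    BottomTagged, // fast path: store the pointer as-is, tag in the low bits
    NeedsNaNBox,  // hypothetical slow path for kernel-space addresses
};

constexpr Encoding encoding_for(std::uint64_t address)
{
    return (address & TOP_16_BITS) == 0 ? Encoding::BottomTagged : Encoding::NeedsNaNBox;
}
```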

@yyny force-pushed the untagged-pointers branch from 6315a3c to d07dda9 November 29, 2024 10:57
@yyny force-pushed the untagged-pointers branch 3 times, most recently from e974086 to fb6a54f November 29, 2024 15:02
@yyny changed the title from "LibJS: Encode Cell Values as untagged pointers" to "LibJS: Encode Cell Values as concentional bottom-tagged pointers" Nov 29, 2024
@yyny marked this pull request as ready for review November 29, 2024 15:03
@yyny force-pushed the untagged-pointers branch from fb6a54f to 2653718 November 29, 2024 16:22
@awesomekling changed the title from "LibJS: Encode Cell Values as concentional bottom-tagged pointers" to "LibJS: Encode Cell Values as conventional bottom-tagged pointers" Nov 30, 2024
@yyny marked this pull request as draft December 4, 2024 20:04

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions!

github-actions bot added the stale label Dec 26, 2024