Reduce size of Bytes #20
Conversation
Sorry, oversight on my part there.
Thank you! I quite like this optimization. I've left a few little nitpicky comments, but otherwise I think it looks good!
just two minor nitpicks. otherwise, lgtm!
rust-analyzer trolling me
I was randomly looking at the source of this package for inspiration on an unrelated project which required a compressed trie data structure, and I soon noticed that the `Bytes` struct, whose size constitutes a bottleneck for the maximum size of words which can be hyphenated without allocation, had a significant amount of extraneous storage.

The `array::IntoIter` type stores two `usize` indices to be able to implement `DoubleEndedIterator`, and we stored an extra `usize` just for the length of the iterator, which can already be computed from the iterator itself.

Switching to storing the array inline and reimplementing the iterator methods lowered this overhead by 16 bytes (three `usize`s down to just one). This doesn't even make the code look much lower level.
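To make the layout difference concrete, here is a minimal standalone sketch; the struct names, field names, and the array length 41 are illustrative assumptions, not the crate's actual definitions:

```rust
use core::array;
use core::mem::size_of;

// Illustrative shapes only; hypher's real `Bytes` is not spelled like this.
struct IterBased {
    // `array::IntoIter` keeps the buffer plus two `usize` indices so that it
    // can implement `DoubleEndedIterator`...
    iter: array::IntoIter<u8, 41>,
    // ...and this extra `usize` duplicates information the iterator already has.
    len: usize,
}

struct Inline {
    // Keeping the raw array and a single one-byte cursor drops the redundant
    // `usize`s entirely.
    data: [u8; 41],
    cursor: u8,
}

fn main() {
    // On a typical 64-bit target: 41 bytes of data + 2 * 8 bytes of indices,
    // rounded up to an 8-byte multiple, plus another 8 bytes for the length.
    println!("iterator-based: {}", size_of::<IterBased>()); // 72
    println!("inline array:   {}", size_of::<Inline>());    // 42
}
```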
We may further reduce memory usage by realising our indices have low bounds, so we can just use a single `u8` for the index.

By changing that into a `NonZeroU8`, we also remove memory overhead when `alloc` is enabled, thanks to niche optimisation.
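As a quick illustration of the niche optimisation at play; the enums below are made-up stand-ins for an inline/heap split, not the real `Bytes` definition:

```rust
use core::mem::size_of;
use core::num::NonZeroU8;

// Made-up stand-ins; hypher's real enum differs.
enum WithNiche {
    Inline { data: [u8; 31], cursor: NonZeroU8 },
    Heap(Vec<u8>),
}

enum WithoutNiche {
    Inline { data: [u8; 31], cursor: u8 },
    Heap(Vec<u8>),
}

fn main() {
    // A plain `u8` uses all 256 values, so a discriminant next to it costs
    // extra space; `NonZeroU8` leaves 0 free as a niche the compiler can
    // reuse for the discriminant.
    assert_eq!(size_of::<Option<NonZeroU8>>(), 1);
    assert_eq!(size_of::<Option<u8>>(), 2);

    // The same effect on an enum whose largest variant is the inline one:
    // with the niche the tag is free, without it the type grows (and then
    // gets padded up to the `Vec` variant's 8-byte alignment).
    println!("with niche:    {} bytes", size_of::<WithNiche>());
    println!("without niche: {} bytes", size_of::<WithoutNiche>());
}
```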
Previously, the max word size had been documented as 40 bytes in some places and 41 in others, whilst in practice it was only 38, as two bytes are reserved for matching the start and end of words. With this change, the (true) max has been increased to 45 bytes, as alignment restrictions mean any length between 38 and 45 (inclusive) will result in the same size for the `Bytes` struct.

The previous size of `Bytes` was 72 bytes, bringing the size of `Syllables` to 96. The new sizes are 48 and 72, respectively.

If we were to push `Bytes` up to the limit of its previous size of 72 bytes, we could store as many as 69 bytes of word without allocating.
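Back-of-the-envelope, those figures are consistent with a simple model: the inline buffer holds the word plus the two boundary markers plus a one-byte cursor, and the whole thing gets padded up to an 8-byte alignment (which I assume comes from the allocating variant). This is only my reconstruction, not code from the crate:

```rust
// Toy model: word + 2 boundary markers + 1 cursor byte, rounded up to the
// assumed 8-byte alignment of the enclosing type.
fn padded_size(word_len: usize) -> usize {
    let raw = word_len + 2 + 1;
    (raw + 7) / 8 * 8
}

fn main() {
    // Every word length from 38 to 45 lands on the same 48-byte size...
    for len in 38..=45 {
        assert_eq!(padded_size(len), 48);
    }
    // ...one more byte would tip it over,
    assert_eq!(padded_size(46), 56);
    // ...and filling the old 72-byte budget would fit a 69-byte word.
    assert_eq!(padded_size(69), 72);
}
```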
Additionally, I opted to expose a new public constant, `MAX_WORD_LEN`, to avoid a future desync between the actual max length and the documented one, especially for downstream users.

I also added tests to ensure that constant is correct. The old code passes those tests for a `MAX_WORD_LEN` value of 38.
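For reference, here's a sketch of the kind of boundary test I mean, not the exact tests added here: `MAX_WORD_LEN` is the constant proposed above, `hyphenate` and `Lang` are hypher's public API, and I'm assuming the constant is a `usize` and that `Syllables` yields `&str` segments.

```rust
use hypher::{hyphenate, Lang, MAX_WORD_LEN};

#[test]
fn max_word_len_is_usable() {
    // A word of exactly MAX_WORD_LEN bytes should still hyphenate without
    // panicking, and joining the syllables must reproduce the input.
    let word = "a".repeat(MAX_WORD_LEN);
    let joined: String = hyphenate(&word, Lang::English).collect();
    assert_eq!(joined, word);
}
```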
I haven't found a performance regression on my machine using `cargo bench -p hypher-bench`. I'm not sure if that's how that extra crate is meant to be used; usually I just see benchmarks specified in the `bench` section of `Cargo.toml`.