Skip to content

Optimize away compare256 for short matches #284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 14, 2025

Conversation

brian-pane
Copy link

No description provided.

@brian-pane
Copy link
Author

I'm posting this for discussion because it produces mixed results on my test system:

  • Improvement in cycles and wall time at compression levels 2 and 9
  • Regression in instructions (but no change in cycles or wall time) at levels 3-8.

Whether it helps is probably dependent on the CPU and the input data.

Comment on lines +195 to +196
if best_len < core::mem::size_of::<u64>() {
let scan_val = u64::from_ne_bytes(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tried if best_len < 32 { followed by the equivalent AVX2 implementation (similar to how compare32 processes each 32=byte chunk), but it was slower overall.

Copy link
Collaborator

@folkertdev folkertdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a slightly higher instruction count (correctly predicted branches?) is totally acceptable if we get a performance boost, especially because levels 3,4, 7 and 8 are rarely used in practice.

And the gains at level 2 and 9 are very real, so I think this is OK. Thanks!

@folkertdev folkertdev merged commit 4ca72cd into trifectatechfoundation:main Jan 14, 2025
20 checks passed
@brian-pane brian-pane deleted the longest-match64 branch April 1, 2025 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants