Make random faster by putting the innermost var last #6504

Merged: 4 commits into master on Jan 4, 2022

Conversation

abadams
Member

@abadams abadams commented Dec 18, 2021

This is 40% faster, and I fixed the issue where the low bits of our noise had a low period. Combined with #6506, this makes it possible to generate low-bit-width noise very cheaply for things like dithering.
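
For context on the low-period issue: any integer polynomial evaluated modulo 2^32 has low k bits that depend only on the low k bits of its input, so over consecutive inputs those bits repeat with a period of at most 2^k. Below is a minimal Python sketch of that effect. It assumes a generic quadratic with arbitrary illustrative coefficients; it is not Halide's actual generator or its constants.

```python
# Illustration only: a generic quadratic modulo 2^32, not Halide's generator.
MASK32 = 0xFFFFFFFF


def quadratic_mod32(x, a=0x9E3779B1, b=0x85EBCA6B, c=0xC2B2AE35):
    """Evaluate a*x^2 + b*x + c modulo 2^32 (coefficients are arbitrary)."""
    return (a * x * x + b * x + c) & MASK32


def low_bits_repeat(k, n=4096):
    """Return True if the low k output bits repeat every 2^k steps of x."""
    low_mask = (1 << k) - 1
    step = 1 << k
    return all(
        (quadratic_mod32(x) & low_mask) == (quadratic_mod32(x + step) & low_mask)
        for x in range(n)
    )


if __name__ == "__main__":
    for k in (1, 4, 8):
        print(k, low_bits_repeat(k))
```

This is why noise taken directly from the low bits of such a function shows short repeating patterns, which matters when only a few bits are needed, as in dithering.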

@abadams abadams changed the title from "Make random 2x faster by putting the innermost var last" to "Make random faster by putting the innermost var last" Dec 19, 2021
By pulling constant additions outside of quadratics, we can shave off a
few add instructions in the inner loop for random number generation,
which uses a quadratic modulo 2^32.

I also removed the !overflows predicates, because rules already fail to
match if a fold overflows.

New rules formally verified.
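
The idea of pulling a constant addition outside a quadratic can be sketched as follows. This is an illustrative identity under wrapping arithmetic modulo 2^32, not the actual simplifier rules from this PR; the function names and the Horner-form quadratic are assumptions made for the example. Evaluating a quadratic at `x + k` costs an extra add per call, while folding `k` into the coefficients once leaves only the quadratic itself in the inner loop.

```python
# Sketch only: hypothetical names, not Halide's simplifier or its RNG.
MASK32 = 0xFFFFFFFF


def quadratic(x, a, b, c):
    """(a*x + b)*x + c modulo 2^32, in Horner form: 2 multiplies, 2 adds."""
    return ((a * x + b) * x + c) & MASK32


def shifted_naive(x, k, a, b, c):
    """Evaluate the quadratic at x + k: one extra add on every call."""
    return quadratic((x + k) & MASK32, a, b, c)


def fold_shift(k, a, b, c):
    """Fold the constant offset k into new coefficients, once, up front.

    a*(x+k)^2 + b*(x+k) + c == a*x^2 + (2*a*k + b)*x + (a*k^2 + b*k + c)
    """
    b2 = (2 * a * k + b) & MASK32
    c2 = (a * k * k + b * k + c) & MASK32
    return a, b2, c2


def shifted_folded(x, folded):
    """Same value as shifted_naive, without the per-call add of k."""
    return quadratic(x, *folded)
```

The two forms agree for every input because reduction modulo 2^32 commutes with addition and multiplication, so the expansion is valid for unsigned wrapping arithmetic.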
@abadams abadams requested a review from rootjalex January 3, 2022 18:16
@abadams
Member Author

abadams commented Jan 4, 2022

The new simplifier rules have been formally verified.

@steven-johnson steven-johnson self-requested a review January 4, 2022 16:31
@abadams abadams merged commit 0021165 into master Jan 4, 2022
@steven-johnson
Contributor

FWIW, the new simplification rules now cause errors of the form "Signed integer overflow occurred during constant-folding. Signed integer overflow for int32 and int64 is undefined behavior in Halide" in code that has never failed in this way before. I'd like to revert this change pending further investigation.
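
As a general illustration of how folding constants in signed arithmetic can overflow even when the original expression is well defined, here is a minimal Python sketch. It is not a diagnosis of the failures reported here; the helper names and the example rewrite `(x + c0) * c1 -> x * c1 + c0 * c1` are assumptions chosen for the example.

```python
# Sketch only: hypothetical helpers, not Halide's constant folder.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(v):
    """True if v is representable as a signed 32-bit integer."""
    return INT32_MIN <= v <= INT32_MAX


def fold_mul_int32(c0, c1):
    """Checked fold: return c0 * c1 if it fits in int32, else None."""
    product = c0 * c1
    return product if fits_int32(product) else None


def eval_unfolded(x, c0, c1):
    """Evaluate (x + c0) * c1, raising if any intermediate leaves int32."""
    total = x + c0
    if not fits_int32(total):
        raise OverflowError("x + c0 overflows int32")
    result = total * c1
    if not fits_int32(result):
        raise OverflowError("(x + c0) * c1 overflows int32")
    return result
```

With `c0 = c1 = 65536`, the expression `(x + c0) * c1` evaluates to 0 at `x = -65536` without any overflow, yet distributing the multiply needs the constant `65536 * 65536 = 2^32`, which does not fit in int32. A rule guarded by a checked fold simply does not apply in that case; an unguarded fold would hit exactly this kind of overflow.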
