
Fast atan and atan2 functions. #8388

Open: wants to merge 35 commits into base: main
Changes from 1 commit

Commits (35):
69052fa
Fix for the removed DataLayout constructor.
mcourteaux Aug 13, 2024
e82d9ff
Fast vectorizable atan and atan2 functions.
mcourteaux Aug 10, 2024
11b442c
Default to not using fast atan versions if on CUDA.
mcourteaux Aug 10, 2024
dee28bc
Finished fast atan/atan2 functions and tests.
mcourteaux Aug 10, 2024
362f0ea
Correct attribution.
mcourteaux Aug 10, 2024
1bd7f7a
Clang-format
mcourteaux Aug 10, 2024
4f1e851
Weird WebAssembly limits...
mcourteaux Aug 11, 2024
f10396b
Small improvements to the optimization script.
mcourteaux Aug 11, 2024
de9d3b7
Polynomial optimization for log, exp, sin, cos with correct ranges.
mcourteaux Aug 11, 2024
d8e3225
Improve fast atan performance tests for GPU.
mcourteaux Aug 12, 2024
3bcd1a7
Bugfix fast_atan approximation. Fix correctness test to exceed the ra…
mcourteaux Aug 12, 2024
2aa0c7e
Cleanup
mcourteaux Aug 12, 2024
fd088f8
Enum class instead of enum for ApproximationPrecision.
mcourteaux Aug 12, 2024
62534d7
Weird Metal limits. There should be a better way...
mcourteaux Aug 12, 2024
c76e719
Skip test for WebGPU.
mcourteaux Aug 12, 2024
fc25944
Fast atan/atan2 polynomials reoptimized. New optimization strategy: ULP.
mcourteaux Aug 13, 2024
b5d0cad
Feedback Steven.
mcourteaux Aug 13, 2024
4d61c6a
More comments and test mantissa error.
mcourteaux Aug 14, 2024
ff28b99
Do not error when testing arctan performance on Metal / WebGPU.
mcourteaux Aug 14, 2024
5a435f0
Partially apply clang-tidy fixes we don't enforce yet (#8376)
abadams Aug 16, 2024
a4544be
Fix bundling error on buildbots (#8392)
alexreinking Aug 16, 2024
624f737
Fix incorrect std::array sizes in Target.cpp (#8396)
steven-johnson Aug 23, 2024
5ca88b7
Fix _Float16 detection on ARM64 GCC<13 (#8401)
alexreinking Aug 29, 2024
238f73c
Update README.md (#8404)
abadams Sep 2, 2024
b09f611
Support CMAKE_OSX_ARCHITECTURES (#8390)
alexreinking Sep 4, 2024
0614530
Pip packaging at last! (#8405)
alexreinking Sep 4, 2024
ae6dac4
Big documentation update (#8410)
alexreinking Sep 5, 2024
30b5938
Document how to find Halide from a pip installation (#8411)
alexreinking Sep 6, 2024
6f0da12
Merge pull request #8412
alexreinking Sep 6, 2024
44651f9
Fix classifier spelling (#8413)
alexreinking Sep 7, 2024
636ad8f
Make run-clang-tidy.sh work on macOS (#8416)
alexreinking Sep 9, 2024
51824df
Link to PyPI from Doxygen index.html (#8415)
alexreinking Sep 9, 2024
c9b2a76
Include our Markdown documentation in the Doxygen site. (#8417)
alexreinking Sep 10, 2024
a8966e9
Add missing backslash (#8419)
abadams Sep 15, 2024
9bcb9b7
Reschedule the matrix multiply performance app (#8418)
abadams Sep 15, 2024
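Commits fc25944 and 4d61c6a above mention re-optimizing the atan/atan2 polynomials for ULP error and testing mantissa error. As a rough illustration of what a ULP error measures, here is a minimal C++ sketch, assuming "ULP distance" means the number of representable floats separating two values. `ordered_bits`, `ulp_distance` and `ulp_error` are hypothetical helpers written for this example, not functions from this PR or from Halide.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an integer that grows monotonically with the float's value.
// IEEE-754 floats are sign-magnitude, so negative values are mirrored around zero;
// +0.0f and -0.0f both map to 0.
inline int64_t ordered_bits(float f) {
    int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits < 0 ? int64_t(INT32_MIN) - int64_t(bits) : int64_t(bits);
}

// Number of representable floats separating a and b (0 if they compare equal).
inline int64_t ulp_distance(float a, float b) {
    assert(!std::isnan(a) && !std::isnan(b));
    const int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

// ULP error of a float approximation against a higher-precision reference value,
// measured after rounding the reference to float.
inline int64_t ulp_error(float approx, double reference) {
    return ulp_distance(approx, static_cast<float>(reference));
}
```

For example, `ulp_error(std::atan(0.5f), std::atan(0.5))` reports how many floats the single-precision result lies from the double-precision result rounded to float. Unlike an absolute-error bound, this count scales with the magnitude of the output, so it penalizes errors near zero much more heavily.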
Correct attribution.
mcourteaux committed Aug 13, 2024
commit 362f0ea9f8de7970b5ae7e46e0da2c7814f151e0
5 changes: 3 additions & 2 deletions src/IROperator.cpp
@@ -1421,8 +1421,8 @@ Expr fast_cos(const Expr &x_full) {
     return fast_sin_cos(x_full, false);
 }
 
-// A vectorizable atan and atan2 implementation. Based on syrah fast vector math
-// https://github.com/boulos/syrah/blob/master/src/include/syrah/FixedVectorMath.h#L255
+// A vectorizable atan and atan2 implementation.
+// Based on the ideas presented in https://mazzo.li/posts/vectorized-atan2.html.
 Expr fast_atan_approximation(const Expr &x_full, ApproximationPrecision precision, bool between_m1_and_p1) {
     const float pi_over_two = 1.57079632679489661923f;
     Expr x;
@@ -1434,6 +1434,7 @@ Expr fast_atan_approximation(const Expr &x_full, ApproximationPrecisio
         x = select(x_gt_1, 1.0f / x_full, x_full);
     }
 
+    // Coefficients obtained using src/polynomial_optimizer.py
     std::vector<float> c;
     if (precision == MAE_1e_2 || precision == Poly2) {
         // Coefficients with max error: 4.9977e-03
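The hunks in this commit show the range reduction applied before the polynomial: when |x| > 1 the input is replaced by 1/x, so the polynomial only has to be accurate on [-1, 1] (judging by its name, `between_m1_and_p1` lets callers skip that step when the input is already known to lie in that range). The reconstruction afterwards, which presumably uses `pi_over_two`, lies outside the displayed hunks. The scalar C++ sketch below illustrates the same structure, plus an atan2 in the style of the mazzo.li post cited in the new comment. It is not the Halide code: `atan_poly`, `fast_atan` and `fast_atan2` are hypothetical names, and the six odd-power coefficients are illustrative values assumed for this example, not the output of `src/polynomial_optimizer.py`.

```cpp
#include <cassert>
#include <cmath>

// Odd polynomial approximation of atan(x), only valid for x in [-1, 1].
// Illustrative coefficients assumed for this sketch (not the ones from this PR).
inline float atan_poly(float x) {
    const float a1 = 0.99997726f;
    const float a3 = -0.33262347f;
    const float a5 = 0.19354346f;
    const float a7 = -0.11643287f;
    const float a9 = 0.05265332f;
    const float a11 = -0.01172120f;
    const float x2 = x * x;
    // Horner evaluation in x^2, then one multiply by x keeps the polynomial odd.
    return x * (a1 + x2 * (a3 + x2 * (a5 + x2 * (a7 + x2 * (a9 + x2 * a11)))));
}

// atan for any non-NaN x, using atan(x) = sign(x) * pi/2 - atan(1/x) when |x| > 1.
inline float fast_atan(float x_full) {
    const float pi_over_two = 1.57079632679489661923f;
    const bool x_gt_1 = std::fabs(x_full) > 1.0f;
    const float x = x_gt_1 ? 1.0f / x_full : x_full;  // same reduction as the diff's select()
    const float r = atan_poly(x);
    return x_gt_1 ? std::copysign(pi_over_two, x_full) - r : r;
}

// atan2(y, x): divide the smaller magnitude by the larger so the polynomial argument
// stays in [-1, 1], then correct for the octant and the quadrant.
inline float fast_atan2(float y, float x) {
    const float pi = 3.14159265358979323846f;
    const float pi_over_two = 1.57079632679489661923f;
    // This sketch does not handle atan2(0, 0): 0 / 0 would produce NaN.
    assert(!(x == 0.0f && y == 0.0f));
    const bool swap = std::fabs(x) < std::fabs(y);
    const float t = swap ? x / y : y / x;
    float res = atan_poly(t);
    // Angle measured from the y axis: atan(y/x) = sign(t) * pi/2 - atan(x/y).
    // copysign also gets the sign right when t is -0.0f.
    res = swap ? std::copysign(pi_over_two, t) - res : res;
    // Left half-plane: shift by +pi or -pi depending on the sign of y.
    res = x < 0.0f ? res + std::copysign(pi, y) : res;
    return res;
}
```

The ternaries stand in for `select()`: in a vectorized version both arms are computed for every lane and blended, which avoids data-dependent branches. The accuracy of the result is governed almost entirely by the polynomial, since the reduction steps only add a few rounding errors.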