Spin-weighted spherical harmonics and refactored use of SIMD #43
Merged
The template for Horner's rule is `ft_horner(n, c, incc, m, x, f)`; for Clenshaw's algorithm it is `ft_clenshaw(n, c, incc, m, x, f)`. The coefficients have stride `incc`, but the points and the output array must be contiguous. The point and output pointers may be aliased, so that in-place evaluation can use a single array for both `x` and `f`.

The design is as follows:

1. Find the native SIMD flags of the machine that is compiling the code.
2. Define as many internal functions as possible using the highest level of SIMD available with those flags; otherwise, define fallbacks.
3. Define the exported functions with SIMD dispatch based on GCC's `cpuid`.

This approach allows one to cross-compile and generate optimal code for machines whose SIMD capabilities match those of the cross compiler. Thanks to the fallbacks, the code also remains callable from machines that are newer than the cross compiler (in case the cross compiler is not state-of-the-art), although SIMD extensions newer than the cross compiler's are not used.
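To make the calling convention concrete, here is a minimal scalar sketch of what the two templates compute. It assumes `double` data, `int` sizes, a monomial basis for Horner's rule and a Chebyshev basis for Clenshaw's algorithm; `horner_scalar` and `clenshaw_scalar` are hypothetical stand-ins, not the library's SIMD kernels. Each function reads `x[j]` into a local before writing `f[j]`, which is what makes aliasing `x` and `f` safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for the ft_horner contract (types assumed).
 * Coefficients c[0], c[incc], ..., c[(n-1)*incc] are strided by incc;
 * x and f are contiguous arrays of length m and may be the same array.
 * Computes f[j] = sum_{k=0}^{n-1} c[k*incc] * x[j]^k. */
void horner_scalar(int n, const double *c, int incc, int m, const double *x, double *f)
{
    assert(incc >= 1);
    for (int j = 0; j < m; j++) {
        double xj = x[j];   /* read before writing f[j], so x == f is safe */
        double s = 0.0;
        for (int k = n - 1; k >= 0; k--)
            s = s * xj + c[(size_t)k * incc];
        f[j] = s;
    }
}

/* Hypothetical scalar reference for the ft_clenshaw contract, assuming a
 * Chebyshev series: f[j] = sum_{k=0}^{n-1} c[k*incc] * T_k(x[j]).
 * Clenshaw: b_k = c_k + 2x b_{k+1} - b_{k+2}, then f = c_0 + x b_1 - b_2. */
void clenshaw_scalar(int n, const double *c, int incc, int m, const double *x, double *f)
{
    assert(incc >= 1);
    for (int j = 0; j < m; j++) {
        double xj = x[j];
        double bk1 = 0.0, bk2 = 0.0;  /* b_{k+1} and b_{k+2} */
        for (int k = n - 1; k >= 1; k--) {
            double bk = c[(size_t)k * incc] + 2.0 * xj * bk1 - bk2;
            bk2 = bk1;
            bk1 = bk;
        }
        f[j] = (n > 0) ? c[0] + xj * bk1 - bk2 : 0.0;
    }
}
```

For example, with `c = {1, 2, 3}` and `incc = 1`, `horner_scalar` evaluates 1 + 2x + 3x² at every point; passing the same array as `x` and `f` overwrites the points with the values.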
The reported error is:
/usr/bin/ld: /tmp/ccuNTFaV.o: relocation R_X86_64_PC32 against undefined symbol `memset@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
Apparently, Linux build machines already have it.
…cpuid_count(7, ...)
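CPUID leaf 7 (queried with subleaf 0) is where AVX2 and AVX-512F are reported, whereas SSE2, AVX and FMA come from leaf 1. A minimal sketch of runtime detection into a bit-field struct, in the spirit of `ft_simd`, follows. The struct `simd_flags`, its fields, `detect_simd` and `select_kernel` are hypothetical; the code relies on the `__get_cpuid_max`, `__cpuid` and `__cpuid_count` helpers and `bit_*` masks from GCC's (and Clang's) `<cpuid.h>`, and it omits the OSXSAVE/XGETBV check that production AVX detection should also perform.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical bit-field of SIMD extensions (not the real ft_simd layout). */
typedef struct {
    unsigned sse2    : 1;
    unsigned avx     : 1;
    unsigned fma     : 1;
    unsigned avx2    : 1;
    unsigned avx512f : 1;
} simd_flags;

/* Kernel levels a dispatcher could choose between, widest last. */
typedef enum {
    KERNEL_SCALAR, KERNEL_SSE2, KERNEL_AVX, KERNEL_AVX2_FMA, KERNEL_AVX512F
} kernel_level;

/* Query the running CPU. A complete AVX check would also test OSXSAVE and
 * use XGETBV to confirm the OS saves YMM/ZMM state; omitted here. */
simd_flags detect_simd(void)
{
    simd_flags s = {0, 0, 0, 0, 0};
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    unsigned int max_leaf = __get_cpuid_max(0, NULL);
    if (max_leaf >= 1) {
        __cpuid(1, eax, ebx, ecx, edx);
        s.sse2 = (edx & bit_SSE2) != 0;
        s.avx  = (ecx & bit_AVX) != 0;
        s.fma  = (ecx & bit_FMA) != 0;
    }
    if (max_leaf >= 7) {
        /* Leaf 7 has subleaves, so it is queried as __cpuid_count(7, 0, ...). */
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        s.avx2    = (ebx & bit_AVX2) != 0;
        s.avx512f = (ebx & bit_AVX512F) != 0;
    }
#endif
#if defined(__x86_64__)
    assert(s.sse2); /* SSE2 is part of the x86-64 baseline */
#endif
    return s;
}

/* Rank the detected flags; a real dispatcher would only consider
 * kernels that were actually compiled in. */
kernel_level select_kernel(simd_flags s)
{
    if (s.avx512f) return KERNEL_AVX512F;
    if (s.avx2 && s.fma) return KERNEL_AVX2_FMA;
    if (s.avx) return KERNEL_AVX;
    if (s.sse2) return KERNEL_SSE2;
    return KERNEL_SCALAR;
}
```

In the three-step design, `select_kernel` would only choose among the kernels built in step 2, so a flag reported at runtime but not compiled in simply falls through to a narrower path.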
not just degree 4
The `dft_c2r` transform obviates the need for a global transpose (in `colswap`) outside the FFTW call. Instead, it requires setting and extracting some data from `fftw_complex` arrays.
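For context on that bookkeeping: an `FFTW_R2HC` transform of length `n` stores its output in FFTW's halfcomplex order `r0, r1, ..., r_{n/2}, i_{(n+1)/2-1}, ..., i2, i1`, whereas an r2c transform writes `n/2+1` complex values. The sketch below converts between the two layouts, assuming `fftw_complex` is the default `double[2]`; the helper names are hypothetical, and this does not describe which entries FastTransforms actually sets or extracts.

```c
#include <assert.h>

/* FFTW_R2HC stores a length-n real DFT as
 *   hc = r0, r1, ..., r_{n/2}, i_{(n+1)/2-1}, ..., i2, i1,
 * while an r2c transform writes n/2+1 complex values (r_k, i_k).
 * Complex values are double[2] pairs, the default fftw_complex layout. */

/* Unpack halfcomplex order into n/2+1 complex values. */
void halfcomplex_to_complex(int n, const double *hc, double (*out)[2])
{
    assert(n >= 1);
    for (int k = 0; k <= n / 2; k++) {
        out[k][0] = hc[k];
        /* i_0 is zero, and so is i_{n/2} when n is even (Nyquist term). */
        out[k][1] = (k > 0 && 2 * k < n) ? hc[n - k] : 0.0;
    }
}

/* Pack n/2+1 complex values back into halfcomplex order; the imaginary
 * parts of the k = 0 and (even-n) k = n/2 entries are discarded. */
void complex_to_halfcomplex(int n, double (*in)[2], double *hc)
{
    assert(n >= 1);
    for (int k = 0; k <= n / 2; k++) {
        hc[k] = in[k][0];
        if (k > 0 && 2 * k < n)
            hc[n - k] = in[k][1];
    }
}
```

For `n = 4`, the halfcomplex array `{1, 2, 3, 4}` unpacks to `(1, 0), (2, 4), (3, 0)`: the last slot holds `i1`.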
Raise the error tolerance for triangular transforms on Windows
Code with no warning in gcc-8 triggers a warning in gcc-7, and vice versa
Add complex conjugation to docs
Remove deploy binaries
…ect that the degrees are all N-1
\dh unrecognized => tab complete
New features in this PR:
Improvements in this PR:
- An `ft_simd` struct to store a bit-field of a variety of SIMD extensions.
- `fftw_execute_dft_r2c` and `fftw_execute_dft_c2r` instead of `FFTW_R2HC`- and `FFTW_HC2R`-type real-to-real transforms, to avoid a global transpose of the data.

New Examples in this PR:

- `spinweighted.c` is a basic tutorial on how to use spin-weighted spherical harmonic transforms.

Releases no longer trigger the attachment of binaries, as binaries compiled with `-march=native` may fail to run on a different host computer.
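One way a `-march=native` build breaks on another host: the compiler defines macros such as `__AVX__` for the build machine's CPU, so code guarded by them emits AVX instructions unconditionally, and the binary then faults with an illegal instruction on a CPU without AVX. The sketch below shows this compile-time selection (step 2 of the design) for Horner's rule vectorized over points; `horner_points` and `horner_points_uses_avx` are hypothetical names, and the intrinsics come from `<immintrin.h>`.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Evaluate f[j] = sum_k c[k*incc] * x[j]^k by Horner's rule at m contiguous
 * points. If the compiler targets AVX (e.g. -march=native on an AVX host),
 * __AVX__ is defined and four points are handled per iteration; otherwise
 * only the scalar loop is compiled. x and f may alias. */
void horner_points(int n, const double *c, int incc, int m, const double *x, double *f)
{
    assert(incc >= 1);
    int j = 0;
    if (n <= 0) {
        for (; j < m; j++) f[j] = 0.0;
        return;
    }
#if defined(__AVX__)
    for (; j + 4 <= m; j += 4) {
        __m256d xv = _mm256_loadu_pd(x + j);  /* load before any store to f */
        __m256d s = _mm256_set1_pd(c[(size_t)(n - 1) * incc]);
        for (int k = n - 2; k >= 0; k--)
            s = _mm256_add_pd(_mm256_mul_pd(s, xv), _mm256_set1_pd(c[(size_t)k * incc]));
        _mm256_storeu_pd(f + j, s);
    }
#endif
    /* Scalar tail, or the whole range when AVX is not compiled in. */
    for (; j < m; j++) {
        double xj = x[j];
        double s = c[(size_t)(n - 1) * incc];
        for (int k = n - 2; k >= 0; k--)
            s = s * xj + c[(size_t)k * incc];
        f[j] = s;
    }
}

/* 1 if the AVX path was compiled into this object, 0 otherwise. */
int horner_points_uses_avx(void)
{
#if defined(__AVX__)
    return 1;
#else
    return 0;
#endif
}
```

Runtime dispatch (step 3) is what lets a single binary carry such a path safely: the AVX variant is called only after the running CPU reports AVX, and the scalar fallback is used otherwise.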