-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C intrinsics: Use function attributes on GCC and Clang #86
base: master
Are you sure you want to change the base?
Conversation
By enabling intrinsics per function, we make it easy for bindings to compile all off `blake3_avx2.c`, `blake3_avx512.c` and `blake3_sse41.c` without having to give to the C compiler different flags for each file, a task that is often very hard to accomplish depending on the language's packaging tools. Related documentation: https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes https://clang.llvm.org/docs/AttributeReference.html#id300
Could you say more about where these attributes are supported? Are they documented anywhere? These C files are designed to work on "reasonably modern" versions of GCC, Clang, and MSVC, and hopefully we can preserve that. I'm curious about |
I linked to the documentation in the pull-request description. It's supported by both GCC and Clang. I'm not sure yet since when they support this, I'll try to find an answer to that. Here is another project using this same technique back in 2015: https://github.com/haskell-crypto/cryptonite/pull/316/files I couldn't find anything similar for MSVC, which is why in the
Note that all of // On GCC and Clang we can enable intrinsics per function, rather than
// requiring their respective -mavx2, -mavx512vl, etc. compiler flags.
#if defined(__GNUC__) || defined(__clang__)
# define TARGET_AVX2 __attribute__((target("avx2")))
# define TARGET_AVX512 __attribute__((target("avx512vl,avx512f")))
# define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
# define TARGET_AVX2 // On MSVC, use the compiler flag /arch:AVX2
# define TARGET_AVX512 // On MSVC, use the compiler flag /argc:AVX512
# define TARGET_SSE41 // On MSVC, this is always enabled.
#endif The |
As far as I could tell, these are supported in all reasonable versions of gcc and clang. Perhaps a bit less noisy version would be something like
But this loses support for Clang <= 4.x. More specifically, versions 3.8.x, 3.9.x. and 4.x. Not sure if this is an important case to care about. |
I'm sorry, you sure did, my bad.
I really should've read all this more carefully. I'll do that now. |
(Apologies for not being responsive this week. I'm ramping up at a new job. I will probably need to get to this PR over the weekend.) |
No worries. I'm already using this approach downstream in the Haskell
bindings, so this is not blocking me in any way. It's likely to be useful
in future bindings, though :)
Congratulations on the new job!
…On Wed, May 20, 2020, 00:34 oconnor663 ***@***.***> wrote:
(Apologies for not being responsive this week. I'm ramping up at a new
job. I will probably need to get to this PR over the weekend.)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#86 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAAZZRPNGXPPLFC5VYIF6DRSL3NLANCNFSM4NC6J6TQ>
.
|
Actually I miss-understood..... one can declare multiple functions with the same name, and different targets, but for C++ front-end only. In C front-end, it just sets compiler options. One can do target_clones with just the one function name, but then one is relying on automatic compiler vectorization, instead of hand-crafted implementations. I.e. see http://gcc.1065356.n8.nabble.com/Function-Multiversioning-in-GCC6-td1360915.html and https://github.com/clearlinux/make-fmv-patch Which is a bit sad. |
Looking at this again, I'm mildly against it. A "good citizen" caller will still need to supply file-specific compiler flags to build the intrinsics implementations correctly under MSVC. Getting rid of that step on Unix might be convenient for callers who are certain they'll never need Windows support, but it introduces some inconsistency, and it makes it more likely that callers who only test on GCC end up doing the wrong thing. Similarly, it sounds like one of the intended benefits of this change is that callers could build the intrinsics implementations for all targets (again assuming MSVC is not a target), side-stepping the issue of selecting assembly for 64-bit vs intrinsics for 32-bit. But this is also something we probably don't want to encourage. The assembly implementations perform better (and more consistently), and we want them to be used as widely as possible. |
The above does not change anything about picking assembly vs intrinsics, one still has to do that, as both implementations provide the same symbols. It would allow collapsing to just just have two make targets, one for intrinsics and one for assembly. Also current makefile is very X86 specific, and attempts to compile X86 code on ARM, Power, Z.... MSVC does not use Makefiles, and the most natural thing to provide there would be to provide a visualstudio project files. Given that there are a number of options to build these files for, maybe a Makefile is not enough. And a configuration tool should be added? I.e. like a meson config file, which will allow to configure, built and test this C code with various options. Note that meson can generate visual studio & xcode projects too, thus a meson project files might be nice to provide as it will work on any platform correctly. Including using special configure flag for certain files. |
This is admittedly confusing, but the Makefile in here is really only for internal testing, not for public consumption. I should probably put a big comment at the top or something. It's mentioned briefly in the README here. The way I see it, we don't really provide any build system, and the docs are supposed to tell you how to compile the files together (using whatever system you're already using). |
Ok. Even adding a suffix to the file might make that obvious. I.e. Separately, if intrinsics implementation are not recommended, and one should use the assembly one.... why are intrinsics ones provided at all? and why is @k0001 using them, instead of assembly? |
The intrinsics code will work for 32-bit x86, whereas the assembly is x86_64-specific. It's also conceivable that future architectures will not like the choices made on the assembly and the intrinsics code, with compilers tuned for those, will perform better. So it's somewhat useful to keep that code around. |
To remind you of the motivation for this change: As I mentioned in the pull-request description, the intention here is to make it easier for downstream bindings to build the C sources without having to specify different compiler flags for each of them. The packaging tools of some languages (e.g. Haskell's) don't support specifying different compiler flags per file. The changes I propose address this issue for the GCC and Clang compiler. I think the
The assembly implementations are preferred in the Haskell bindings, too. The C implementations are used only as a fallback if assembly implementations for a particular platform are not available. |
This is a good idea. I've gone ahead and renamed What's happens when you try to build your Haskell library on Windows? Does the build fail? My biggest worry about this approach is still that it pushes callers in a direction that isn't going to work on Windows. I completely agree that setting different flags per file is a drag though. |
@oconnor663 you can see the logic here by yourself: https://github.com/k0001/hs-blake3/blob/3d7e209b2ebc11037058758ee62432e8a60f8a03/blake3/blake3.cabal#L53-L89 I have not tried building this on Windows, but my expectation is that things will build there with GCC or Clang. My understanding is that Cabal (the Haskell packaging tool) doesn't work with MSVC, so that's a non-issue in this particular scenario. In any case, on 64-bit Linux, Darwin and Windows platforms the assembly implementations are always preferred unless SIMD support is explicitly disabled. I haven't dealt with anything related to ARM yet. |
In ccache there was a bug reported where compilation failed when using an old clang version on mac due to the compiler not supporting the target attribute ("warning: unknown attribute 'target' ignored") which is used by ccache in one place. At the same time BLAKE3's avx2 source compiled without errors using |
By enabling intrinsics per function, we make it easy for bindings to
compile all off
blake3_avx2.c
,blake3_avx512.c
andblake3_sse41.c
without having to give to the C compiler different flags for each file,
a task that is often very hard to accomplish depending on the language's
packaging tools.
Related documentation:
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes
https://clang.llvm.org/docs/AttributeReference.html#id300
I am making use of these changes in the Haskell bindings k0001/hs-blake3@d828472