C intrinsics: Use function attributes on GCC and Clang #86

k0001 · 2020-05-16T14:47:55Z

By enabling intrinsics per function, we make it easy for bindings to
compile all off blake3_avx2.c, blake3_avx512.c and blake3_sse41.c
without having to give to the C compiler different flags for each file,
a task that is often very hard to accomplish depending on the language's
packaging tools.

Related documentation:

https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes

https://clang.llvm.org/docs/AttributeReference.html#id300

I am making use of these changes in the Haskell bindings k0001/hs-blake3@d828472

By enabling intrinsics per function, we make it easy for bindings to compile all off `blake3_avx2.c`, `blake3_avx512.c` and `blake3_sse41.c` without having to give to the C compiler different flags for each file, a task that is often very hard to accomplish depending on the language's packaging tools. Related documentation: https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes https://clang.llvm.org/docs/AttributeReference.html#id300

oconnor663 · 2020-05-16T18:55:56Z

Could you say more about where these attributes are supported? Are they documented anywhere? These C files are designed to work on "reasonably modern" versions of GCC, Clang, and MSVC, and hopefully we can preserve that.

I'm curious about TARGET_AVX512 in particular. What features does that enable? AVX-512 is divided into a bunch of different features, and no CPU supports all of them.

k0001 · 2020-05-16T19:26:51Z

Could you say more about where these attributes are supported? Are they documented anywhere? These C files are designed to work on "reasonably modern" versions of GCC, Clang, and MSVC, and hopefully we can preserve that.

I linked to the documentation in the pull-request description. It's supported by both GCC and Clang. I'm not sure yet since when they support this, I'll try to find an answer to that. Here is another project using this same technique back in 2015: https://github.com/haskell-crypto/cryptonite/pull/316/files

I couldn't find anything similar for MSVC, which is why in the README file I kept the recommendation to pass in the compiler flag when using MSVC.

I'm curious about TARGET_AVX512 in particular. What features does that enable? AVX-512 is divided into a bunch of different features, and no CPU supports all of them.

Note that all of TARGET_AVX2, TARGET_AVX512 and TARGET_SSE41 are new definitions that I added to blake3_impl.h. These names don't mean anything other than what we want them to mean. I'm defining them as follows:

// On GCC and Clang we can enable intrinsics per function, rather than
// requiring their respective -mavx2, -mavx512vl, etc. compiler flags.
#if defined(__GNUC__) || defined(__clang__)
#  define TARGET_AVX2 __attribute__((target("avx2")))
#  define TARGET_AVX512 __attribute__((target("avx512vl,avx512f")))
#  define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#  define TARGET_AVX2    // On MSVC, use the compiler flag /arch:AVX2
#  define TARGET_AVX512  // On MSVC, use the compiler flag /argc:AVX512
#  define TARGET_SSE41   // On MSVC, this is always enabled.
#endif

The BLAKE_NO_xxx flags still work as expected.

sneves · 2020-05-16T19:27:43Z

As far as I could tell, these are supported in all reasonable versions of gcc and clang.

Perhaps a bit less noisy version would be something like

#include <immintrin.h>

#if defined(__clang__)
#pragma clang attribute push (__attribute__((target("avx2"))), apply_to=function)
#elif defined(__GNUC__)
#pragma GCC target("avx2")
#endif

... code here ...

#if defined(__clang__)
#pragma clang attribute pop
#endif

But this loses support for Clang <= 4.x. More specifically, versions 3.8.x, 3.9.x. and 4.x. Not sure if this is an important case to care about.

oconnor663 · 2020-05-16T20:34:31Z

I linked to the documentation in the pull-request description.

I'm sorry, you sure did, my bad.

Note that all of TARGET_AVX2, TARGET_AVX512 and TARGET_SSE41 are new definitions that I added to blake3_impl.h.

I really should've read all this more carefully. I'll do that now.

oconnor663 · 2020-05-19T21:33:59Z

(Apologies for not being responsive this week. I'm ramping up at a new job. I will probably need to get to this PR over the weekend.)

k0001 · 2020-05-19T21:44:32Z

No worries. I'm already using this approach downstream in the Haskell bindings, so this is not blocking me in any way. It's likely to be useful in future bindings, though :) Congratulations on the new job!

…

On Wed, May 20, 2020, 00:34 oconnor663 ***@***.***> wrote: (Apologies for not being responsive this week. I'm ramping up at a new job. I will probably need to get to this PR over the weekend.) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#86 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAAZZRPNGXPPLFC5VYIF6DRSL3NLANCNFSM4NC6J6TQ> .

xnox · 2020-06-16T01:33:20Z

That's a good start, but not quite complete as to how these attributes should be used.

Normally one simply adds these attributes on multiple implementations of the same function name. I.e. it should just be called hash_many() with different target attributes.

This way all of them are compiled, and at runtime the best one is used without the manual dispatch file. Or if march is set to high enough value, the redundant function copies are simply omitted.

I don't think something like that exists for MSVC, so I guess manual dispatch can be kept around for that.

Rust FMV https://docs.rs/multiversion/0.5.1/multiversion/
GCC FMV https://lwn.net/Articles/691932/

Actually I miss-understood..... one can declare multiple functions with the same name, and different targets, but for C++ front-end only. In C front-end, it just sets compiler options.

One can do target_clones with just the one function name, but then one is relying on automatic compiler vectorization, instead of hand-crafted implementations. I.e. see

http://gcc.1065356.n8.nabble.com/Function-Multiversioning-in-GCC6-td1360915.html and https://github.com/clearlinux/make-fmv-patch

Which is a bit sad.

oconnor663 · 2020-06-28T21:34:12Z

Looking at this again, I'm mildly against it. A "good citizen" caller will still need to supply file-specific compiler flags to build the intrinsics implementations correctly under MSVC. Getting rid of that step on Unix might be convenient for callers who are certain they'll never need Windows support, but it introduces some inconsistency, and it makes it more likely that callers who only test on GCC end up doing the wrong thing.

Similarly, it sounds like one of the intended benefits of this change is that callers could build the intrinsics implementations for all targets (again assuming MSVC is not a target), side-stepping the issue of selecting assembly for 64-bit vs intrinsics for 32-bit. But this is also something we probably don't want to encourage. The assembly implementations perform better (and more consistently), and we want them to be used as widely as possible.

xnox · 2020-06-28T23:43:00Z

Looking at this again, I'm mildly against it. A "good citizen" caller will still need to supply file-specific compiler flags to build the intrinsics implementations correctly under MSVC. Getting rid of that step on Unix might be convenient for callers who are certain they'll never need Windows support, but it introduces some inconsistency, and it makes it more likely that callers who only test on GCC end up doing the wrong thing.

Similarly, it sounds like one of the intended benefits of this change is that callers could build the intrinsics implementations for all targets (again assuming MSVC is not a target), side-stepping the issue of selecting assembly for 64-bit vs intrinsics for 32-bit. But this is also something we probably don't want to encourage. The assembly implementations perform better (and more consistently), and we want them to be used as widely as possible.

The above does not change anything about picking assembly vs intrinsics, one still has to do that, as both implementations provide the same symbols. It would allow collapsing to just just have two make targets, one for intrinsics and one for assembly.

Also current makefile is very X86 specific, and attempts to compile X86 code on ARM, Power, Z....

MSVC does not use Makefiles, and the most natural thing to provide there would be to provide a visualstudio project files.

Given that there are a number of options to build these files for, maybe a Makefile is not enough. And a configuration tool should be added? I.e. like a meson config file, which will allow to configure, built and test this C code with various options.

Note that meson can generate visual studio & xcode projects too, thus a meson project files might be nice to provide as it will work on any platform correctly. Including using special configure flag for certain files.

oconnor663 · 2020-06-29T00:03:50Z

This is admittedly confusing, but the Makefile in here is really only for internal testing, not for public consumption. I should probably put a big comment at the top or something. It's mentioned briefly in the README here. The way I see it, we don't really provide any build system, and the docs are supposed to tell you how to compile the files together (using whatever system you're already using).

xnox · 2020-06-29T08:58:31Z

This is admittedly confusing, but the Makefile in here is really only for internal testing, not for public consumption. I should probably put a big comment at the top or something. It's mentioned briefly in the README here. The way I see it, we don't really provide any build system, and the docs are supposed to tell you how to compile the files together (using whatever system you're already using).

Ok. Even adding a suffix to the file might make that obvious. I.e. Makefile.example will cause people to have to type make -f Makefile.example and just a plain make will stop working.

Separately, if intrinsics implementation are not recommended, and one should use the assembly one.... why are intrinsics ones provided at all? and why is @k0001 using them, instead of assembly?

sneves · 2020-06-29T15:06:08Z

The intrinsics code will work for 32-bit x86, whereas the assembly is x86_64-specific. It's also conceivable that future architectures will not like the choices made on the assembly and the intrinsics code, with compilers tuned for those, will perform better. So it's somewhat useful to keep that code around.

k0001 · 2020-07-19T12:44:01Z

To remind you of the motivation for this change: As I mentioned in the pull-request description, the intention here is to make it easier for downstream bindings to build the C sources without having to specify different compiler flags for each of them. The packaging tools of some languages (e.g. Haskell's) don't support specifying different compiler flags per file. The changes I propose address this issue for the GCC and Clang compiler. I think the #pragma approach could work too (see #86 (comment)), but I haven't tried it.

why is @k0001 using them, instead of assembly?

The assembly implementations are preferred in the Haskell bindings, too. The C implementations are used only as a fallback if assembly implementations for a particular platform are not available.

oconnor663 · 2020-07-20T13:56:57Z

Ok. Even adding a suffix to the file might make that obvious. I.e. Makefile.example will cause people to have to type make -f Makefile.example and just a plain make will stop working.

This is a good idea. I've gone ahead and renamed c/Makefile to c/Makefile.testing.

What's happens when you try to build your Haskell library on Windows? Does the build fail? My biggest worry about this approach is still that it pushes callers in a direction that isn't going to work on Windows. I completely agree that setting different flags per file is a drag though.

k0001 · 2020-07-20T14:36:24Z

What's happens when you try to build your Haskell library on Windows? Does the build fail?

@oconnor663 you can see the logic here by yourself: https://github.com/k0001/hs-blake3/blob/3d7e209b2ebc11037058758ee62432e8a60f8a03/blake3/blake3.cabal#L53-L89

I have not tried building this on Windows, but my expectation is that things will build there with GCC or Clang. My understanding is that Cabal (the Haskell packaging tool) doesn't work with MSVC, so that's a non-issue in this particular scenario. In any case, on 64-bit Linux, Darwin and Windows platforms the assembly implementations are always preferred unless SIMD support is explicitly disabled. I haven't dealt with anything related to ARM yet.

erijo · 2020-10-21T07:02:11Z

In ccache there was a bug reported where compilation failed when using an old clang version on mac due to the compiler not supporting the target attribute ("warning: unknown attribute 'target' ignored") which is used by ccache in one place. At the same time BLAKE3's avx2 source compiled without errors using -mavx2.

k0001 mentioned this pull request May 16, 2020

Enable optimized versions k0001/hs-blake3#1

Closed

sneves mentioned this pull request Oct 20, 2020

fix disabled-optimization -Wall -Werror #128

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C intrinsics: Use function attributes on GCC and Clang #86

C intrinsics: Use function attributes on GCC and Clang #86

k0001 commented May 16, 2020

oconnor663 commented May 16, 2020

k0001 commented May 16, 2020

sneves commented May 16, 2020

oconnor663 commented May 16, 2020 •

edited

Loading

oconnor663 commented May 19, 2020

k0001 commented May 19, 2020 via email

xnox commented Jun 16, 2020 •

edited

Loading

oconnor663 commented Jun 28, 2020

xnox commented Jun 28, 2020

oconnor663 commented Jun 29, 2020

xnox commented Jun 29, 2020

sneves commented Jun 29, 2020

k0001 commented Jul 19, 2020 •

edited

Loading

oconnor663 commented Jul 20, 2020

k0001 commented Jul 20, 2020

erijo commented Oct 21, 2020

C intrinsics: Use function attributes on GCC and Clang #86

Are you sure you want to change the base?

C intrinsics: Use function attributes on GCC and Clang #86

Conversation

k0001 commented May 16, 2020

oconnor663 commented May 16, 2020

k0001 commented May 16, 2020

sneves commented May 16, 2020

oconnor663 commented May 16, 2020 • edited Loading

oconnor663 commented May 19, 2020

k0001 commented May 19, 2020 via email

xnox commented Jun 16, 2020 • edited Loading

oconnor663 commented Jun 28, 2020

xnox commented Jun 28, 2020

oconnor663 commented Jun 29, 2020

xnox commented Jun 29, 2020

sneves commented Jun 29, 2020

k0001 commented Jul 19, 2020 • edited Loading

oconnor663 commented Jul 20, 2020

k0001 commented Jul 20, 2020

erijo commented Oct 21, 2020

oconnor663 commented May 16, 2020 •

edited

Loading

xnox commented Jun 16, 2020 •

edited

Loading

k0001 commented Jul 19, 2020 •

edited

Loading