Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C intrinsics: Use function attributes on GCC and Clang #86

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

k0001
Copy link
Contributor

@k0001 k0001 commented May 16, 2020

By enabling intrinsics per function, we make it easy for bindings to
compile all off blake3_avx2.c, blake3_avx512.c and blake3_sse41.c
without having to give to the C compiler different flags for each file,
a task that is often very hard to accomplish depending on the language's
packaging tools.

Related documentation:

https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes

https://clang.llvm.org/docs/AttributeReference.html#id300

I am making use of these changes in the Haskell bindings k0001/hs-blake3@d828472

By enabling intrinsics per function, we make it easy for bindings to
compile all off `blake3_avx2.c`, `blake3_avx512.c` and `blake3_sse41.c`
without having to give to the C compiler different flags for each file,
a task that is often very hard to accomplish depending on the language's
packaging tools.

Related documentation:

  https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes

  https://clang.llvm.org/docs/AttributeReference.html#id300
@oconnor663
Copy link
Member

Could you say more about where these attributes are supported? Are they documented anywhere? These C files are designed to work on "reasonably modern" versions of GCC, Clang, and MSVC, and hopefully we can preserve that.

I'm curious about TARGET_AVX512 in particular. What features does that enable? AVX-512 is divided into a bunch of different features, and no CPU supports all of them.

@k0001
Copy link
Contributor Author

k0001 commented May 16, 2020

Could you say more about where these attributes are supported? Are they documented anywhere? These C files are designed to work on "reasonably modern" versions of GCC, Clang, and MSVC, and hopefully we can preserve that.

I linked to the documentation in the pull-request description. It's supported by both GCC and Clang. I'm not sure yet since when they support this, I'll try to find an answer to that. Here is another project using this same technique back in 2015: https://github.com/haskell-crypto/cryptonite/pull/316/files

I couldn't find anything similar for MSVC, which is why in the README file I kept the recommendation to pass in the compiler flag when using MSVC.

I'm curious about TARGET_AVX512 in particular. What features does that enable? AVX-512 is divided into a bunch of different features, and no CPU supports all of them.

Note that all of TARGET_AVX2, TARGET_AVX512 and TARGET_SSE41 are new definitions that I added to blake3_impl.h. These names don't mean anything other than what we want them to mean. I'm defining them as follows:

// On GCC and Clang we can enable intrinsics per function, rather than
// requiring their respective -mavx2, -mavx512vl, etc. compiler flags.
#if defined(__GNUC__) || defined(__clang__)
#  define TARGET_AVX2 __attribute__((target("avx2")))
#  define TARGET_AVX512 __attribute__((target("avx512vl,avx512f")))
#  define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#  define TARGET_AVX2    // On MSVC, use the compiler flag /arch:AVX2
#  define TARGET_AVX512  // On MSVC, use the compiler flag /argc:AVX512
#  define TARGET_SSE41   // On MSVC, this is always enabled.
#endif

The BLAKE_NO_xxx flags still work as expected.

@sneves
Copy link
Collaborator

sneves commented May 16, 2020

As far as I could tell, these are supported in all reasonable versions of gcc and clang.

Perhaps a bit less noisy version would be something like

#include <immintrin.h>

#if defined(__clang__)
#pragma clang attribute push (__attribute__((target("avx2"))), apply_to=function)
#elif defined(__GNUC__)
#pragma GCC target("avx2")
#endif

... code here ...

#if defined(__clang__)
#pragma clang attribute pop
#endif

But this loses support for Clang <= 4.x. More specifically, versions 3.8.x, 3.9.x. and 4.x. Not sure if this is an important case to care about.

@oconnor663
Copy link
Member

oconnor663 commented May 16, 2020

I linked to the documentation in the pull-request description.

I'm sorry, you sure did, my bad.

Note that all of TARGET_AVX2, TARGET_AVX512 and TARGET_SSE41 are new definitions that I added to blake3_impl.h.

I really should've read all this more carefully. I'll do that now.

@oconnor663
Copy link
Member

(Apologies for not being responsive this week. I'm ramping up at a new job. I will probably need to get to this PR over the weekend.)

@k0001
Copy link
Contributor Author

k0001 commented May 19, 2020 via email

@xnox
Copy link

xnox commented Jun 16, 2020

That's a good start, but not quite complete as to how these attributes should be used.

Normally one simply adds these attributes on multiple implementations of the same function name. I.e. it should just be called hash_many() with different target attributes.

This way all of them are compiled, and at runtime the best one is used without the manual dispatch file. Or if march is set to high enough value, the redundant function copies are simply omitted.

I don't think something like that exists for MSVC, so I guess manual dispatch can be kept around for that.

Rust FMV https://docs.rs/multiversion/0.5.1/multiversion/
GCC FMV https://lwn.net/Articles/691932/

Actually I miss-understood..... one can declare multiple functions with the same name, and different targets, but for C++ front-end only. In C front-end, it just sets compiler options.

One can do target_clones with just the one function name, but then one is relying on automatic compiler vectorization, instead of hand-crafted implementations. I.e. see

http://gcc.1065356.n8.nabble.com/Function-Multiversioning-in-GCC6-td1360915.html and https://github.com/clearlinux/make-fmv-patch

Which is a bit sad.

@oconnor663
Copy link
Member

Looking at this again, I'm mildly against it. A "good citizen" caller will still need to supply file-specific compiler flags to build the intrinsics implementations correctly under MSVC. Getting rid of that step on Unix might be convenient for callers who are certain they'll never need Windows support, but it introduces some inconsistency, and it makes it more likely that callers who only test on GCC end up doing the wrong thing.

Similarly, it sounds like one of the intended benefits of this change is that callers could build the intrinsics implementations for all targets (again assuming MSVC is not a target), side-stepping the issue of selecting assembly for 64-bit vs intrinsics for 32-bit. But this is also something we probably don't want to encourage. The assembly implementations perform better (and more consistently), and we want them to be used as widely as possible.

@xnox
Copy link

xnox commented Jun 28, 2020

Looking at this again, I'm mildly against it. A "good citizen" caller will still need to supply file-specific compiler flags to build the intrinsics implementations correctly under MSVC. Getting rid of that step on Unix might be convenient for callers who are certain they'll never need Windows support, but it introduces some inconsistency, and it makes it more likely that callers who only test on GCC end up doing the wrong thing.

Similarly, it sounds like one of the intended benefits of this change is that callers could build the intrinsics implementations for all targets (again assuming MSVC is not a target), side-stepping the issue of selecting assembly for 64-bit vs intrinsics for 32-bit. But this is also something we probably don't want to encourage. The assembly implementations perform better (and more consistently), and we want them to be used as widely as possible.

The above does not change anything about picking assembly vs intrinsics, one still has to do that, as both implementations provide the same symbols. It would allow collapsing to just just have two make targets, one for intrinsics and one for assembly.

Also current makefile is very X86 specific, and attempts to compile X86 code on ARM, Power, Z....

MSVC does not use Makefiles, and the most natural thing to provide there would be to provide a visualstudio project files.

Given that there are a number of options to build these files for, maybe a Makefile is not enough. And a configuration tool should be added? I.e. like a meson config file, which will allow to configure, built and test this C code with various options.

Note that meson can generate visual studio & xcode projects too, thus a meson project files might be nice to provide as it will work on any platform correctly. Including using special configure flag for certain files.

@oconnor663
Copy link
Member

This is admittedly confusing, but the Makefile in here is really only for internal testing, not for public consumption. I should probably put a big comment at the top or something. It's mentioned briefly in the README here. The way I see it, we don't really provide any build system, and the docs are supposed to tell you how to compile the files together (using whatever system you're already using).

@xnox
Copy link

xnox commented Jun 29, 2020

This is admittedly confusing, but the Makefile in here is really only for internal testing, not for public consumption. I should probably put a big comment at the top or something. It's mentioned briefly in the README here. The way I see it, we don't really provide any build system, and the docs are supposed to tell you how to compile the files together (using whatever system you're already using).

Ok. Even adding a suffix to the file might make that obvious. I.e. Makefile.example will cause people to have to type make -f Makefile.example and just a plain make will stop working.

Separately, if intrinsics implementation are not recommended, and one should use the assembly one.... why are intrinsics ones provided at all? and why is @k0001 using them, instead of assembly?

@sneves
Copy link
Collaborator

sneves commented Jun 29, 2020

The intrinsics code will work for 32-bit x86, whereas the assembly is x86_64-specific. It's also conceivable that future architectures will not like the choices made on the assembly and the intrinsics code, with compilers tuned for those, will perform better. So it's somewhat useful to keep that code around.

@k0001
Copy link
Contributor Author

k0001 commented Jul 19, 2020

To remind you of the motivation for this change: As I mentioned in the pull-request description, the intention here is to make it easier for downstream bindings to build the C sources without having to specify different compiler flags for each of them. The packaging tools of some languages (e.g. Haskell's) don't support specifying different compiler flags per file. The changes I propose address this issue for the GCC and Clang compiler. I think the #pragma approach could work too (see #86 (comment)), but I haven't tried it.

why is @k0001 using them, instead of assembly?

The assembly implementations are preferred in the Haskell bindings, too. The C implementations are used only as a fallback if assembly implementations for a particular platform are not available.

@oconnor663
Copy link
Member

Ok. Even adding a suffix to the file might make that obvious. I.e. Makefile.example will cause people to have to type make -f Makefile.example and just a plain make will stop working.

This is a good idea. I've gone ahead and renamed c/Makefile to c/Makefile.testing.


What's happens when you try to build your Haskell library on Windows? Does the build fail? My biggest worry about this approach is still that it pushes callers in a direction that isn't going to work on Windows. I completely agree that setting different flags per file is a drag though.

@k0001
Copy link
Contributor Author

k0001 commented Jul 20, 2020

What's happens when you try to build your Haskell library on Windows? Does the build fail?

@oconnor663 you can see the logic here by yourself: https://github.com/k0001/hs-blake3/blob/3d7e209b2ebc11037058758ee62432e8a60f8a03/blake3/blake3.cabal#L53-L89

I have not tried building this on Windows, but my expectation is that things will build there with GCC or Clang. My understanding is that Cabal (the Haskell packaging tool) doesn't work with MSVC, so that's a non-issue in this particular scenario. In any case, on 64-bit Linux, Darwin and Windows platforms the assembly implementations are always preferred unless SIMD support is explicitly disabled. I haven't dealt with anything related to ARM yet.

@erijo
Copy link
Contributor

erijo commented Oct 21, 2020

In ccache there was a bug reported where compilation failed when using an old clang version on mac due to the compiler not supporting the target attribute ("warning: unknown attribute 'target' ignored") which is used by ccache in one place. At the same time BLAKE3's avx2 source compiled without errors using -mavx2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants