Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add skx, knl to x86_64 configuration family #183

Closed
fgvanzee opened this issue Mar 27, 2018 · 22 comments
Closed

Add skx, knl to x86_64 configuration family #183

fgvanzee opened this issue Mar 27, 2018 · 22 comments

Comments

@fgvanzee
Copy link
Member

As @loveshack suggested, we should add skx and knl to the x86_64 family. The only problem is, what happens if the compiler doesn't know about -march=knl or -march=skylake-avx512? For example, Ubunutu 16.04 (presumably a popular distribution of Linux) is on gcc 5.4.0, which knows about knl but not skylake-avx512.

@devinamatthews
Copy link
Member

I pointed to my solution in TBLIS here.

@fgvanzee
Copy link
Member Author

@devinamatthews Thanks Devin, I had forgotten about that.

@loveshack
Copy link
Contributor

loveshack commented Mar 27, 2018 via email

@fgvanzee
Copy link
Member Author

You're right, Dave; this is already a problem. Thankfully, Devin has already solved it. I'm working to insert similar logic into BLIS's build system. In short, if a sub-configuration cannot be compiled by the current compiler, it will be stripped from the configuration registry as the file is parsed.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 2, 2018

@loveshack FYI, I'm still working on this. I'm almost done, just ironing out a small kink related to configurations that must be disabled due to lack of compiler support.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 4, 2018

@loveshack This feature has been implemented in 786d15c (though I merged with a previous commit during the push). Just grab the head commit of the dev branch to try it out. Thanks for your patience.

@fgvanzee fgvanzee closed this as completed Apr 4, 2018
@loveshack
Copy link
Contributor

loveshack commented Apr 5, 2018 via email

@loveshack
Copy link
Contributor

loveshack commented Apr 5, 2018 via email

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 5, 2018

@loveshack Thanks Dave. I'll look into it. Just to be clear, is --enable-debug / --enable-debug opt the only option you are passing to configure (aside from the x86_64 configuration target)?

@loveshack
Copy link
Contributor

loveshack commented Apr 5, 2018 via email

@jeffhammond
Copy link
Member

VMOVDQA64 is AVX512F (Xeon and Xeon Phi) with zmm registers but AVX512VL (Xeon only) with xmm/ymm. If Devin didn't write this in assembly, I assume it is a toolchain bug. I seem to recall seeing or hearing about such issues in the past. I'm sure somebody else will sort it out soon enough.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 5, 2018

I can reproduce this bug. Investigating.

EDIT: I think this is actually on our end. Let me contemplate a fix.

@devinamatthews
Copy link
Member

If config/skx/bli_cntx_init_skx.c gets built with the SKX flags, then that would explain it. Any code that may get run before the corresponding configuration is definitively selected should use the generic flags.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 5, 2018

@devinamatthews Yes. I'm thinking through the appropriate Makefile patch.

@jeffhammond
Copy link
Member

Note also that you should probably use AVX2 compiler flags for SKX everywhere. Using 512b SIMD only pays off if the SIMD efficiency is very good and doesn't make sense for code that isn't compute-bound (e.g. BLAS1). The code in BLIS that clearly benefits from 512b SIMD is written in assembly and thus is unaffected by compiler flags.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 5, 2018

@jeffhammond Thanks for this suggestion. So you're saying that we can compile reference kernel code with AVX2 on skx, no questions asked, and the outcome will always be better than with AVX512?

@jeffhammond
Copy link
Member

@fgvanzee Yeah, reference/unoptimized code should be compiled to AVX2, because that code is not going to achieve high enough SIMD utilization to overcome the AVX512 frequency drop. Note that this doesn't mean AVX-512 isn't useful - it is certainly useful in BLAS3*, but I assume you are not using reference code there.

  • This isn't necessarily true in the parts with only one VPU, but those parts are less common and unlikely to be used in HPC systems. OpenBLAS Github issues have some info on this. I can go into technical details if you care, but the short story is that Xeon Scalable 3000-5000 series (except 5122) should be treated like Broadwell, but I think @devinamatthews already implemented that detection.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 6, 2018

@jeffhammond You're making me think that we need to begin tracking a new sub-set of CFLAGS, namely, those for reference kernels. So far, we only track CFLAGS for kernels. I'll open an issue.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 6, 2018

@loveshack Try d39fa1c. It hopefully fixes the issue with KNL executing the illegal AVX512VL instruction, but haven't had time to try it. I'll continue working on this tomorrow if needed.

@fgvanzee fgvanzee reopened this Apr 6, 2018
@loveshack
Copy link
Contributor

loveshack commented Apr 6, 2018 via email

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 6, 2018

@loveshack Thanks Dave. The segfault is simply due to the fact that we currently fail to account for non-32-bit integers in the BLAS test drivers. I'll see if I can come up with a fix.

@fgvanzee
Copy link
Member Author

fgvanzee commented Apr 6, 2018

@loveshack I think I've fixed this now in b549b91. The BLAS test drivers now define its integer type based on how BLIS was configured. I tested this on our KNL system via

./configure --enable-cblas --enable-debug -b64 x86_64

and it seems to work.

@fgvanzee fgvanzee closed this as completed Apr 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants