Add skx, knl to x86_64 configuration family #183
Comments
I pointed to my solution in TBLIS here. |
@devinamatthews Thanks Devin, I had forgotten about that. |
You wrote:
As @loveshack suggested, we should add `skx` and `knl` to the `x86_64`
family. The only problem is, what happens if the compiler doesn't know
about `-march=knl` or `-march=skylake-avx512`? For example, Ubuntu
16.04 (presumably a popular distribution of Linux) is on gcc 5.4.0,
which knows about `knl` but not `skylake-avx512`.
Remember you already have that problem, at least because of Zen, or even
avx2 on RHEL6. Good if you can test for the support of course, but it's
also worth suggesting devtoolset-6 on RHEL for complete x86_64 support.
|
You're right, Dave; this is already a problem. Thankfully, Devin has already solved it. I'm working to insert similar logic into BLIS's build system. In short, if a sub-configuration cannot be compiled by the current compiler, it will be stripped from the configuration registry as the file is parsed. |
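The stripping idea described above can be sketched as follows. This is only an illustration, not the logic used in BLIS or TBLIS: the `MARCH_FLAGS` table and the names `compiler_accepts` and `strip_unsupported` are hypothetical, and the probe simply asks the compiler to build a trivial file with the flag in question.

```python
import os
import subprocess
import tempfile

# Hypothetical table: sub-configuration name -> the -march flag it needs.
# The two flags are the ones named in this thread; the table is illustrative.
MARCH_FLAGS = {
    "knl": "-march=knl",
    "skx": "-march=skylake-avx512",
}

def compiler_accepts(flag, cc="gcc"):
    """Return True if `cc` compiles a trivial C file when given `flag`."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [cc, flag, "-c", src, "-o", os.path.join(tmp, "probe.o")],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            return False  # the compiler itself could not be run
        return result.returncode == 0

def strip_unsupported(subconfigs, probe=compiler_accepts):
    """Keep sub-configurations whose flag passes `probe`.

    Names without an entry in MARCH_FLAGS are kept without probing.
    """
    kept = []
    for name in subconfigs:
        flag = MARCH_FLAGS.get(name)
        if flag is None or probe(flag):
            kept.append(name)
    return kept
```

Passing a stand-in probe that rejects only `-march=skylake-avx512` mimics the gcc 5.4.0 case from the original post: `strip_unsupported(["haswell", "knl", "skx"], probe=...)` then keeps `haswell` and `knl` and drops `skx`.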
@loveshack FYI, I'm still working on this. I'm almost done, just ironing out a small kink related to configurations that must be disabled due to lack of compiler support. |
@loveshack This feature has been implemented in 786d15c (though I merged with a previous commit during the push). Just grab the head commit of the `dev` branch to try it out. Thanks for your patience. |
You wrote:
@loveshack This feature has been implemented in 786d15c (though I
merged with a previous commit during the push). Just grab the head
commit of the `dev` branch to try it out. Thanks for your patience.
Thanks. However, when I configure for x86_64 and run make check on KNL
it fails with an illegal instruction error. I'll try to debug it later.
|
Annoyingly, and oddly, the crash I found goes away with --enable-debug
but happens with --enable-debug opt. As I assumed, it's trying to
execute an SKX instruction that KNL doesn't have. The backtrace from
test_libblis.x is:
```
(gdb) bt
#0 bli_cntx_set_method (cntx=0x141c9d0, method=<optimized out>) at ./include/x86_64/blis.h:17876
#1 bli_cntx_init_skx_ref (cntx=cntx@entry=0x141c9d0) at ref_kernels/bli_cntx_ref.c:464
#2 0x00002b86f10ac2e9 in bli_cntx_init_skx (cntx=0x141c9d0) at config/skx/bli_cntx_init_skx.c:42
#3 0x00002b86f17d67e9 in bli_gks_register_cntx (id=id@entry=BLIS_ARCH_SKX, nat_fp=0x2b86f10ac2d0 <bli_cntx_init_skx>, ref_fp=<optimized out>, ind_fp=<optimized out>) at frame/base/bli_gks.c:337
#4 0x00002b86f17d6885 in bli_gks_init () at frame/base/bli_gks.c:72
#5 0x00002b86f17d6ece in bli_init_apis () at frame/base/bli_init.c:79
#6 0x00002b86f2248e20 in pthread_once () from /lib64/libpthread.so.0
#7 0x00002b86f17d6f43 in bli_init_once () at frame/base/bli_init.c:105
#8 0x00002b86f18193d4 in bli_l3_ind_oper_find_avail (oper=BLIS_GEMM, dt=BLIS_DCOMPLEX) at frame/ind/bli_l3_ind.c:134
#9 0x00002b86f18190b5 in bli_ind_oper_find_avail (oper=oper@entry=BLIS_GEMM, dt=dt@entry=BLIS_DCOMPLEX) at frame/ind/bli_ind.c:200
#10 0x00002b86f1819109 in bli_ind_oper_get_avail_impl_string (oper=oper@entry=BLIS_GEMM, dt=dt@entry=BLIS_DCOMPLEX) at frame/ind/bli_ind.c:214
#11 0x00002b86f17d6e29 in bli_info_get_gemm_impl_string (dt=dt@entry=BLIS_DCOMPLEX) at frame/base/bli_info.c:120
#12 0x0000000000410a91 in libblis_test_output_params_struct (os=0x2b86f2a1d400 <_IO_2_1_stdout_>, params=params@entry=0x7fffdf3a5a40) at testsuite/src/test_libblis.c:670
#13 0x000000000041373e in libblis_test_read_params_file (input_filename=input_filename@entry=0x6266e0 <libblis_test_parameters_filename> "./testsuite/input.general.fast", params=params@entry=0x7fffdf3a5a40) at testsuite/src/test_libblis.c:461
#14 0x0000000000404f04 in main (argc=5, argv=<optimized out>) at testsuite/src/test_libblis.c:70
```
and the relevant line from disassembly is
```
=> 0x00002b86f10e0829 <+2665>: vmovdqa64 0x75cc9d(%rip),%xmm0 # 0x2b86f183d4d0
```
This is with Red Hat's GCC 6.3 on RHEL7, in case that matters. Is that
enough of a clue? Otherwise I can poke some more or supply a core dump.
It's probably not worth me trying to debug it directly.
|
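@loveshack Thanks Dave. I'll look into it. Just to be clear, is `--enable-debug` / `--enable-debug opt` the only option you are passing to `configure` (aside from the `x86_64` configuration target)? |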
You wrote:
@loveshack Thanks Dave. I'll look into it. Just to be clear, is
`--enable-debug` / `--enable-debug opt` the only option you are
passing to `configure` (aside from the `x86_64` configuration target)?
I was using --enable-cblas --enable-static in that case, but it fails
with plain ./configure too.
|
VMOVDQA64 is AVX512F (Xeon and Xeon Phi) with zmm registers but AVX512VL (Xeon only) with xmm/ymm. If Devin didn't write this in assembly, I assume it is a toolchain bug. I seem to recall seeing or hearing about such issues in the past. I'm sure somebody else will sort it out soon enough. |
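The distinction in the comment above can be written down as a small lookup. This is a sketch under stated assumptions: the helper names are hypothetical, the flag strings follow the spelling Linux uses in `/proc/cpuinfo` (`avx512f`, `avx512vl`), and `read_cpu_flags` only works on systems that expose that file.

```python
# Extensions VMOVDQA64 needs for each register width, as stated in the
# comment above: AVX512F alone for zmm, AVX512F plus AVX512VL for xmm/ymm.
VMOVDQA64_NEEDS = {
    "zmm": {"avx512f"},
    "ymm": {"avx512f", "avx512vl"},
    "xmm": {"avx512f", "avx512vl"},
}

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU in a Linux cpuinfo file."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

def can_execute(width, cpu_flags):
    """True if a CPU with `cpu_flags` can run VMOVDQA64 at this width."""
    return VMOVDQA64_NEEDS[width] <= set(cpu_flags)

# Illustrative flag sets: KNL has AVX512F but not AVX512VL; SKX has both.
KNL_FLAGS = {"avx2", "avx512f", "avx512cd", "avx512er", "avx512pf"}
SKX_FLAGS = {"avx2", "avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"}
```

With these sets, `can_execute("xmm", KNL_FLAGS)` is false while `can_execute("zmm", KNL_FLAGS)` is true, which matches the illegal instruction seen on KNL for the `%xmm0` form in the disassembly.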
I can reproduce this bug. Investigating. EDIT: I think this is actually on our end. Let me contemplate a fix. |
If |
@devinamatthews Yes. I'm thinking through the appropriate |
Note also that you should probably use AVX2 compiler flags for SKX everywhere. Using 512b SIMD only pays off if the SIMD efficiency is very good and doesn't make sense for code that isn't compute-bound (e.g. BLAS1). The code in BLIS that clearly benefits from 512b SIMD is written in assembly and thus is unaffected by compiler flags. |
@jeffhammond Thanks for this suggestion. So you're saying that we can compile reference kernel code with AVX2 on skx, no questions asked, and the outcome will always be better than with AVX512? |
@fgvanzee Yeah, reference/unoptimized code should be compiled to AVX2, because that code is not going to achieve high enough SIMD utilization to overcome the AVX512 frequency drop. Note that this doesn't mean AVX-512 isn't useful - it is certainly useful in BLAS3*, but I assume you are not using reference code there.
|
@jeffhammond You're making me think that we need to begin tracking a new sub-set of CFLAGS, namely, those for reference kernels. So far, we only track CFLAGS for kernels. I'll open an issue. |
@loveshack Try d39fa1c. It hopefully fixes the issue with KNL executing the illegal AVX512VL instruction, but I haven't had time to try it. I'll continue working on this tomorrow if needed. |
Yes, it's now OK on make check in default mode, but with
--enable-cblas -b 64 --enable-debug x86_64
I get a segv from ./obj/x86_64/blastest/cblat1.x during make check:
```
Core was generated by `./obj/x86_64/blastest/cblat1.x'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000696da4 in bli_cdotv_knl_ref (conjx=BLIS_CONJUGATE,
conjy=BLIS_NO_CONJUGATE, n=2, x=0x800c4f490200, incx=2, y=0x7ffc4f4901d0,
incy=-2, rho=0x7ffc4f490160, cntx=0x2991870)
at ref_kernels/1/bli_dotv_ref.c:123
123 INSERT_GENTFUNC_BASIC2( dotv, BLIS_CNAME_INFIX, BLIS_REF_SUFFIX )
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 libgcc-4.8.5-16.el7_4.2.x86_64 memkind-1.7.0-1.el7.centos.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64
(gdb) bt
#0 0x0000000000696da4 in bli_cdotv_knl_ref (conjx=BLIS_CONJUGATE,
conjy=BLIS_NO_CONJUGATE, n=2, x=0x800c4f490200, incx=2, y=0x7ffc4f4901d0,
incy=-2, rho=0x7ffc4f490160, cntx=0x2991870)
at ref_kernels/1/bli_dotv_ref.c:123
#1 0x0000000000431277 in bli_cdotv (conjx=BLIS_CONJUGATE,
conjy=BLIS_NO_CONJUGATE, n=2, x=0x800c4f490200, incx=2, y=0x7ffc4f4901d0,
incy=-2, rho=0x7ffc4f490160, cntx=0x2991870) at frame/1/bli_l1v_tapi.c:220
#2 0x00000000004154f9 in cdotc_ (n=0x1df5ac4 <combla_+4>, x=0x7ffc4f490210,
incx=0x1df5ac8 <combla_+8>, y=0x7ffc4f4901d0, incy=0x1df5acc <combla_+12>)
at frame/compat/bla_dot.c:88
#3 0x0000000000402528 in check2_ ()
#4 0x0000000000401b41 in main ()
```
With make -k I get segvs from other tests and also errors like
```
Running dblat3.x < './blastest/input/dblat3.in' (output to 'out.dblat3')
libblis: frame/base/check/bli_obj_check.c (line 106):
libblis: Encountered negative dimension.
libblis: Aborting.
```
|
@loveshack Thanks Dave. The segfault is simply due to the fact that we currently fail to account for non-32-bit integers in the BLAS test drivers. I'll see if I can come up with a fix. |
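One way an integer-size mismatch of this kind can produce garbage values is sketched below. It is an illustration only, not a diagnosis of this particular crash: the backtrace shows `n`, `incx` and `incy` being passed by address from adjacent 4-byte slots (`combla_+4`, `+8`, `+12`), and the sketch assumes a caller that stores 32-bit integers there while the callee reads 64 bits from the same address. The slot contents and the helper name `read_as_int64` are hypothetical.

```python
import struct

def read_as_int64(values32, index):
    """Reinterpret the 8 bytes starting at 32-bit slot `index` as one
    little-endian signed 64-bit integer."""
    raw = struct.pack("<%di" % len(values32), *values32)
    return struct.unpack_from("<q", raw, 4 * index)[0]

# Hypothetical block of adjacent 32-bit integers: [unused, n, incx, incy].
block = [0, 2, 2, -2]

n_seen = read_as_int64(block, 1)     # combines n with incx
incx_seen = read_as_int64(block, 2)  # combines incx with incy

print(n_seen)     # 8589934594
print(incx_seen)  # -8589934590
```

A callee built with 64-bit integers would see neither value as 2, and the second comes out negative, the sort of value that could trip a check such as "Encountered negative dimension".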
@loveshack I think I've fixed this now in b549b91. The BLAS test drivers now define its
and it seems to work. |
As @loveshack suggested, we should add `skx` and `knl` to the `x86_64` family. The only problem is, what happens if the compiler doesn't know about `-march=knl` or `-march=skylake-avx512`? For example, Ubuntu 16.04 (presumably a popular distribution of Linux) is on gcc 5.4.0, which knows about `knl` but not `skylake-avx512`.