test segment fault on Intel Knight Landing #116

Closed
heroxbd opened this issue Feb 20, 2017 · 12 comments

@heroxbd

heroxbd commented Feb 20, 2017

I configured BLIS with

./configure --enable-shared \
                --enable-verbose-make \
                --int-size=32 \
                --blas-int-size=32 \
                --enable-threading=openmp \
                --enable-blas --enable-cblas auto

and ran the tests with make test.

On Intel Broadwell it works, but on Intel Knights Landing it fails with a segmentation fault:

gcc ./obj/knl/testsuite/test_axpbyv.o ./obj/knl/testsuite/test_ger.o ./obj/knl/testsuite/test_trsm.o ./obj/knl/testsuite/test_syr.o ./obj/knl/testsuite/test_axpyf.o ./obj/knl/testsuite/test_scal2m.o ./obj/knl/testsuite/test_her2.o ./obj/knl/testsuite/test_trmm.o ./obj/knl/testsuite/test_setm.o ./obj/knl/testsuite/test_copym.o ./obj/knl/testsuite/test_symv.o ./obj/knl/testsuite/test_axpym.o ./obj/knl/testsuite/test_her.o ./obj/knl/testsuite/test_subm.o ./obj/knl/testsuite/test_subv.o ./obj/knl/testsuite/test_trmm3.o ./obj/knl/testsuite/test_scalm.o ./obj/knl/testsuite/test_copyv.o ./obj/knl/testsuite/test_syr2k.o ./obj/knl/testsuite/test_normfm.o ./obj/knl/testsuite/test_axpy2v.o ./obj/knl/testsuite/test_axpyv.o ./obj/knl/testsuite/test_gemmtrsm_ukr.o ./obj/knl/testsuite/test_syrk.o ./obj/knl/testsuite/test_amaxv.o ./obj/knl/testsuite/test_randv.o ./obj/knl/testsuite/test_her2k.o ./obj/knl/testsuite/test_trsv.o ./obj/knl/testsuite/test_dotv.o ./obj/knl/testsuite/test_hemm.o ./obj/knl/testsuite/test_dotxaxpyf.o ./obj/knl/testsuite/test_scal2v.o ./obj/knl/testsuite/test_symm.o ./obj/knl/testsuite/test_trmv.o ./obj/knl/testsuite/test_normfv.o ./obj/knl/testsuite/test_setv.o ./obj/knl/testsuite/test_addm.o ./obj/knl/testsuite/test_xpbyv.o ./obj/knl/testsuite/test_gemv.o ./obj/knl/testsuite/test_libblis.o ./obj/knl/testsuite/test_dotaxpyv.o ./obj/knl/testsuite/test_randm.o ./obj/knl/testsuite/test_gemm_ukr.o ./obj/knl/testsuite/test_addv.o ./obj/knl/testsuite/test_herk.o ./obj/knl/testsuite/test_gemm.o ./obj/knl/testsuite/test_dotxv.o ./obj/knl/testsuite/test_dotxf.o ./obj/knl/testsuite/test_trsm_ukr.o ./obj/knl/testsuite/test_syr2.o ./obj/knl/testsuite/test_hemv.o ./obj/knl/testsuite/test_scalv.o ./lib/knl/libblis.a -lm -lmemkind -fopenmp -lrt -o test_libblis.x
./test_libblis.x -g ./testsuite/input.general
-o ./testsuite/input.operations
> output.testsuite
/work/0/gh60/share/knl/bin/sh: line 2: 31619 Segmentation fault ./test_libblis.x -g ./testsuite/input.general -o ./testsuite/input.operations > output.testsuite

The log file is attached. My system is Gentoo with gcc-5.4.

build.log.gz

@devinamatthews
Member

This is fixed by #117, which I will merge as soon as the tests pass. Note that you can also use icc for KNL (and other Intel archs) with ./configure CC=icc ....

@heroxbd
Author

heroxbd commented Feb 20, 2017

Hi Devin, thank you for your work. Unfortunately, your commit 7d42fc0 did not fix this bug.
The segmentation fault persists.

CI failed for KNL, too.

https://travis-ci.org/flame/blis/builds/203300388

@devinamatthews
Member

The KNL CI failure is a red herring; it always fails for other technical reasons at the moment (not the least of which is that Travis doesn't have KNL nodes). I was able to compile and run the testsuite successfully using your configure line with gcc 6.2 and CentOS 7.2 on a Xeon Phi 7210. Could you please send a gdb backtrace of the failure as well as a disassembly of the region where it fails?
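
For anyone reproducing this, a minimal sketch of collecting a backtrace and a disassembly non-interactively follows. The helper name collect_backtrace and the GDB override variable are made up for illustration; the gdb options used (-batch, -ex, --args) are standard ones. It assumes gdb is installed and that the crash reproduces under the debugger.

```shell
#!/bin/sh
# Run a program under gdb in batch mode. When it stops on a signal,
# print a backtrace and 16 instructions starting at the faulting address.
# GDB is a hypothetical override (e.g. GDB=echo shows the command line only).
collect_backtrace() {
    "${GDB:-gdb}" -batch \
        -ex run \
        -ex bt \
        -ex 'x/16i $pc' \
        --args "$@"
}

# Usage, with the test driver invocation from the log in this thread:
# collect_backtrace ./test_libblis.x -g ./testsuite/input.general \
#     -o ./testsuite/input.operations
```

Building with debug symbols makes the backtrace more informative, but the commands also work without them.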

@heroxbd
Author

heroxbd commented Feb 20, 2017

Sure. I am installing gcc-6.3 to give it another shot.

blis_strsm_ruhu_cc                 300   300    3.290   2.40e-08   PASS
blis_strsm_ruhu_cc                 400   400    4.248   3.13e-08   PASS

% blis_<dt><op>_<params>_<stor>      m     n   gflops   resid      result

Program received signal SIGSEGV, Segmentation fault.
0x000000000048736f in TAIL_LOOP ()
(gdb) bt
#0  0x000000000048736f in TAIL_LOOP ()
#1  0x00000000004f4299 in bli_dgemmtrsm_l_ukr_ref ()
#2  0x000000000050f80d in bli_dtrsm_ll_ker_var2 ()
#3  0x0000000000511c03 in bli_trsm_ll_ker_var2 ()
#4  0x00000000004f15f6 in bli_trsm_int ()
#5  0x00000000004f179a in bli_trsm_packa ()
#6  0x00000000004f15f6 in bli_trsm_int ()
#7  0x000000000050df2c in bli_trsm_blk_var1 ()
#8  0x00000000004f15f6 in bli_trsm_int ()
#9  0x00000000004f184d in bli_trsm_packb ()
#10 0x00000000004f15f6 in bli_trsm_int ()
#11 0x000000000050e248 in bli_trsm_blk_var3 ()
#12 0x00000000004f15f6 in bli_trsm_int ()
#13 0x000000000050e0bc in bli_trsm_blk_var2 ()
#14 0x00000000004f15f6 in bli_trsm_int ()
#15 0x00000000004779d6 in bli_l3_thread_decorator._omp_fn.0 ()
#16 0x00007ffff746ddc3 in GOMP_parallel () from /work/0/gh60/share/knl/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libgomp.so.1
#17 0x0000000000477bcb in bli_l3_thread_decorator ()
#18 0x00000000004f0fd6 in bli_trsm_front ()
#19 0x000000000046898a in bli_trsmnat ()
#20 0x00000000004028c9 in libblis_test_trsm_experiment ()
#21 0x00000000004150d4 in libblis_test_op_driver ()
#22 0x000000000040234d in libblis_test_trsm ()
#23 0x00000000004014aa in main ()

(gdb) disassemble 0x000000000048734f, 0x000000000048738f
Dump of assembler code from 0x48734f to 0x48738f:
   0x000000000048734f <TAIL_LOOP+43>:   pop    %rax
   0x0000000000487350 <TAIL_LOOP+44>:   mov    $0x72620358,%eax
   0x0000000000487355 <TAIL_LOOP+49>:   std    
   0x0000000000487356 <TAIL_LOOP+50>:   pop    %rax
   0x0000000000487357 <TAIL_LOOP+51>:   mov    $0x72620460,%eax
   0x000000000048735c <TAIL_LOOP+56>:   std    
   0x000000000048735d <TAIL_LOOP+57>:   pop    %rax
   0x000000000048735e <TAIL_LOOP+58>:   mov    $0x72620568,%eax
   0x0000000000487363 <TAIL_LOOP+63>:   std    
   0x0000000000487364 <TAIL_LOOP+64>:   pop    %rax
   0x0000000000487365 <TAIL_LOOP+65>:   mov    $0x72620670,%eax
   0x000000000048736a <TAIL_LOOP+70>:   std    
   0x000000000048736b <TAIL_LOOP+71>:   pop    %rax
   0x000000000048736c <TAIL_LOOP+72>:   mov    $0xe2620778,%eax
   0x0000000000487371 <TAIL_LOOP+77>:   std    
   0x0000000000487372 <TAIL_LOOP+78>:   pop    %rax
   0x0000000000487373 <TAIL_LOOP+79>:   mov    $0x180f0840,%eax
   0x0000000000487378 <TAIL_LOOP+84>:   mov    %al,%al
   0x000000000048737a <TAIL_LOOP+86>:   or     $0xe2620000,%eax
   0x000000000048737f <TAIL_LOOP+91>:   std    
   0x0000000000487380 <TAIL_LOOP+92>:   pop    %rax
   0x0000000000487381 <TAIL_LOOP+93>:   mov    $0xe2620948,%eax
   0x0000000000487386 <TAIL_LOOP+98>:   std    
   0x0000000000487387 <TAIL_LOOP+99>:   pop    %rax
   0x0000000000487388 <TAIL_LOOP+100>:  mov    $0xe2620a50,%eax
   0x000000000048738d <TAIL_LOOP+105>:  std    
   0x000000000048738e <TAIL_LOOP+106>:  pop    %rax
End of assembler dump.

@devinamatthews
Member

Oops, I had the trsm tests turned off, my bad. However, I did reproduce a different segfault which #117 fixed. This new one is fixed by #118, and I remembered to run the full testsuite this time. I'll keep the issue open until you confirm.

@jeffhammond
Member

@devinamatthews I am not suggesting that it is a practical solution, but you can make Travis CI execute code for a wide range of Intel processors using SDE.

The public version of SDE supports most if not all of the publicly documented Intel instructions (Intel ISA extensions):

     -quark              Set chip-check and CPUID for Intel(R) Quark
     -p4                 Set chip-check and CPUID for Pentium4
     -p4p                Set chip-check and CPUID for Pentium4 Prescott
     -mrm                Set chip-check and CPUID for Merom
     -pnr                Set chip-check and CPUID for Penryn
     -nhm                Set chip-check and CPUID for Nehalem
     -wsm                Set chip-check and CPUID for Westmere
     -snb                Set chip-check and CPUID for Sandy Bridge
     -ivb                Set chip-check and CPUID for Ivy Bridge
     -hsw                Set chip-check and CPUID for Haswell
     -bdw                Set chip-check and CPUID for Broadwell
     -skx                Set chip-check and CPUID for Skylake Server
     -skl                Set chip-check and CPUID for Skylake Client
     -cnl                Set chip-check and CPUID for Cannonlake
     -knl                Set chip-check and CPUID for Knights Landing
     -slt                Set chip-check and CPUID for Saltwell
     -slm                Set chip-check and CPUID for Silvermont
     -glm                Set chip-check and CPUID for Goldmont

I will enqueue a low priority task to see if I can install and run SDE from within Travis.
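
As a rough sketch of what such a run might look like, the wrapper below launches a command under SDE with the -knl option from the list above. The function name run_under_sde_knl and the SDE override variable are made up for illustration, and sde64 as the name of the installed binary is an assumption about the local setup.

```shell
#!/bin/sh
# Run a command under Intel SDE while emulating Knights Landing.
# SDE is a hypothetical override for the emulator binary; the default
# name sde64 is an assumption about how SDE is installed.
run_under_sde_knl() {
    "${SDE:-sde64}" -knl -- "$@"
}

# Usage, with the test driver from this thread:
# run_under_sde_knl ./test_libblis.x -g ./testsuite/input.general \
#     -o ./testsuite/input.operations
```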

@devinamatthews
Member

Yes, maybe this would be possible, but I imagine we would run into problems with the build time limit.

@jeffhammond
Member

Yeah, my understanding is that SDE emulation overhead is proportional to the percentage of instructions that need to be emulated and thus something like BLIS is going to run quite a bit slower when emulating AVX-512.

I don't really know how Travis works on the back-end, but perhaps some day we will be able to request AWS C5 instances to get AVX-512 support, which would be the best available approximation to KNL.
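
To see how close a given machine comes to KNL, one can check which AVX-512 subsets the kernel reports. The sketch below assumes Linux, where /proc/cpuinfo lists feature flags; the function name has_cpu_flag is made up for illustration. Knights Landing implements the F, CD, ER and PF subsets, and a machine missing any of them cannot run code that uses the missing instructions.

```shell
#!/bin/sh
# Succeed if a CPU feature flag appears in a cpuinfo-style file.
# $1: flag name (e.g. avx512f)
# $2: file to read (optional, defaults to /proc/cpuinfo)
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null \
        | tr -s '[:space:]' '\n' \
        | grep -qx "$1"
}

# Report the AVX-512 subsets that Knights Landing implements.
for flag in avx512f avx512cd avx512er avx512pf; do
    if has_cpu_flag "$flag"; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done
```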

@heroxbd
Author

heroxbd commented Feb 21, 2017

Thanks, Devin! All the tests pass now, and R builds against BLIS successfully.

@heroxbd heroxbd closed this as completed Feb 21, 2017
@fgvanzee
Member

Thanks for working on this, Devin.
@heroxbd I'm glad we were able to help you out, and I hope you continue to stay involved in BLIS.

@heroxbd
Author

heroxbd commented Feb 21, 2017

Sure, Field. I don't think MKL suits my scientific need to share results with colleagues and the public.

Jeff introduced me to BLIS in the OpenBLAS discussion. BLIS is the only viable option on KNL, whose ecosystem is at present dominated by black-box toolchains.

Thanks again, and keep up the great work. I will give flame a try some day, since it can use the advanced API of BLIS, unlike the reference LAPACK I am using.

@jeffhammond
Member

@heroxbd I assume your concern with MKL is that, because it is closed-source, it isn't fully reproducible and/or verifiable? I'm not in any way trying to talk you out of your principles, which I admire - I merely want to make sure I understand the concern precisely.

BLIS is - as far as I know - the only open-source BLAS implementation that supports AVX-512 and I'm very happy that you are using it. While the proprietary Intel toolchain meets the needs of many users, it is important both to me and to others at Intel that there are high-quality open-source alternatives across the board.
