Description
Summary
The recent change introduced by a8b478b causes a failure on aarch64.
Version
oneDNN v3.7.0 (commit a8b478b)
Environment
- CPU: Neoverse V1 (failure also observed on Neoverse N1)
flags:fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
- OS version: linux-6.5.0 22.04.1-Ubuntu
- Compiler version: gcc-10, g++-10
- CMake version: 3.22.1
- CMake build command:
CXX=g++-10 CC=gcc-10 cmake .. -DCMAKE_BUILD_TYPE=Release -DDNNL_AARCH64_USE_ACL=1 -DDNNL_BUILD_FOR_CI=ON -DDNNL_TEST_SET=NIGHTLY -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DONEDNN_BUILD_GRAPH=0 -DDNNL_ENABLE_JIT_PROFILING=0 -DDNNL_OMP_RUNTIME=1
- git hash: a8b478b
Steps to reproduce
The failure appears only in a Release build.
ONEDNN_VERBOSE=all ./build/tests/benchdnn/benchdnn --matmul --skip-impl=ref --dt=s8:s8:f32 --stag=ab --wtag=ab --dtag=ab --bia_dt=u8 --attr-scales=src:common:0.25+dst:common:2.25+wei:common:0.5 --attr-zero-points=src:common:1+dst:common:2+wei:common:-1 --attr-post-ops=sum 1x30:30x20
Observed behavior
onednn_verbose,v1,info,oneDNN v3.7.0 (commit a8b478b21f7240caa4d68d2b5aee88b54bbd3092)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:32
onednn_verbose,v1,info,cpu,isa:AArch64 SVE (256 bits)
onednn_verbose,v1,info,gpu,runtime:none
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,brg:sve_512,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported isa,src/cpu/aarch64/matmul/brgemm_matmul.cpp:98
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,lowp_gemm:acl,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,scale and zero-point for f32 dst unsupported,src/cpu/aarch64/matmul/acl_lowp_matmul.cpp:90
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:acl,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/aarch64/matmul/acl_matmul.cpp:84
onednn_verbose,v1,primitive,create:dispatch,brgemm_matmul,datatype configuration not supported on this isa,src/cpu/aarch64/matmul/brgemm_matmul_utils.cpp:735
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:jit:f32,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/matmul/gemm_f32_matmul.cpp:93
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:jit:bf16,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/matmul/gemm_bf16_matmul.cpp:63
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:jit:bf16,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/matmul/gemm_bf16_matmul.cpp:63
onednn_verbose,v1,primitive,create:cache_miss,cpu,matmul,gemm:jit,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,0.0268555
onednn_verbose,v1,primitive,create:cache_hit,cpu,matmul,gemm:jit,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,0.00195312
onednn_verbose,v1,primitive,create:check,matmul,unsupported attribute,src/common/matmul.cpp:75
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.0151367
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.720947
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:u8::blocked:ab::f0,,,1x20,0.0959473
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:u8::blocked:ab::f0,,,1x20,0.00390625
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,30x20,0.0600586
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,30x20,0.000976562
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,1x30,0.032959
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,1x30,0
onednn_verbose,v1,primitive,exec,cpu,matmul,gemm:jit,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,1.21704
onednn_verbose,v1,primitive,create:cache_hit,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.00195312
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.078125
onednn_verbose,v1,primitive,create:cache_hit,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.00195312
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.236084
[ 6][DST][0:6] exp_f32:-1.49012e-08 exp:-1.49012e-08 got: 0 diff:1.49012e-08 rdiff: 1
[COMPARE_STATS][DST]: trh=0 err_max_diff:1.49012e-08 err_max_rdiff: 1 all_max_diff:4.76837e-07 all_max_rdiff: 1
0:FAILED (errors:1 total:20) __REPRO: --matmul --skip-impl=ref --dt=s8:s8:f32 --stag=ab --wtag=ab --dtag=ab --bia_dt=u8 --attr-scales=src:common:0.25+dst:common:2.25+wei:common:0.5 --attr-zero-points=src:common:1+dst:common:2+wei:common:-1 --attr-post-ops=sum 1x30:30x20
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.01s; fill: 0.00s (6%); compute_ref: 0.00s (5%); compare: 0.00s (9%);
Expected behavior
The returned value of 0 seems reasonably close to the expected -1.49012e-08. Could you share the rationale behind changing the threshold? Thank you.
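To illustrate why the reported element fails, here is a minimal sketch of an element-wise comparison. It is not benchdnn's actual comparison code: the `compare` helper, its parameters, and the `1e-7` absolute tolerance are hypothetical and chosen only for illustration. The sketch assumes the relative difference is computed as `diff / |expected|`, which matches the `diff` and `rdiff` values printed in the log.

```python
def compare(expected: float, got: float, rel_trh: float, abs_trh: float = 0.0):
    """Return (diff, rdiff, ok) for a single element (hypothetical helper)."""
    diff = abs(expected - got)
    # Relative difference; fall back to the absolute difference when expected is 0.
    rdiff = diff / abs(expected) if expected != 0.0 else diff
    # The element passes if either tolerance is satisfied.
    ok = diff <= abs_trh or rdiff <= rel_trh
    return diff, rdiff, ok


# Values taken from the failing element in the log above.
expected, got = -1.49012e-08, 0.0

# Relative check only (trh=0, as in the log): rdiff is 1 whenever got == 0,
# so this element fails for any relative threshold below 1.
diff, rdiff, ok_relative = compare(expected, got, rel_trh=0.0)

# Hypothetical absolute tolerance (1e-7 is illustrative, not a benchdnn value).
_, _, ok_absolute = compare(expected, got, rel_trh=0.0, abs_trh=1e-7)

print(diff, rdiff, ok_relative, ok_absolute)
```

Under these assumptions, a purely relative threshold cannot accept a result of 0 against a tiny nonzero expected value, whereas a small absolute tolerance would.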