Matmul correctness check failed #235

Closed
@WangJialei-A

Please check the CI results:
attempt 1
attempt 2

Please use branch wangjial/benchgc_op; see https://github.com/intel/graph-compiler/blob/wangjial/benchgc_op/scripts/correctness.sh for details.

batch_matmul failure:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ghrunner/.local/lib/python3.10/site-packages/benchgc/__main__.py", line 262, in <module>
    engine = compiler.compile_and_jit(module)
  File "/home/ghrunner/.local/lib/python3.10/site-packages/gc_mlir/graph_compiler.py", line 47, in compile_and_jit
    self.compile(module, ir_printing)
  File "/home/ghrunner/.local/lib/python3.10/site-packages/gc_mlir/graph_compiler.py", line 37, in compile
    pm.run(module.operation)
gc_mlir._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: unknown: 'linalg.transpose' op dim(result, 0) = 4 doesn't match dim(input, permutation[0]) = 64
 note: unknown: see current operation: 
  %15 = "linalg.transpose"(%10, %14) <{permutation = array<i64: 1, 0, 2>}> ({
  ^bb0(%arg9: f32, %arg10: f32):
    "linalg.yield"(%arg9) : (f32) -> ()
  }) : (tensor<16x64x64xf32>, tensor<4x16x16xf32>) -> tensor<4x16x16xf32>
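
For reference, the verifier message above states the rule that `linalg.transpose` enforces: `dim(result, i)` must equal `dim(input, permutation[i])`. The following is a minimal Python sketch of that rule, not the MLIR verifier itself; the helper names `transposed_shape` and `check_transpose` are hypothetical and exist only for illustration. It is applied to the shapes from the failing op.

```python
from typing import List, Sequence


def transposed_shape(input_shape: Sequence[int], permutation: Sequence[int]) -> List[int]:
    """Shape a transpose result must have: result[i] == input[permutation[i]]."""
    if sorted(permutation) != list(range(len(input_shape))):
        raise ValueError("permutation must be a reordering of 0..rank-1")
    return [input_shape[p] for p in permutation]


def check_transpose(
    input_shape: Sequence[int],
    init_shape: Sequence[int],
    permutation: Sequence[int],
) -> List[str]:
    """Return one message per dimension where the init/result shape is wrong."""
    expected = transposed_shape(input_shape, permutation)
    if len(init_shape) != len(expected):
        return [f"rank mismatch: result rank {len(init_shape)}, expected {len(expected)}"]
    errors = []
    for i, (got, want) in enumerate(zip(init_shape, expected)):
        if got != want:
            errors.append(
                f"dim(result, {i}) = {got} doesn't match "
                f"dim(input, permutation[{i}]) = {want}"
            )
    return errors


# Shapes taken from the failing op in the log above:
# input tensor<16x64x64xf32>, init/result tensor<4x16x16xf32>, permutation [1, 0, 2].
errors = check_transpose([16, 64, 64], [4, 16, 16], [1, 0, 2])
for message in errors:
    print(message)
```

With these shapes a valid result would be 64x16x64, so the 4x16x16 init tensor is rejected. Dimension 2 mismatches as well (16 vs 64); the log shows only the dimension 0 message.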

matmul failure:

              (0, 4): ref: 72393.0000000 res: -35198.0312500 abs_diff: 107591.0312500 rel_diff:    1.4862076
              (0, 8): ref: 3704.0000000 res: 307348073856968022527436828704768.0000000 abs_diff: 307348073856968022527436828704768.0000000 rel_diff: 82977343712344206321601478656.0000000
              (0, 9): ref: 52705.0000000 res: 53402.8203125 abs_diff:  697.8203125 rel_diff:    0.0132401
             (0, 10): ref: -62068.0000000 res: 4360480028519042306210267136.0000000 abs_diff: 4360480028519042306210267136.0000000 rel_diff: 70253271883218220482560.0000000
             (0, 11): ref: -30467.0000000 res: 15272181609856882034956304384.0000000 abs_diff: 15272181609856882034956304384.0000000 rel_diff: 501269625702365198811136.0000000
             (0, 12): ref: 11023.0000000 res: 75022866827070021312876380160.0000000 abs_diff: 75022866827070021312876380160.0000000 rel_diff: 6806029988930553684951040.0000000
             (0, 13): ref: 1945.0000000 res: 17751089905934782744747835392.0000000 abs_diff: 17751089905934782744747835392.0000000 rel_diff: 9126524324624791448322048.0000000
             (0, 20): ref: -20481.0000000 res:          nan abs_diff:          nan rel_diff:          nan
             (0, 21): ref: 19575.0000000 res:          nan abs_diff:          nan rel_diff:          nan
             (0, 22): ref: 35530.0000000 res:          nan abs_diff:          nan rel_diff:          nan
FAIL: linalg.matmul
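
The `abs_diff` and `rel_diff` columns in the log above are consistent with `abs_diff = |ref - res|` and `rel_diff = abs_diff / |ref|`. The sketch below reproduces those two columns for a few of the logged entries. It is an illustration only, not benchgc's actual comparison code: the function name `compare` and the tolerance `rtol=1e-3` are assumptions, and NaN or infinite results are treated as failures outright.

```python
import math
from typing import Tuple


def compare(ref: float, res: float, rtol: float = 1e-3) -> Tuple[float, float, bool]:
    """Return (abs_diff, rel_diff, passed) for one element.

    rtol is a hypothetical threshold chosen for illustration.
    """
    if math.isnan(res) or math.isinf(res):
        # A non-finite result can never match a finite reference.
        return math.nan, math.nan, False
    abs_diff = abs(ref - res)
    if ref == 0:
        rel_diff = 0.0 if abs_diff == 0 else math.inf
    else:
        rel_diff = abs_diff / abs(ref)
    return abs_diff, rel_diff, rel_diff <= rtol


# (ref, res) pairs copied from the log above.
samples = [
    (72393.0, -35198.03125),
    (52705.0, 53402.8203125),
    (-20481.0, math.nan),
]
for ref, res in samples:
    abs_diff, rel_diff, passed = compare(ref, res)
    status = "ok" if passed else "FAIL"
    print(f"ref: {ref:.7f} res: {res:.7f} abs_diff: {abs_diff:.7f} rel_diff: {rel_diff:.7f} {status}")
```

The mix of moderately wrong values, values on the order of 1e27 to 1e32, and NaN in a single output row looks more like reads of uninitialized or overwritten memory than an ordinary rounding problem.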

Neither issue is reproducible in my environment.
The matmul failure is random and may be related to memory corruption.
