bf16 MLP failed with memory pool enabled #278

Closed
@yifeizh2

Description

The DLRM bottom-MLP ("bot") workload fails with an assertion in the memory pool when run with the following command:

LD_PRELOAD=/opt/miniforge3/lib/libiomp5.so:/home/yifei/ipex_env/gperftools-2.7.90/.libs/libtcmalloc.so numactl -C 0-55 -m 0 python3 ./tools/main.py --driver=mlp --batch_size=128 --hidden_size_list=13x512x256x128 --has_bias=512x256x128 --act_type=relu --dtype=bf16

Error message:

python3: ../lib/gc/ExecutionEngine/CPURuntime/MemoryPool.cpp:215: void {anonymous}::FILOMemoryPool::dealloc(void*): Assertion `current->allocated > chunk->size' failed.
bench_mlp.sh: line 9: 1393883 Aborted                 (core dumped) LD_PRELOAD=/opt/miniforge3/lib/libiomp5.so:/home/yifei/ipex_env/gperftools-2.7.90/.libs/libtcmalloc.so numactl -C 0-55 -m 0 python3 ./tools/main.py --driver=mlp --batch_size=128 --hidden_size_list=13x512x256x128 --has_bias=512x256x128 --act_type=relu --dtype=bf16 -p

Further analysis is needed. So far, the only observation is that running any single layer on its own does not trigger the assertion; it appears only when the layers run together.
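To make the failing check easier to reason about, here is a toy Python model of a first-in, last-out pool. It is only a sketch under assumptions: the `FILOPool`, `Chunk`, and `PoolInvariantError` names are hypothetical, and the real accounting in `MemoryPool.cpp` is not shown in this report (the actual assertion uses a strict `>`, whereas this model uses `>=`). It illustrates the kind of invariant a stack-style `dealloc` typically guards: frees must arrive in reverse allocation order, and the pool must still account for at least the bytes being returned.

```python
class PoolInvariantError(AssertionError):
    """Raised when the toy pool's bookkeeping is violated (hypothetical name)."""


class Chunk:
    """Per-allocation record; stands in for a chunk header (hypothetical)."""

    def __init__(self, size):
        self.size = size


class FILOPool:
    """Toy stack-style pool: frees must mirror allocations in reverse order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.allocated = 0   # bytes currently handed out
        self.stack = []      # live chunks, most recent last

    def alloc(self, size):
        if self.allocated + size > self.capacity:
            raise MemoryError("pool exhausted")
        chunk = Chunk(size)
        self.stack.append(chunk)
        self.allocated += size
        return chunk

    def dealloc(self, chunk):
        # Accounting check: cannot return more bytes than are outstanding.
        if self.allocated < chunk.size:
            raise PoolInvariantError("pool accounting underflow")
        # FILO discipline: only the most recent allocation may be freed.
        if not self.stack or self.stack[-1] is not chunk:
            raise PoolInvariantError("out-of-order free")
        self.stack.pop()
        self.allocated -= chunk.size


# Well-behaved usage: frees happen in reverse order of allocation.
pool = FILOPool(1024)
first = pool.alloc(256)
second = pool.alloc(128)
pool.dealloc(second)
pool.dealloc(first)
```

In this model, a pool holding a single live allocation can never see an out-of-order free, which would be consistent with single layers passing while the full MLP fails. That is a guess about one possible failure mode, not a confirmed root cause.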

Labels

bug (Something isn't working)