Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ThreadSanitizer: data race /home/mlx/mlx/array.h:383 in mlx::core::array::set_status(mlx::core::array::Status) const #1707

Closed
JohnSmithBH84 opened this issue Dec 16, 2024 · 2 comments · Fixed by #1719

Comments

@JohnSmithBH84
Copy link

Describe the bug
I’ve build mlx with Thread Satitizer (TSan) and it detected a data race in array.h after launching linear_regression. I'm not sure whether this is an actual issue, or whether mlx' code is too sophisticated for TSan to understand. Do you have any idea?

To Reproduce
mlx: (main 50f3535)
Added the next rows into CMakeLists.txt to build with Thread Sanitizer.

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=thread -fno-omit-frame-pointer")
set(CMAKE_LINKER_FLAGS "${CMAKE_LINKER_FLAGS} -fsanitize=thread -fno-omit-frame-pointer")
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

Configured and build the project.

mkdir build & cd build
cmake -DCMAKE_BUILD_TYPE=DEBUG -Wno-dev ..
cmake --build . -- -j16

Launched executable without arguments.
./build/examples/cpp/linear_regression

Expected behavior
No data race reported by TSan

Desktop (please complete the following information):
Ubuntu 22.04.4 LTS
Linux 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
clang: 19.1.0 (x86_64-unknown-linux-gnu)

Additional context

==================
WARNING: ThreadSanitizer: data race (pid=1977)
  Write of size 4 at 0x7b4000008260 by thread T1:
    #0 mlx::core::array::set_status(mlx::core::array::Status) const /home/mlx/mlx/array.h:383 (linear_regression+0x2dd90)
    #1 operator() /home/mlx/mlx/transforms.cpp:228 (linear_regression+0x1566e1)
    #2 __invoke_impl<void, mlx::core::eval_impl(std::vector<array>, bool)::<lambda()>&> /usr/include/c++/12/bits/invoke.h:61 (linear_regression+0x16b68f)
    #3 __invoke_r<void, mlx::core::eval_impl(std::vector<array>, bool)::<lambda()>&> /usr/include/c++/12/bits/invoke.h:111 (linear_regression+0x16b4d0)
    #4 _M_invoke /usr/include/c++/12/bits/std_function.h:290 (linear_regression+0x16b258)
    #5 std::function<void ()>::operator()() const /usr/include/c++/12/bits/std_function.h:591 (linear_regression+0x14c2f4)
    #6 mlx::core::scheduler::StreamThread::thread_fn() <null> (linear_regression+0x14a8b1)
    #7 void std::__invoke_impl<void, void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*>(std::__invoke_memfun_deref, void (mlx::core::scheduler::StreamThread::*&&)(), mlx::core::scheduler::StreamThread*&&) /usr/include/c++/12/bits/invoke.h:74 (linear_regression+0x156451)
    #8 std::__invoke_result<void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*>::type std::__invoke<void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*>(void (mlx::core::scheduler::StreamThread::*&&)(), mlx::core::scheduler::StreamThread*&&) /usr/include/c++/12/bits/invoke.h:96 (linear_regression+0x15632e)
    #9 void std::thread::_Invoker<std::tuple<void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/12/bits/std_thread.h:252 (linear_regression+0x15623e)
    #10 std::thread::_Invoker<std::tuple<void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*> >::operator()() <null> (linear_regression+0x156156)
    #11 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*> > >::_M_run() <null> (linear_regression+0x156040)
    #12 <null> <null> (libstdc++.so.6+0xd44a2)

  Previous read of size 4 at 0x7b4000008260 by main thread:
    #0 mlx::core::array::status() const /home/mlx/mlx/array.h:379 (linear_regression+0x2dd43)
    #1 mlx::core::array::is_available() const /home/mlx/mlx/array.cpp:96 (linear_regression+0x2b77c)
    #2 mlx::core::array::wait() /home/mlx/mlx/array.cpp:106 (linear_regression+0x2b82a)
    #3 mlx::core::eval(std::vector<mlx::core::array, std::allocator<mlx::core::array> >) /home/mlx/mlx/transforms.cpp:271 (linear_regression+0x158134)
    #4 void mlx::core::eval<mlx::core::array&, void>(mlx::core::array&) <null> (linear_regression+0x20aaa)
    #5 main /home/mlx/examples/cpp/linear_regression.cpp:45 (linear_regression+0x1daa0)

  Location is heap block of size 248 at 0x7b4000008200 allocated by main thread:
    #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.2+0x87323)
    #1 std::__new_allocator<std::_Sp_counted_ptr_inplace<mlx::core::array::ArrayDesc, std::allocator<void>, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) /usr/include/c++/12/bits/new_allocator.h:137 (linear_regression+0x29596)
    #2 std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<mlx::core::array::ArrayDesc, std::allocator<void>, (__gnu_cxx::_Lock_policy)2> > >::allocate(std::allocator<std::_Sp_counted_ptr_inplace<mlx::core::array::ArrayDesc, std::allocator<void>, (__gnu_cxx::_Lock_policy)2> >&, unsigned long) /usr/include/c++/12/bits/alloc_traits.h:464 (linear_regression+0x28617)
    #3 std::__allocated_ptr<std::allocator<std::_Sp_counted_ptr_inplace<mlx::core::array::ArrayDesc, std::allocator<void>, (__gnu_cxx::_Lock_policy)2> > > std::__allocate_guarded<std::allocator<std::_Sp_counted_ptr_inplace<mlx::core::array::ArrayDesc, std::allocator<void>, (__gnu_cxx::_Lock_policy)2> > >(std::allocator<std::_Sp_counted_ptr_inplace<mlx::core::array::ArrayDesc, std::allocator<void>, (__gnu_cxx::_Lock_policy)2> >&) /usr/include/c++/12/bits/allocated_ptr.h:98 (linear_regression+0x276d1)
    #4 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<mlx::core::array::ArrayDesc, std::allocator<void>, std::vector<int, std::allocator<int> >, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>, std::vector<mlx::core::array, std::allocator<mlx::core::array> > >(mlx::core::array::ArrayDesc*&, std::_Sp_alloc_shared_tag<std::allocator<void> >, std::vector<int, std::allocator<int> >&&, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>&&, std::vector<mlx::core::array, std::allocator<mlx::core::array> >&&) /usr/include/c++/12/bits/shared_ptr_base.h:969 (linear_regression+0x3919e)
    #5 std::__shared_ptr<mlx::core::array::ArrayDesc, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, std::vector<int, std::allocator<int> >, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>, std::vector<mlx::core::array, std::allocator<mlx::core::array> > >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::vector<int, std::allocator<int> >&&, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>&&, std::vector<mlx::core::array, std::allocator<mlx::core::array> >&&) /usr/include/c++/12/bits/shared_ptr_base.h:1712 (linear_regression+0x35e8b)
    #6 std::shared_ptr<mlx::core::array::ArrayDesc>::shared_ptr<std::allocator<void>, std::vector<int, std::allocator<int> >, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>, std::vector<mlx::core::array, std::allocator<mlx::core::array> > >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::vector<int, std::allocator<int> >&&, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>&&, std::vector<mlx::core::array, std::allocator<mlx::core::array> >&&) <null> (linear_regression+0x32336)
    #7 std::shared_ptr<std::enable_if<!std::is_array<mlx::core::array::ArrayDesc>::value, mlx::core::array::ArrayDesc>::type> std::make_shared<mlx::core::array::ArrayDesc, std::vector<int, std::allocator<int> >, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>, std::vector<mlx::core::array, std::allocator<mlx::core::array> > >(std::vector<int, std::allocator<int> >&&, mlx::core::Dtype&, std::shared_ptr<mlx::core::Primitive>&&, std::vector<mlx::core::array, std::allocator<mlx::core::array> >&&) /usr/include/c++/12/bits/shared_ptr.h:1010 (linear_regression+0x2f2b7)
    #8 mlx::core::array::array(std::vector<int, std::allocator<int> >, mlx::core::Dtype, std::shared_ptr<mlx::core::Primitive>, std::vector<mlx::core::array, std::allocator<mlx::core::array> >) /home/mlx/mlx/array.cpp:38 (linear_regression+0x2b04d)
    #9 mlx::core::eval_impl(std::vector<mlx::core::array, std::allocator<mlx::core::array> >, bool) /home/mlx/mlx/transforms.cpp:61 (linear_regression+0x1569eb)
    #10 mlx::core::eval(std::vector<mlx::core::array, std::allocator<mlx::core::array> >) /home/mlx/mlx/transforms.cpp:271 (linear_regression+0x158128)
    #11 void mlx::core::eval<mlx::core::array&, void>(mlx::core::array&) <null> (linear_regression+0x20aaa)
    #12 main /home/mlx/examples/cpp/linear_regression.cpp:45 (linear_regression+0x1daa0)

  Thread T1 (tid=1979, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x5e686)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xd4578)
    #2 mlx::core::scheduler::StreamThread::StreamThread(mlx::core::Stream) <null> (linear_regression+0x14a5ca)
    #3 mlx::core::scheduler::Scheduler::new_stream(mlx::core::Device const&) <null> (linear_regression+0x14adc0)
    #4 mlx::core::scheduler::Scheduler::Scheduler() <null> (linear_regression+0x14acc1)
    #5 mlx::core::scheduler::scheduler() /home/mlx/mlx/scheduler.cpp:55 (linear_regression+0x14839b)
    #6 mlx::core::default_stream(mlx::core::Device) /home/mlx/mlx/scheduler.cpp:13 (linear_regression+0x147dd4)
    #7 mlx::core::to_stream(std::variant<std::monostate, mlx::core::Stream, mlx::core::Device>) /home/mlx/mlx/utils.cpp:13 (linear_regression+0x185459)
    #8 mlx::core::random::normal(std::vector<int, std::allocator<int> > const&, mlx::core::Dtype, float, float, std::optional<mlx::core::array> const&, std::variant<std::monostate, mlx::core::Stream, mlx::core::Device>) /home/mlx/mlx/random.cpp:186 (linear_regression+0x1393af)
    #9 mlx::core::random::normal(std::vector<int, std::allocator<int> > const&, std::optional<mlx::core::array> const&, std::variant<std::monostate, mlx::core::Stream, mlx::core::Device>) <null> (linear_regression+0x1f443)
    #10 main /home/mlx/examples/cpp/linear_regression.cpp:22 (linear_regression+0x1d4f7)

SUMMARY: ThreadSanitizer: data race /home/mlx/mlx/array.h:383 in mlx::core::array::set_status(mlx::core::array::Status) const
==================
ThreadSanitizer: reported 1 warnings
@awni
Copy link
Member

awni commented Dec 18, 2024

I think it is a legitimate race actually. I don't know if it ever causes issues in practice.. but probably we should fix it to be safe. Thanks for flagging!

@awni awni mentioned this issue Dec 18, 2024
@awni
Copy link
Member

awni commented Dec 18, 2024

#1719 resolves this. It runs clean for the linear regression example with the CPU build. It seems that the thread sanitizer doesn't understand metal events so I wouldn't use it for the GPU build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants