-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory error in devArrMatch #229
Conversation
Note to myself: |
I updated the PR to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending CI.
@gpucibot merge |
Related: rapidsai/raft#229 We recently discovered a memory error in the `devArrMatch()` function: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci-v0.20/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu,OS=ubuntu16.04,PYTHON=3.7/161/console ``` 13:09:34 [----------] 44 tests from FilTests/TreeliteDenseFilTest 13:09:34 [ RUN ] FilTests/TreeliteDenseFilTest.Import/0 13:09:34 *** Error in `./test/ml': free(): invalid pointer: 0x00007f632b691fe8 *** 13:09:34 ======= Backtrace: ========= 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f632b3447f5] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x8038a)[0x7f632b34d38a] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f632b35158c] 13:09:34 /workspace/ci/artifacts/cuml/cpu/conda_work/cpp/build/libcuml++.so(_ZN8treelite9ModelImplIffED0Ev+0xf5)[0x7f632c556405] ``` which was traced to the `devArrMatch()` function as follows: ``` $ valgrind ./cpp/build/test/ml --gtest_filter=FilTests/TreeliteDenseFilTest.Import/0 ==6398== Mismatched free() / delete / delete [] ==6398== at 0x483D1CF: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x209287: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC253: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ==6398== Address 0x232bfa860 is 0 bytes inside a block of size 160,000 alloc'd ==6398== at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x2ABFF8: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in/home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ``` **Diagnosis**. The `devArrMatch` functions are allocating a temporary buffer using `new[]` and then assigning it to a `shared_ptr<T>`. This is not valid because the destructor of `shared_ptr<T>` will invoke `delete`, not `delete[]`. Calling `delete` with a buffer allocated by `new[]` leads to an undefined behavior. See https://docs.microsoft.com/en-us/cpp/code-quality/c6278?view=msvc-160. **Proposed fix**. Use `std:unique_ptr<T[]>` instead to store temporary buffers. Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #3860
@dantegd @cjnolet How do we bring in the latest RAFT into cuML? rapidsai/cuml#3855 is blocked by the memory error. |
@hcho3, replace the git commit hash for RAFT in |
Thanks! |
Related: rapidsai/raft#229 We recently discovered a memory error in the `devArrMatch()` function: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci-v0.20/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu,OS=ubuntu16.04,PYTHON=3.7/161/console ``` 13:09:34 [----------] 44 tests from FilTests/TreeliteDenseFilTest 13:09:34 [ RUN ] FilTests/TreeliteDenseFilTest.Import/0 13:09:34 *** Error in `./test/ml': free(): invalid pointer: 0x00007f632b691fe8 *** 13:09:34 ======= Backtrace: ========= 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f632b3447f5] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x8038a)[0x7f632b34d38a] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f632b35158c] 13:09:34 /workspace/ci/artifacts/cuml/cpu/conda_work/cpp/build/libcuml++.so(_ZN8treelite9ModelImplIffED0Ev+0xf5)[0x7f632c556405] ``` which was traced to the `devArrMatch()` function as follows: ``` $ valgrind ./cpp/build/test/ml --gtest_filter=FilTests/TreeliteDenseFilTest.Import/0 ==6398== Mismatched free() / delete / delete [] ==6398== at 0x483D1CF: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x209287: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC253: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ==6398== Address 0x232bfa860 is 0 bytes inside a block of size 160,000 alloc'd ==6398== at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x2ABFF8: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in/home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ``` **Diagnosis**. The `devArrMatch` functions are allocating a temporary buffer using `new[]` and then assigning it to a `shared_ptr<T>`. This is not valid because the destructor of `shared_ptr<T>` will invoke `delete`, not `delete[]`. Calling `delete` with a buffer allocated by `new[]` leads to an undefined behavior. See https://docs.microsoft.com/en-us/cpp/code-quality/c6278?view=msvc-160. **Proposed fix**. Use `std:unique_ptr<T[]>` instead to store temporary buffers. Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#3860
Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Ben Frederickson (https://github.com/benfred) URL: rapidsai/cuvs#229
We recently discovered a memory error in the
devArrMatch()
function: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci-v0.20/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu,OS=ubuntu16.04,PYTHON=3.7/161/consolewhich was traced to the
devArrMatch()
function as follows:Diagnosis. The
devArrMatch
functions are allocating a temporary buffer usingnew[]
and then assigning it to ashared_ptr<T>
. This is not valid because the destructor ofshared_ptr<T>
will invokedelete
, notdelete[]
. Callingdelete
with a buffer allocated bynew[]
leads to an undefined behavior. See https://docs.microsoft.com/en-us/cpp/code-quality/c6278?view=msvc-160.Proposed fix. Use
std:unique_ptr<T[]>
instead to store temporary buffers.