Segmentation fault (core dumped) in UnitTestCuVectorAddRowSumMat for large matrix dimensions #4458
Likely the problem is that the product of the x dim and y dim is outside the int32 range. The CPU code might need to be modified to handle that. I would merge a PR if you could make one.
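To make the overflow hypothesis concrete for the two sizes reported in this issue, here is a minimal standalone sketch. It is not Kaldi code, and `FitsInInt32` is a hypothetical helper. It forms X*Y in 64-bit arithmetic and compares the result with the largest int32 value:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Kaldi): returns true if rows * cols can be
// represented as a signed 32-bit integer. The product is formed in 64 bits, so
// the check itself cannot overflow for any pair of non-negative int32 values.
bool FitsInInt32(std::int32_t rows, std::int32_t cols) {
  assert(rows >= 0 && cols >= 0);
  const std::int64_t elements = static_cast<std::int64_t>(rows) * cols;
  return elements <= std::numeric_limits<std::int32_t>::max();
}

// Values for the sizes in this issue:
//   FitsInInt32(65000, 64360) -> false  (65000 * 64360 = 4,183,400,000)
//   FitsInInt32(45000, 44550) -> true   (45000 * 44550 = 2,004,750,000)
```

Each dimension fits comfortably in int32, but 65000 * 64360 exceeds 2,147,483,647 while 45000 * 44550 does not. This lines up with the boundary between the failing and passing runs.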
On Thu, Feb 18, 2021 at 4:03 PM Ajay Nayak wrote:
Tested on: Titan RTX, CUDA 11.0, driver 455.51.05.
The system has abundant RAM (>100 GB) and an Intel Xeon processor.
I have been trying to run the tests in cu-matrix-test.cc. I am interested in one particular test, UnitTestCuVectorAddRowSumMat. To run only that test, I commented out all the other tests in the "CudaMatrixUnitTest" function and modified the "main" function in the test file as follows:
```cpp
int main() {
  SetVerboseLevel(1);
  int32 loop = 0;
#if HAVE_CUDA == 1
  for (loop = 1; loop < 2; loop++) {
    CuDevice::Instantiate().SetDebugStrideMode(true);
    if (loop == 0)
      CuDevice::Instantiate().SelectGpuId("no");
    else
      CuDevice::Instantiate().SelectGpuId("yes");
#endif
    kaldi::CudaMatrixUnitTest<double>();
    if (loop == 0)
      KALDI_LOG << "Tests without GPU use succeeded.";
    else
      KALDI_LOG << "Tests with GPU use (if available) succeeded.";
#if HAVE_CUDA == 1
  } // No for loop if 'HAVE_CUDA != 1',
  CuDevice::Instantiate().PrintProfile();
#endif
  return 0;
}
```
As can be seen, I run the test only for double. In the test "UnitTestCuVectorAddRowSumMat", I set X=65000, Y=64360 (each well within the int32 limit). I am observing segmentation faults in that case. For X=45000, Y=44550, the test runs successfully. Am I doing something wrong?
Sample output:
```
$ ./cu-matrix-test
LOG ([5.5.854~1-403d]:SelectGpuId():cu-device.cc:172) Manually selected to compute on CPU.
Segmentation fault (core dumped)
```
I think the GPU code is running fine; the relevant output (with loop=1 in the main shown earlier) is:
```
$ ./cu-matrix-test
WARNING ([5.5.854~1-403d]:SelectGpuId():cu-device.cc:247) Not in compute-exclusive mode. Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:446) Selecting from 1 GPUs
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:461) cudaSetDevice(0): TITAN RTX free:24048M, used:172M, total:24220M, free/total:0.992899
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:509) Device: 0, mem_ratio: 0.992899
LOG ([5.5.854~1-403d]:SelectGpuId():cu-device.cc:390) Trying to select device: 0
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:519) Success selecting device 0 free mem ratio: 0.992899
LOG ([5.5.854~1-403d]:FinalizeActiveGpu():cu-device.cc:346) The active GPU is [0]: TITAN RTX free:23566M, used:654M, total:24220M, free/total:0.972998 version 7.5
Segmentation fault (core dumped)
```
Would a solution be to change MatrixDimT (matrix/matrix-common.h) and MatrixDimT_cuda (cudamatrix/cu-matrixdim.h) from int32 and int32_t to int64 and int64_t? Edit: never mind, it can cause problems.
Yes
Wait, no...
The only place I can think of where x_dim and y_dim are multiplied is the memory allocation (could that also be a cause of the segfault?). But that already seems to be handled with a static_cast (see the matrix allocation at https://github.com/kaldi-asr/kaldi/blob/master/src/matrix/kaldi-matrix.cc#L804). The functions where the actual operations happen use cblas_* functions.
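As a minimal sketch of the static_cast pattern mentioned above, the snippet below uses a hypothetical `AllocationBytes` helper. It is not the code at kaldi-matrix.cc#L804, and it assumes a 64-bit size_t. Widening the first operand before multiplying keeps the whole size computation out of int32:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Kaldi's allocator): bytes needed for num_rows rows
// of `stride` elements, each elem_size bytes. Assumes a 64-bit size_t.
//
// The unsafe form would be num_rows * stride * elem_size with int32 operands:
// num_rows * stride is evaluated first, in int32, and overflows (undefined
// behavior) before the result is ever widened to size_t.
std::size_t AllocationBytes(std::int32_t num_rows, std::int32_t stride,
                            std::size_t elem_size) {
  assert(num_rows >= 0 && stride >= 0);
  // Casting before multiplying makes every multiplication happen in size_t.
  return static_cast<std::size_t>(num_rows) * static_cast<std::size_t>(stride) *
         elem_size;
}
```

This only protects the size passed to the allocator. Any other code that forms a flat offset such as row * stride + col in int32 could still overflow for these dimensions.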
Run it in gdb and get a stack trace:
```
gdb matrix-lib-test
(gdb) r
... crash ...
(gdb) bt
```
Solution: … Problems: The trouble also occurred in the CUDA kernel in cu-kernels.cu, which threw "illegal memory accessed"; I used cuda-memcheck to find that. I wanted to know whether there are any assumptions w.r.t. the … I also wanted to know about any real-world applications that use this specific kernel. Any hints or suggestions on where to look for such applications would be really great.
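An illegal memory access in the kernel would be consistent with the same overflow showing up in element-offset arithmetic rather than in allocation. The following is a hedged, CPU-only C++ analog of v = alpha * (sum of rows of M) + beta * v over a row-major matrix with a row stride, with each row's start offset computed in 64 bits. It is not Kaldi's CUDA kernel, and `AddRowSumMatSketch` is a hypothetical name. Whether cu-kernels.cu forms its offsets in int32 is an assumption to check against the actual kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU analog (not Kaldi's CUDA kernel) of
//   v = alpha * (sum of rows of M) + beta * v
// for a row-major matrix M with num_rows rows, num_cols columns and a row
// stride of `stride` elements (stride >= num_cols, to allow padding).
void AddRowSumMatSketch(double alpha, const double* m, std::int32_t num_rows,
                        std::int32_t num_cols, std::int32_t stride, double beta,
                        std::vector<double>* v) {
  assert(num_rows >= 0 && num_cols >= 0 && stride >= num_cols);
  assert(v->size() == static_cast<std::size_t>(num_cols));
  std::vector<double> sums(static_cast<std::size_t>(num_cols), 0.0);
  for (std::int32_t i = 0; i < num_rows; ++i) {
    // Row start computed in 64 bits. Written as i * stride in int32, this
    // would overflow once it exceeds 2^31 - 1 (from row 33,367 for stride
    // 64360); signed overflow is undefined behavior and in practice typically
    // wraps to a negative, out-of-bounds offset.
    const double* row = m + static_cast<std::int64_t>(i) * stride;
    for (std::int32_t j = 0; j < num_cols; ++j) {
      sums[static_cast<std::size_t>(j)] += row[j];
    }
  }
  for (std::int32_t j = 0; j < num_cols; ++j) {
    const std::size_t k = static_cast<std::size_t>(j);
    (*v)[k] = alpha * sums[k] + beta * (*v)[k];
  }
}
```

Iterating row by row keeps memory access contiguous, and only the row start needs the 64-bit multiply because j always stays below num_cols.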
Can you please make a pull request with the errors you fixed?
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.