Segmentation fault (core dumped) in UnitTestCuVectorAddRowSumMat for large matrix dimensions #4458

Open
nayakajay opened this issue Feb 18, 2021 · 9 comments
Labels: bug, stale-exclude (Stale bot ignore this issue)

@nayakajay

Tested on: Titan RTX, CUDA 11.0, driver 455.51.05
The system has abundant RAM (>100 GB) and an Intel Xeon processor.

I have been trying to run the tests provided in cu-matrix-test.cc, and I am interested in one particular test, UnitTestCuVectorAddRowSumMat. To run only that test, I commented out all the other tests in the "CudaMatrixUnitTest" function and modified the "main" function in the test file as follows:

int main() {
  SetVerboseLevel(1);
  int32 loop = 0;

#if HAVE_CUDA == 1
  for (loop = 1; loop < 2; loop++) {
    CuDevice::Instantiate().SetDebugStrideMode(true);
    if (loop == 0)
      CuDevice::Instantiate().SelectGpuId("no");
    else
      CuDevice::Instantiate().SelectGpuId("yes");
#endif

    kaldi::CudaMatrixUnitTest<double>();

    if (loop == 0)
      KALDI_LOG << "Tests without GPU use succeeded.";
    else
      KALDI_LOG << "Tests with GPU use (if available) succeeded.";

#if HAVE_CUDA == 1
  } // No for loop if 'HAVE_CUDA != 1',
  CuDevice::Instantiate().PrintProfile();
#endif
  return 0;
}

As can be seen, I run the test only for double. In "UnitTestCuVectorAddRowSumMat" I set X=65000, Y=64360 (each dimension is well within int32 limits). With those dimensions I observe a segmentation fault; with X=45000, Y=44550 the test runs successfully. Am I doing something wrong?
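
(For reference, a quick standalone check, not part of Kaldi: each dimension fits in int32, but the product of the failing pair exceeds INT32_MAX while the product of the passing pair does not.)

#include <cstdio>

int main() {
  // Products of the two dimension pairs, computed in 64-bit arithmetic.
  long long failing = 65000LL * 64360LL;  // 4,183,400,000  > INT32_MAX (2,147,483,647)
  long long passing = 45000LL * 44550LL;  // 2,004,750,000  < INT32_MAX
  std::printf("failing=%lld passing=%lld\n", failing, passing);
  return 0;
}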

Sample output

$ ./cu-matrix-test
LOG ([5.5.854~1-403d]:SelectGpuId():cu-device.cc:172) Manually selected to compute on CPU.
Segmentation fault (core dumped)

I think the GPU setup itself is fine; the relevant output (obtained by setting loop=1 in the main shown earlier) is:

$ ./cu-matrix-test
WARNING ([5.5.854~1-403d]:SelectGpuId():cu-device.cc:247) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:446) Selecting from 1 GPUs
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:461) cudaSetDevice(0): TITAN RTX   free:24048M, used:172M, total:24220M, free/total:0.992899
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:509) Device: 0, mem_ratio: 0.992899
LOG ([5.5.854~1-403d]:SelectGpuId():cu-device.cc:390) Trying to select device: 0
LOG ([5.5.854~1-403d]:SelectGpuIdAuto():cu-device.cc:519) Success selecting device 0 free mem ratio: 0.992899
LOG ([5.5.854~1-403d]:FinalizeActiveGpu():cu-device.cc:346) The active GPU is [0]: TITAN RTX    free:23566M, used:654M, total:24220M, free/total:0.972998 version 7.5
Segmentation fault (core dumped)
@nayakajay nayakajay added the bug label Feb 18, 2021
@danpovey
Contributor

danpovey commented Feb 18, 2021 via email

@nayakajay
Author

nayakajay commented Feb 18, 2021

Would a solution be to change MatrixDimT (matrix/matrix-common.h) and MatrixDimT_cuda (cudamatrix/cu-matrixdim.h) from int32 and int32_t to int64 and int64_t?

Edit: never mind, it could cause problems.

@danpovey
Contributor

danpovey commented Feb 18, 2021 via email

@danpovey
Contributor

Wait, no...
Just change the one function that is failing to use larger size types where necessary.

@nayakajay
Author

The only place I can think of where the product of x_dim and y_dim is computed is during memory allocation (could that also cause a segfault?), but it seems that is already taken care of with a static_cast (see the matrix allocation code).

The functions where the actual operations happen use the cblas_* functions.
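
For illustration only, not the actual Kaldi code: even when the allocation itself is done in size_t, an element offset computed in 32-bit arithmetic can still wrap around for a 65000 x 64360 matrix.

#include <cstdint>
#include <cstdio>

int main() {
  int32_t stride = 64360;  // number of columns in the failing test
  int32_t r = 60000;       // a row near the end of the matrix
  // Row offset computed in 64-bit vs. what a 32-bit computation would produce.
  int64_t correct = static_cast<int64_t>(r) * stride;  // 3,861,600,000
  int32_t wrapped = static_cast<int32_t>(correct);     // wraps to a negative value
  std::printf("correct=%lld wrapped=%d\n", (long long)correct, wrapped);
  return 0;
}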

@danpovey
Contributor

danpovey commented Feb 19, 2021 via email

@nayakajay
Author

nayakajay commented Feb 25, 2021

Solution:
In all the trouble-causing places I added a static_cast<size_t>. The test now seems to pass (no ASSERT failure).

Problems:
For the double type, the first problem occurred in kaldi-matrix.h, and the cast got rid of that error. I then tried to fit larger data on the GPU by switching the data type to float, and with float another issue showed up in kaldi-matrix.cc.
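
A minimal sketch of the kind of cast involved (illustrative only; the helper name below is made up and the actual lines in kaldi-matrix.h and kaldi-matrix.cc differ in detail):

#include <cstddef>
#include <cstdint>

// Hypothetical helper: widen the row index before the multiply so the element
// offset is computed in size_t arithmetic rather than int32.
inline double *RowData(double *data, int32_t r, int32_t stride) {
  return data + static_cast<size_t>(r) * static_cast<size_t>(stride);
}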

The trouble also occurred in the CUDA kernel in cu-kernels.cu, which threw an "illegal memory access" error; I used cuda-memcheck to get that information.

I wanted to know whether there are any assumptions about the dimensions of the matrix passed to the _strided_reduction_fused kernel. Can it be rectangular (number of rows much smaller than the number of columns, or vice versa)?

I would also like to know about any real-world applications that use this specific kernel. Any hints or suggestions on where to look for such applications would be greatly appreciated.

@danpovey
Contributor

Can you please make a pull request with the errors you fixed?
Sorry, you'll have to look at that kernel yourself, and at where it's called; I didn't write it and am not familiar with it.
This line:
int idx = colStart + j * d.stride;
concerns me. I'm not sure what int is here; it could be 32-bit, and that could overflow. You could cast to size_t and make sure idx is also of type size_t.
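
A minimal sketch of that suggestion, assuming colStart, j and d.stride keep 32-bit types in cu-kernels.cu (the surrounding kernel code is not shown here, and the helper name is made up):

#include <cstddef>

// Widen before the multiply so the offset is computed in size_t (64-bit on the
// relevant platforms) instead of int.
inline size_t ComputeIdx(int colStart, int j, int stride) {
  return static_cast<size_t>(colStart) + static_cast<size_t>(j) * static_cast<size_t>(stride);
}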

@kkm000 kkm000 self-assigned this Mar 14, 2021
@stale

stale bot commented May 13, 2021

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

@stale stale bot added the stale Stale bot on the loose label May 13, 2021
@kkm000 kkm000 added the stale-exclude Stale bot ignore this issue label Jun 12, 2021
@stale stale bot removed the stale Stale bot on the loose label Jun 12, 2021