Can't determine number of cores. Unknown SM version 5.0! #880

Closed
zhangxaochen opened this issue Sep 1, 2014 · 16 comments

@zhangxaochen
Contributor

I'm trying to build and run the pcl_kinfu_app project. The build succeeds, but running it gives the following error:

E:\AC++\l\p-m-6d0343d1b7\_b\bin> pcl_kinfu_app_debug.exe
[pcl::gpu::printShortCudaDeviceInfo] : Device 0:  "GeForce GTX 750 Ti"  2048Mb
Can't determine number of cores. Unknown SM version 5.0!
, sm_50, 0 cores, Driver/Runtime ver.6.50/6.50
Error: invalid device function  E:/ABOUT C++/libs/pcl-master-6d0343d1b7/gpu/kinfu/src
/cuda/tsdf_volume.cu:76

I searched online but found no posts that solve this issue. Is it caused by my CUDA installation, or by my GTX 750 Ti using the Maxwell architecture?

I've found the code here: https://github.com/PointCloudLibrary/pcl/blob/master/gpu/containers/src/initialization.cpp

inline int convertSMVer2Cores(int major, int minor)
{
    // Defines for GPU Architecture types (using the SM version to determine the # of cores per SM)
    typedef struct {
        int SM; // 0xMm (hexadecimal notation), M = SM Major version, and m = SM minor version
        int Cores;
    } SMtoCores;

    SMtoCores gpuArchCoresPerSM[] =  { { 0x10,  8 }, { 0x11,  8 }, { 0x12,  8 }, { 0x13,  8 }, { 0x20, 32 }, { 0x21, 48 }, {0x30, 192}, {0x35, 192}, { -1, -1 }  };

    int index = 0;
    while (gpuArchCoresPerSM[index].SM != -1) 
    {
        if (gpuArchCoresPerSM[index].SM == ((major << 4) + minor) ) 
            return gpuArchCoresPerSM[index].Cores;
        index++;
    }
    printf("\nCan't determine number of cores. Unknown SM version %d.%d!\n", major, minor);
    return 0;
}

Does that mean I should install a different version of CUDA?

@VictorLamoine
Contributor

Hello,

Here is a link describing architecture / CUDA compatibility:
http://docs.nvidia.com/cuda/maxwell-compatibility-guide/index.html

The code was probably written to handle sm_30 architectures and not above.
It is not the version of CUDA but the architecture of your GPU that is a "problem" here.

You need to tweak:
if (gpuArchCoresPerSM[index].SM == ((major << 4) + minor) )

If you find a solution to this problem, it would be nice to share it by sending a pull request.

Bye

@zhangxaochen
Contributor Author

@VictorLamoine I found that my GTX 750 Ti has 5 SMs, each with 128 cores, so I added {0x50, 128} just before {-1, -1}. The Unknown SM version 5.0 error is gone, but the invalid device function error is still there:

E:\AC++\l\p-m-6d0343d1b7\_b\bin> pcl_kinfu_app_debug.exe
[pcl::gpu::printShortCudaDeviceInfo] : Device 0:  "GeForce GTX 750 Ti"  2048Mb, sm_50
, 640 cores, Driver/Runtime ver.6.50/6.50
Error: invalid device function  E:/ABOUT C++/libs/pcl-master-6d0343d1b7/gpu/kinfu/src
/cuda/tsdf_volume.cu:76
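
For reference, the table change described in this comment can be sketched as below. This is a minimal standalone version of the quoted lookup, not the code in PCL master: the only new entry is {0x50, 128}, and the restructuring into a loop over a static table is an illustrative choice.

```cpp
#include <cstdio>

// Cores per SM, keyed by 0xMm (M = major, m = minor compute capability).
struct SMtoCores { int sm; int cores; };

inline int convertSMVer2Cores(int major, int minor)
{
    static const SMtoCores table[] = {
        {0x10, 8}, {0x11, 8}, {0x12, 8}, {0x13, 8},
        {0x20, 32}, {0x21, 48}, {0x30, 192}, {0x35, 192},
        {0x50, 128},   // entry added in this comment (Maxwell, e.g. GTX 750 Ti)
        {-1, -1}       // sentinel, must stay last
    };

    const int key = (major << 4) + minor;   // e.g. 5.0 -> 0x50
    for (int i = 0; table[i].sm != -1; ++i)
        if (table[i].sm == key)
            return table[i].cores;

    std::printf("\nCan't determine number of cores. Unknown SM version %d.%d!\n", major, minor);
    return 0;
}
```

With 5 SMs, `5 * convertSMVer2Cores(5, 0)` gives the 640 cores reported in the output above. The lookup only affects the printed device information, which is consistent with the invalid device function error remaining after the change.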

@SteveSmithStyku

Has there been any progress on this issue? I'm running into the same "invalid device function" issue in tsdf_volume.cu:76

I was previously able to compile in Visual Studio 2010 with CUDA 5.0 using the following cmake options:

CUDA_ARCH_BIN - 2.0 2.1(2.0) 3.0
CUDA_ARCH_PTX - 3.0

This allowed me to run kinfu on both Kepler and Maxwell GPUs.
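
A likely reason such a configuration can cover Maxwell is the PTX entry. Per NVIDIA's documented compatibility rules, binary code built for compute capability X.y runs only on devices X.z with z >= y, while PTX built for X.y can be JIT-compiled by the driver for any device of capability X.y or higher. The sketch below encodes just those two rules; the type and function names are hypothetical and not part of PCL or CUDA.

```cpp
#include <vector>

// Hypothetical helper type: a compute capability such as 3.0 or 5.0.
struct ComputeCapability { int major; int minor; };

// Binary (cubin) code for X.y runs only on devices X.z with z >= y.
inline bool binaryRunsOn(ComputeCapability bin, ComputeCapability dev)
{
    return bin.major == dev.major && bin.minor <= dev.minor;
}

// PTX for X.y can be JIT-compiled for any device of capability >= X.y.
inline bool ptxRunsOn(ComputeCapability ptx, ComputeCapability dev)
{
    return ptx.major < dev.major ||
           (ptx.major == dev.major && ptx.minor <= dev.minor);
}

// True when at least one embedded binary or PTX target is usable on the device.
inline bool hasUsableCode(const std::vector<ComputeCapability>& bins,
                          const std::vector<ComputeCapability>& ptxs,
                          ComputeCapability dev)
{
    for (const auto& b : bins)
        if (binaryRunsOn(b, dev)) return true;
    for (const auto& p : ptxs)
        if (ptxRunsOn(p, dev)) return true;
    return false;
}
```

Under these rules, binaries for 2.0, 2.1 and 3.0 alone give nothing usable on a 5.0 device, while adding PTX for 3.0 does. This only illustrates the general rule; it does not by itself explain a failure that appears with a different toolchain.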

However, this no longer works for me under Visual Studio 2013 and CUDA 6.5: I get the run-time error "invalid device function", but only when running on a Maxwell GPU. It works fine on Kepler.

If I find out anything more I'll write it here. Thanks.

@tshimba

tshimba commented Oct 24, 2014

Hi,

Has there been any progress on this problem?
I have the same problem.

However, if I build in Debug mode the program works correctly; I only get the error message 'Error: invalid device function' when I run the program built in Release mode.

This looks like strange behavior. I'm looking for differences between the Release and Debug settings (e.g. in the CMake lists), but haven't found any yet.

I'm using
Windows 8.1
Visual Studio 2013
CUDA 6.5 with NVIDIA Quadro 4000
VTK 5.10
Boost 1.56.0
Eigen 3.2.2
FLANN 1.8.4
QHull 2012.1 for Windows

Thanks,

@MichaelKorn
Contributor

I got it working with Ubuntu 14.04 and a GTX 970:

  • set CUDA_ARCH_BIN = 5.2
  • set CUDA_ARCH_PTX = 5.2
  • add {0x52, 128} in convertSMVer2Cores
  • compute capability 5.2 uses up to 255 registers per thread, so I had to reduce the number of threads in pcl::device::initVolume (dim3 block (16, 16);); otherwise I got "Error: too many resources requested for launch"
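
The last point comes down to a register budget: roughly, registers per thread times threads per block must not exceed the registers available per block. The sketch below is a rough estimate under stated assumptions: 65536 registers per block for compute capability 5.2 (the value to trust is `cudaDeviceProp::regsPerBlock` on the actual device), the worst case of 255 registers per thread mentioned above, and no allowance for register allocation granularity. The function name is hypothetical.

```cpp
// Rough launch check: true when a block of block_x * block_y threads
// fits into the per-block register budget.
inline bool fitsRegisterBudget(int block_x, int block_y,
                               int regs_per_thread, int regs_per_block)
{
    const long long needed = 1LL * block_x * block_y * regs_per_thread;
    return needed <= regs_per_block;
}
```

For the 16 x 16 block above, 256 threads at 255 registers need 65280 registers, which fits in 65536. A hypothetical 32 x 16 block would need 130560 and would not, which matches the kind of failure reported as "too many resources requested for launch".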

@ddetone

ddetone commented Nov 30, 2014

I got KinFu working following MichaelKorn's steps with Ubuntu 14.04, GTX 980, CUDA 6.5 (using the special drivers for GTX9xx).

@Grandgarfield

Hi,

In case it helps: after a long search I managed to run the kinfu application with CUDA 6.5.
My OS is Windows 7, and I'm using a laptop with an NVIDIA 640M GPU.
I had the same error in Release mode, with the application working fine in Debug mode, as f2um2326 described.
The problem seems to come from the /GL flag during compilation.
To make it work, after generating the project files with CMake I manually removed the /GL flag from all .cmake files in $(BUILD_DIR)\gpu\kinfu\CMakeFiles\cuda_compile.dir\src\cuda; the flag is on the line that sets CMAKE_HOST_FLAGS_RELEASE.
This is not an optimal solution, since I guess the code is then not fully optimized, but it may help some of you.

@QinZiwen

I have a GTX 1070 and CUDA 8.0, and I have the same problem:

Device 0:  "GeForce GTX 1070"  8106Mb
Can't determine number of cores. Unknown SM version 6.1!
, sm_61, 0 cores, Driver/Runtime ver.8.0/8.0

@SergioRAgostinho
Member

Which version of PCL are you using exactly? The current master should have no issue with this.

@haueck
Contributor

haueck commented Jul 5, 2017

Hi,

I have a similar problem when I use the pcl-1.8.1rc1 release on an NVIDIA Tegra X1. When I try to estimate normals, I get an error: Error: invalid device function pcl-pcl-1.8.0/gpu/octree/src/cuda/octree_host.cu:64.

Build command: cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_GPU=true -DBUILD_gpu_surface=true -DBUILD_gpu_kinfu=false -DBUILD_gpu_kinfu_large_scale=false -DCUDA_ARCH_BIN="5.3" -DCUDA_ARCH_PTX="5.3" && make

pcl::gpu::printCudaDeviceInfo() output:

*** CUDA Device Query (Runtime API) version (CUDART static linking) *** 

Device count: 1

Can't determine number of cores. Unknown SM version 5.3!

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3995 MBytes (4188778496 bytes)
  ( 2) Multiprocessors x ( 0) CUDA Cores/MP:     0 CUDA Cores
  GPU Clock Speed:                               0.07 GHz
  Memory Clock rate:                             0.00 Mhz
  Memory Bus Width:                              0-bit
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1

Thanks.

@taketwo
Member

taketwo commented Jul 5, 2017

#1824 added the number of CUDA cores per SM for Pascal GPUs. However, I don't see an entry for "5.3"; could this be the problem?

@SergioRAgostinho
Member

SergioRAgostinho commented Jul 5, 2017

It's exactly that.

Edit: pcl_find_cuda.cmake also needs to be updated accordingly.

@haueck
Contributor

haueck commented Jul 5, 2017

I will test it, and I can create a pull request for this change.

@SergioRAgostinho
Member

Thanks

According to http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-5-x
the number of cores for compute capability 5.x should be 128 as well.

Also, according to https://en.wikipedia.org/wiki/CUDA#GPUs_supported (not really official, I know), our compute capabilities per CUDA toolkit version are not exactly right:
https://github.com/PointCloudLibrary/pcl/blob/master/cmake/pcl_find_cuda.cmake#L46-L58

@haueck
Contributor

haueck commented Jul 6, 2017

I am not sure if there is a correlation between capability version and number of cores:

Cards with capability version 5.0:
GeForce GTX 750 Ti - 640 CUDA Cores
GeForce GTX 750 - 512 CUDA Cores

Cards with capability version 5.2:
GeForce GTX 980 Ti - 2816 CUDA Cores
GeForce GTX 980 - 2048 CUDA Cores
GeForce GTX 970 - 1664 CUDA Cores

Cards with capability version 5.3:
Tegra X1 - 256 CUDA Cores

@SergioRAgostinho
Member

It's CUDA cores per multiprocessor, which means the total number of cores in the card will always be a multiple of that number. For compute capability 5.x, each multiprocessor has 128. All the cards you listed have a number of cores that is a multiple of 128.
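
The arithmetic can be checked with a small sketch. The helper below is hypothetical (not part of PCL) and simply divides a card's total core count by the cores per multiprocessor, returning -1 when the total is not a whole multiple.

```cpp
// Number of multiprocessors implied by a card's total core count,
// or -1 if the total is not a whole multiple of cores_per_sm.
inline int multiprocessorCount(int total_cores, int cores_per_sm)
{
    if (cores_per_sm <= 0 || total_cores % cores_per_sm != 0)
        return -1;
    return total_cores / cores_per_sm;
}
```

With 128 cores per multiprocessor, the cards listed above come out as 5 (GTX 750 Ti), 4 (GTX 750), 22 (GTX 980 Ti), 16 (GTX 980), 13 (GTX 970) and 2 (Tegra X1) multiprocessors. The last value agrees with the "( 2) Multiprocessors" line in the Tegra X1 device query earlier in the thread.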
