
Question for render feature? #7

Closed

SYSUykLin opened this issue Mar 23, 2024 · 4 comments

Comments

@SYSUykLin

Hello:
I previously attempted to render features with 256 dimensions, but CUDA reported insufficient shared memory; at most only about 40 dimensions could be rendered. May I ask what changes you made to enable rendering 256 dimensions?
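
For reference, here is a rough back-of-the-envelope sketch of why roughly 40 channels is the ceiling when per-tile features are staged in shared memory. The numbers below are assumptions (the stock 16×16 tile of the 3DGS rasterizer and a 48 KB static shared-memory budget per block), not taken from this repository:

// Toy calculation: shared memory needed per tile for a given channel count.
// BLOCK_SIZE and the per-tile buffers mirror the stock 3DGS backward kernel;
// the 48 KB budget is a typical static per-block limit and varies by GPU.
#include <cstdio>

int main()
{
    const int BLOCK_SIZE  = 16 * 16;      // threads per tile
    const int SMEM_BUDGET = 48 * 1024;    // assumed static shared memory per block, in bytes

    // Buffers the rasterizer already keeps per tile: id (int), xy (float2), conic+opacity (float4).
    const int base = BLOCK_SIZE * (int)(sizeof(int) + 2 * sizeof(float) + 4 * sizeof(float));

    const int dims[] = {3, 40, 64, 256};
    for (int i = 0; i < 4; ++i)
    {
        int channels = dims[i];
        int needed = base + channels * BLOCK_SIZE * (int)sizeof(float);
        printf("%3d channels -> %6d bytes of shared memory (%s the 48 KB budget)\n",
               channels, needed, needed <= SMEM_BUDGET ? "within" : "over");
    }
    return 0;
}

Under these assumptions, 40 channels lands just inside the budget, while 256 channels needs over five times the per-block shared memory, which matches the error described above.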

@41xu

41xu commented Apr 3, 2024

As far as I understand, in the rasterization process they use shared memory for the collected features/colors and for the gradient computation, and shared memory is limited by the specific GPU. In this paper, they instead dynamically allocate a CUDA array in global memory as a cache for the collected features, avoiding the shared-memory limit (of course, this is a tradeoff between the required feature dimension and the shared-memory limitation). You can see the implementation here:

cudaMalloc((void**)&collected_semantic_feature, NUM_SEMANTIC_CHANNELS * BLOCK_SIZE * sizeof(float));

If I have misunderstood anything, please point it out.
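
A minimal sketch of that pattern, with illustrative names (collect_features_kernel, NUM_SEMANTIC_CHANNELS = 256) rather than the repo's actual kernels: the collected-feature buffer is allocated once in global memory with cudaMalloc, and each block indexes into its own slice where a __shared__ array would otherwise have been declared:

// Sketch: replace a per-block __shared__ feature buffer with a cudaMalloc'd
// global-memory cache. Kernel and buffer names are illustrative, not the repo's.
#include <cuda_runtime.h>
#include <cstdio>

#define BLOCK_SIZE 256              // 16x16 tile
#define NUM_SEMANTIC_CHANNELS 256   // too large for a __shared__ float[C * BLOCK_SIZE] buffer

// Each block writes into its own slice of the preallocated global buffer,
// exactly where the original kernel would have used shared memory.
__global__ void collect_features_kernel(const float* features, float* collected, int n)
{
    float* block_cache = collected + (size_t)blockIdx.x * NUM_SEMANTIC_CHANNELS * BLOCK_SIZE;

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Stage this thread's feature vector in the cache; global memory has no
    // per-block shared-memory limit, at the cost of higher access latency.
    for (int c = 0; c < NUM_SEMANTIC_CHANNELS; ++c)
        block_cache[c * BLOCK_SIZE + threadIdx.x] = features[(size_t)tid * NUM_SEMANTIC_CHANNELS + c];
}

int main()
{
    const int n = 1024;                                   // toy number of Gaussians
    const int num_blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

    float *features = nullptr, *collected_semantic_feature = nullptr;
    cudaMalloc((void**)&features, (size_t)n * NUM_SEMANTIC_CHANNELS * sizeof(float));
    cudaMemset(features, 0, (size_t)n * NUM_SEMANTIC_CHANNELS * sizeof(float));

    // The allocation quoted above, sized here with one slice per block so
    // blocks do not overwrite each other's cache in this toy example.
    cudaMalloc((void**)&collected_semantic_feature,
               (size_t)num_blocks * NUM_SEMANTIC_CHANNELS * BLOCK_SIZE * sizeof(float));

    collect_features_kernel<<<num_blocks, BLOCK_SIZE>>>(features, collected_semantic_feature, n);
    cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(features);
    cudaFree(collected_semantic_feature);
    return 0;
}

The tradeoff mentioned above shows up here directly: global memory easily holds 256 channels, but every access that used to hit fast on-chip shared memory now goes through the slower global-memory path.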

@JrMeng0312

See graphdeco-inria/gaussian-splatting#41 (comment). You can try this: adding the "-Xcompiler -fno-gnu-unique" option at line 29 of submodules/diff-gaussian-rasterization/setup.py resolves the illegal memory access error in training.

extra_compile_args={"nvcc": ["-Xcompiler", "-fno-gnu-unique", "-I" + os.path.join(os.path.dirname(os.path.abspath(__file__)), "third_party/glm/")]})

@SYSUykLin
Author

Thanks very very very much.

@SYSUykLin
Author

Thanks
