Reduce cuda memory use #1
Conversation
Thanks for your advice. Following your instructions, we also ran some experiments on saving memory by moving as many tensors as possible to the CPU, including the decoded point map results. However, this modification leads to only a minor change in memory usage. We tested with a 576×1024, 110-frame video, and memory usage is still around 40 GB, without any obvious decline. Do you have any further suggestions on this problem? Maybe this modification only works for downsampled processing?
It only reduces CUDA memory use when the input is longer than 110 frames and/or there is downsampling. I have only tested at `--height 384 --width 640` with original input of 1080×1920.
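The memory effect of downsampling on its own is easy to see in isolation. A minimal sketch, assuming PyTorch preprocessing: the 1080×1920 and 384×640 resolutions come from this thread, but the tensor names and the 2-frame batch are illustrative, not GeometryCrafter's actual preprocessing.

```python
import torch
import torch.nn.functional as F

# Hedged sketch: downsampling input frames before inference, as with
# --height 384 --width 640 above. A 2-frame batch keeps the example small.
frames = torch.randn(2, 3, 1080, 1920)  # (T, C, H, W) full-resolution frames
small = F.interpolate(frames, size=(384, 640), mode="bilinear",
                      align_corners=False)
# Each downsampled frame holds (384*640)/(1080*1920) ~ 11.9% of the pixels,
# so per-frame activation memory shrinks by roughly the same factor.
print(small.shape)  # torch.Size([2, 3, 384, 640])
```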
With the latest changes, and using chunk size 6 instead of 8, I managed to get the model to process 1024×576×108 frames on an NVIDIA 3090.
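The combination of chunked processing and moving results to CPU can be sketched as below. This is a hedged sketch, not the repo's actual code: `decode_chunk` is a stand-in for the model's real decoder call, and the chunk size of 6 is the value reported above.

```python
import torch

def decode_in_chunks(latents, decode_chunk, chunk_size=6):
    """Decode `latents` (T, C, H, W) in chunks of `chunk_size` frames,
    moving each decoded chunk to CPU so only one chunk's activations
    occupy the GPU at a time."""
    outputs = []
    for start in range(0, latents.shape[0], chunk_size):
        chunk = latents[start:start + chunk_size]
        with torch.no_grad():
            decoded = decode_chunk(chunk)      # runs on the GPU in practice
        outputs.append(decoded.cpu())          # offload immediately
        del decoded
        torch.cuda.empty_cache()               # release cached CUDA blocks
    return torch.cat(outputs, dim=0)           # full result lives on CPU
```

Peak GPU memory then scales with the chunk size rather than the full frame count, which is why dropping from 8 to 6 frames per chunk helps on a 24 GB card.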
Really appreciate your work on optimizing our implementation. Could you also modify the corresponding part in the determ pipeline, so I can merge the PR in one go? By the way, remember to update to our latest version first. Thanks again for your work.
I've just checked your modification. Here is some advice:
The frequent device-moving operations, and putting some interpolation operations on the CPU, may affect inference speed; I'll look for a balance between inference speed and memory usage. Inspired by your modification, I'll update our implementation after merging your PR, which will make GeometryCrafter better.
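One common way to soften the speed cost of frequent GPU-to-CPU moves is to copy into pinned (page-locked) CPU buffers with non-blocking transfers, so the copy can overlap with subsequent CUDA kernels. A minimal sketch; `offload` is a hypothetical helper, not part of the repo, and the transfer is only truly asynchronous when the source tensor is on a CUDA device.

```python
import torch

def offload(t: torch.Tensor) -> torch.Tensor:
    """Copy `t` into a pinned CPU buffer with a non-blocking transfer,
    allowing the GPU->CPU copy to overlap with later CUDA work."""
    # pin_memory only has an effect (and is only allowed) with CUDA available
    buf = torch.empty(t.shape, dtype=t.dtype, device="cpu",
                      pin_memory=torch.cuda.is_available())
    buf.copy_(t, non_blocking=True)  # async only for CUDA -> pinned CPU
    return buf
```

Before reading the buffer on the CPU you would synchronize (e.g. `torch.cuda.synchronize()`); for CPU-to-CPU copies the call is effectively synchronous, so correctness is unaffected.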
We've integrated this feature into the repo. Thanks again for your helpful suggestions!
This PR reduces CUDA memory use by: