Benchmarks for TriMesh transforms #18
Conversation
Update TriMesh to compute verts only when demanded, in case of invalidation. Update `_compute_*` to return nothing.
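A minimal sketch of the compute-on-demand pattern this describes (the `LazyMesh` type, its field names, and the accessors are hypothetical stand-ins, not the actual Flux3D internals):

```julia
# Minimal sketch: cache-and-invalidate pattern for derived vertex data.
mutable struct LazyMesh
    verts_list::Vector{Matrix{Float32}}            # source of truth
    _verts_packed::Union{Nothing,Matrix{Float32}}  # cache, filled on demand
end

LazyMesh(verts_list) = LazyMesh(verts_list, nothing)

# Mutating the mesh invalidates the cache instead of eagerly recomputing.
function set_verts!(m::LazyMesh, verts_list)
    m.verts_list = verts_list
    m._verts_packed = nothing
    return nothing
end

# The _compute_* helper fills the cache and returns nothing.
function _compute_verts_packed!(m::LazyMesh)
    m._verts_packed = reduce(hcat, m.verts_list)
    return nothing
end

# Accessor computes the packed representation only when demanded.
function get_verts_packed(m::LazyMesh)
    m._verts_packed === nothing && _compute_verts_packed!(m)
    return m._verts_packed
end
```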
Do you have any intuition as to why Kaolin's CPU benchmark is so noisy? You could try increasing `niters` for the benchmark to reduce the variance (though it should not affect the minimum time).
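For reference, a minimal BenchmarkTools.jl sketch of raising the sample count (the transform `f` here is a stand-in, not the actual benchmark script):

```julia
using BenchmarkTools, Statistics

x = rand(Float32, 3, 10_000)     # stand-in for a mesh's verts
f(x) = x ./ maximum(abs, x)      # stand-in for the transform under test

# More samples shrink the variance of the mean; the minimum is robust anyway.
b = @benchmark f($x) samples = 10_000 evals = 1
println("min: ", minimum(b.times), " ns  mean: ", mean(b.times), " ns")
```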
Hmm... yeah, it doesn't help! Let's try using some standard mesh. Just use an icosphere and plot the trend.
Can using […]
It is possible, but then again, the PyTorch people are also using a similar script.
I guess I have figured out the problem. The cyclops CPUs are already busy: while running the CPU benchmark, htop shows all CPUs maxed out, which is why the benchmarks are noisy [I have updated the plots above]. I will run the benchmark script again once cyclops is free.
@avik-pal I have updated the TriMesh plot above; cyclops was mostly free while benchmarking.
```julia
stdev(a, m) = sqrt.(sum((a .- m) .^ 2, dims = 2) / size(a, 2))
```

Additionally, the performance has significantly improved on CUDA.jl master. We will still continue to use the latest stable release, but we should check whether any of the performance issues persist on master. If they don't, we should go ahead and merge this. And how are the metrics faring against Kaolin?
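A quick usage sketch for the `stdev` definition above (the data is made up; `a` holds one point per column):

```julia
a = rand(Float32, 3, 100)          # 100 points in 3D, one per column
m = sum(a, dims = 2) / size(a, 2)  # per-dimension mean, 3x1
stdev(a, m)                        # 3x1 population standard deviations
```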
For some reason, chamfer distance in Kaolin is not working on my setup, so I will try fixing it today. This loss is written in native CUDA, so I am not able to figure out exactly what is wrong.
Interesting... If you can't fix it, let me know. I have Kaolin set up on a cluster; I will install Julia and run the benchmarks. (And maybe post exactly which scripts I need to run.)
@avik-pal Is the build failing because Julia 1.3 is not compatible with GPUCompiler?
@nirmal-suthar Could you try getting rid of the Manifest once?
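i.e., roughly the usual recipe (a sketch of one way to do it, assuming the repo root; this is my reading of the suggestion, not a quoted instruction):

```julia
# After deleting Manifest.toml from the repo root:
using Pkg
Pkg.activate(".")
Pkg.instantiate()  # re-resolves a fresh Manifest from Project.toml
```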
True. I am happy that Flux3D can at least compute it on the CPU. It is a very expensive operation and a general bottleneck for any 3D mesh-based work I have ever done. I would suggest putting a note that the Kaolin one is too slow to be benchmarked in a reasonable time. We also need the timing for the forward and backward pass, so every plot in the grid should have 6 lines (forward, backward, and total for both frameworks).
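A minimal sketch of separating forward and backward timings with Zygote (the `loss` here is a stand-in for the actual op being benchmarked):

```julia
using BenchmarkTools, Zygote

loss(A, B) = sum(abs2, A .- B)  # stand-in op
A, B = rand(Float32, 3, 1000), rand(Float32, 3, 1000)

t_fwd = @belapsed loss($A, $B)            # forward only
t_tot = @belapsed gradient(loss, $A, $B)  # forward + backward together
t_bwd = t_tot - t_fwd                     # approximate backward time
```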
I also feel it makes sense for us to have a CUDA kernel for chamfer distance, given that the difference right now is somewhat big. (Maybe check on CUDA#master once before doing that.)
The above benchmarks are on CUDA#master.
Normalize is slow on master? I checked yesterday; it was faster on my setup on cyclops.
The current issue with chamfer distance in Flux3D is that, for calculating nearest-neighbour distances, we compute a pairwise distance matrix (of size N1 x N2 x B), which is a really space-expensive operation for a high number of verts. But yes, it is able to match Kaolin for an average number of points.
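A minimal sketch of that pairwise-distance formulation (batch size 1 for clarity; this is illustrative, not the actual Flux3D code):

```julia
# Naive chamfer distance via a full N1 x N2 pairwise distance matrix.
# Memory grows as O(N1 * N2 * B), which is the bottleneck mentioned above.
function chamfer_naive(A::Matrix{T}, B::Matrix{T}) where {T}
    # A: 3 x N1, B: 3 x N2 point clouds
    sqA = sum(abs2, A; dims = 1)      # 1 x N1
    sqB = sum(abs2, B; dims = 1)      # 1 x N2
    D = sqA' .+ sqB .- 2 .* (A' * B)  # N1 x N2 squared distances
    return sum(minimum(D; dims = 2)) / size(A, 2) +
           sum(minimum(D; dims = 1)) / size(B, 2)
end
```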
I have never come across 2^14 vertices in practice; most of my problems involved around 1000 vertices. So it is good that we can match the performance of a CUDA kernel without writing one ourselves. I would suggest finishing the remaining benchmarks, opening an issue for this, and tackling it separately.
Benchmarks on cyclops (CUDA#master) were better than the previous numbers. But I will check this again.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #18      +/-   ##
==========================================
+ Coverage   73.85%   74.31%   +0.45%
==========================================
  Files          21       21
  Lines         742      763      +21
==========================================
+ Hits          548      567      +19
- Misses        194      196       +2
```
Continue to review full report at Codecov.
@avik-pal I am not able to match normalize with Kaolin on the GPU. Can this be due to my Julia env? In my environment, the CUDA.jl version is 1.2.1.
I think it might be some mistake on my end. Anyway, the difference is not much. From the updated plots, I feel we need to have a kernel for chamfer distance and an adjoint for it; that should allow us to beat the Kaolin numbers. Even for laplacian loss we need that adjoint. Since these are for an unreasonably high number of vertices, let's not block this PR on that and handle it elsewhere.
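Since an adjoint is mentioned, here is a minimal ChainRulesCore sketch of what one could look like for the `chamfer_naive` sketched earlier (not a CUDA kernel, and not the Flux3D implementation):

```julia
using ChainRulesCore

# Sketch of a custom adjoint: the pullback reuses nearest-neighbour indices
# from the forward pass, so the backward pass never re-materialises the
# N1 x N2 distance matrix. Illustrative only.
function ChainRulesCore.rrule(::typeof(chamfer_naive), A, B)
    N1, N2 = size(A, 2), size(B, 2)
    D = sum(abs2, A; dims = 1)' .+ sum(abs2, B; dims = 1) .- 2 .* (A' * B)
    p = [argmin(view(D, i, :)) for i in 1:N1]  # nearest b for each a
    q = [argmin(view(D, :, j)) for j in 1:N2]  # nearest a for each b
    y = sum(minimum(D; dims = 2)) / N1 + sum(minimum(D; dims = 1)) / N2
    function chamfer_pullback(ȳ)
        dA = 2 .* (A .- B[:, p]) ./ N1  # gradient of the A -> nearest-B term
        dB = 2 .* (B .- A[:, q]) ./ N2  # gradient of the B -> nearest-A term
        for j in 1:N2                   # scatter the B -> A term into dA
            dA[:, q[j]] .+= 2 .* (A[:, q[j]] .- B[:, j]) ./ N2
        end
        for i in 1:N1                   # scatter the A -> B term into dB
            dB[:, p[i]] .+= 2 .* (B[:, p[i]] .- A[:, i]) ./ N1
        end
        return NoTangent(), ȳ .* dA, ȳ .* dB
    end
    return y, chamfer_pullback
end
```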
Posting some points before merging this PR:
In the benchmarks directory, create a README and mention these points. Additionally, if the CUSPARSE issue is not already known, open an issue in CUDA.jl.
There was an existing issue.
Fixes #17