Benchmarks for TriMesh transforms #18

Merged
nirmal-suthar merged 20 commits into master from ns/benchmarks on Aug 2, 2020
Conversation

@nirmal-suthar (Contributor) commented Jul 26, 2020

Fixes #17

Update TriMesh to compute verts only when demanded in case of invalidation; update _compute_* to return nothing.
@nirmal-suthar (Contributor, Author)

[image: bm_trimesh]

@avik-pal (Member)

Do you have any intuition as to why Kaolin's CPU benchmark is so noisy? You could try increasing niters for the benchmarks to reduce the variance (though it should not affect the minimum time).
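As a minimal sketch of what increasing niters looks like on the Julia side (assuming a generic transform `f` and input `x`, both hypothetical names; the actual benchmark scripts may differ), reporting both the minimum and the mean makes the variance visible while the minimum stays roughly stable:

```julia
using Statistics  # for mean

# Hypothetical helper: run `f(x)` niters times and report both the minimum
# and the mean wall-clock time in seconds. A higher niters smooths the mean,
# while the minimum should stay roughly unchanged.
function bench(f, x; niters = 1000)
    f(x)  # warm-up so compilation is not timed
    times = [@elapsed(f(x)) for _ in 1:niters]
    (min = minimum(times), mean = mean(times))
end
```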

@nirmal-suthar (Contributor, Author) commented Jul 27, 2020

With higher niters (updated):
[image: bm_trimesh]

@avik-pal (Member)

Hmm... yeah, it doesn't help. Let's try using a standard mesh: just use an icosphere and plot the trend.

@nirmal-suthar (Contributor, Author) commented Jul 27, 2020

> Hmm... yeah, it doesn't help. Let's try using a standard mesh: just use an icosphere and plot the trend.

Could the time module itself be the source of the noise? The GPU benchmarks, which do not use the time module, look pretty smooth.

@avik-pal (Member)

It is possible, but then again the PyTorch people are also using a similar script.

@nirmal-suthar (Contributor, Author)

I think I have figured out the problem. The Cyclops CPUs are already busy: while the CPU benchmark is running, htop shows all CPUs maxed out, which is why the benchmarks are noisy [I have updated the plots above]. I will run the benchmark script again once Cyclops is free.

@nirmal-suthar (Contributor, Author) commented Jul 30, 2020

@avik-pal I have updated the TriMesh plot above; Cyclops was mostly free while benchmarking.

@avik-pal (Member) commented Jul 30, 2020

Hmm... interesting. So making it in-place has zero effect on the performance... that doesn't seem right!
As per the discussion on Slack, this turned out to be a type-inference issue in the std function in CUDA.jl (JuliaGPU/CUDA.jl#336, now fixed). The following function can be used in the meantime:

stdev(a, m) = sqrt.(sum((a .- m) .^ 2, dims = 2) / size(a, 2))
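For context, here is a minimal sketch of how such a workaround could slot into a normalize-style transform (illustrative only; normalize_verts is a hypothetical name, not the actual Flux3D implementation):

```julia
using Statistics: mean

# Illustrative only: center and scale a 3×N vertex matrix using the stdev
# workaround above in place of Statistics.std.
function normalize_verts(verts)
    m = mean(verts, dims = 2)                            # per-coordinate mean, 3×1
    return (verts .- m) ./ max.(stdev(verts, m), 1f-6)   # guard against zero std
end
```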

Additionally, the performance has improved significantly on CUDA.jl master. We will continue to use the latest stable release, but we should still check whether any of the performance issues persist on master. If they don't, we should go ahead and merge this.

And how are the metrics faring against Kaolin?

@nirmal-suthar (Contributor, Author)

For some reason, chamfer distance in Kaolin is not working on my setup, so I will try to fix it today. This loss is written in native CUDA, so I am not able to figure out exactly what is going wrong.

@avik-pal (Member)

Interesting... if you can't fix it, let me know. I have Kaolin set up on a cluster; I will install Julia and run the benchmarks. (And maybe post exactly which scripts I need to run.)

@nirmal-suthar (Contributor, Author) commented Jul 31, 2020

The Kaolin benchmarks were too noisy on Cyclops, so I tried benchmarking on Colab (Kaolin's chamfer_distance on CPU was very slow to benchmark).

[image: bm_metrics]
[image: bm_trimesh]

@nirmal-suthar (Contributor, Author)

@avik-pal Is the build failing because Julia 1.3 is not compatible with GPUCompiler?
ERROR: Unsatisfiable requirements detected for package GPUCompiler [61eb1bfa]:
Tests are passing locally on Julia 1.4.

@avik-pal (Member)

@nirmal-suthar Could you try getting rid of the Manifest once?
It says "restricted to versions 0.5.5 by an explicit requirement, leaving only versions 0.5.5", but there is no such restriction in the Project.toml.

@avik-pal (Member)

> Kaolin's chamfer_distance on CPU was very slow to benchmark

True. I am happy that Flux3D can at least compute it on the CPU. It is a very expensive operation and a general bottleneck for any 3D mesh-based work I have done. I would suggest adding a note that the Kaolin version is too slow to benchmark in a reasonable time.

We also need the timings for the forward and backward passes, so every plot in the grid should have 6 lines (forward, backward, and total for both frameworks).
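A rough sketch of how the forward/backward split could be timed on the Julia side (placeholder names; Zygote's pullback is assumed here, and the actual benchmark scripts may differ):

```julia
using Zygote

# Rough sketch: time the forward pass and the pullback separately for a
# scalar loss function `loss` applied to an input `x` (placeholder names).
function time_fwd_bwd(loss, x)
    y, back = Zygote.pullback(loss, x)        # warm-up / compile
    t_fwd = @elapsed Zygote.pullback(loss, x) # forward pass only
    t_bwd = @elapsed back(one(y))             # backward pass via the pullback
    (forward = t_fwd, backward = t_bwd, total = t_fwd + t_bwd)
end
```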

@avik-pal (Member) commented Jul 31, 2020

I also feel it makes sense for us to have a CUDA kernel for chamfer distance, given that the difference right now is somewhat big. (Maybe check once on CUDA#master before doing that.)

@nirmal-suthar (Contributor, Author)

The above benchmarks are on CUDA#master

@avik-pal (Member)

Normalize is slow on master? When I checked yesterday, it was faster on my setup on Cyclops.

@nirmal-suthar (Contributor, Author)

The current issue with chamfer distance in Flux3D is that, for calculating nearest-neighbour distances, we compute a matrix multiplication (producing an array of size N1 x N2 x B), which is a really space-expensive operation when the number of verts is high. But yes, it is able to match Kaolin for a typical number of points.
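For illustration only (not the actual Flux3D code), the memory-heavy approach described above looks roughly like this for a single batch element:

```julia
# Illustrative sketch: all pairwise squared distances between point sets
# x (3×N1) and y (3×N2), then a reduction to nearest-neighbour distances.
# The N1×N2 intermediate is what becomes expensive for large meshes.
function pairwise_nn_sqdist(x, y)
    xx = sum(abs2, x, dims = 1)        # 1×N1 squared norms of x
    yy = sum(abs2, y, dims = 1)        # 1×N2 squared norms of y
    d  = xx' .+ yy .- 2 .* (x' * y)    # N1×N2 pairwise squared distances
    return vec(minimum(d, dims = 2))   # nearest neighbour in y for each point of x
end
```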

@avik-pal (Member)

I have never come across 2^14 vertices in practice; most of my problems involved around 1000 vertices. So it is good that we can match the performance of a CUDA kernel without writing one ourselves. I would suggest finishing the remaining benchmarks, then opening an issue for this and tackling it separately.

@nirmal-suthar (Contributor, Author)

> Normalize is slow on master? When I checked yesterday, it was faster on my setup on Cyclops.

Benchmarks on Cyclops (CUDA#master) were better than the previous numbers, but I will check this again.

@codecov (bot) commented Jul 31, 2020

Codecov Report

Merging #18 into master will increase coverage by 0.45%.
The diff coverage is 84.21%.

[image: impacted file tree graph]

@@            Coverage Diff             @@
##           master      #18      +/-   ##
==========================================
+ Coverage   73.85%   74.31%   +0.45%     
==========================================
  Files          21       21              
  Lines         742      763      +21     
==========================================
+ Hits          548      567      +19     
- Misses        194      196       +2     
Impacted Files                             Coverage Δ
src/datasets/modelnet10/mn10_pcloud.jl     0.00%    <0.00%>    (ø)
src/datasets/modelnet40/mn40_pcloud.jl     0.00%    <0.00%>    (ø)
src/metrics/pcloud.jl                      95.23%   <0.00%>    (-4.77%) ⬇️
src/rep/utils.jl                           91.35%   <88.23%>   (-3.01%) ⬇️
src/rep/mesh.jl                            91.16%   <91.66%>   (+2.16%) ⬆️
src/metrics/mesh.jl                        100.00%  <100.00%>  (ø)
src/transforms/mesh_func.jl                88.77%   <100.00%>  (+0.23%) ⬆️
src/transforms/pcloud_func.jl              84.61%   <100.00%>  (ø)

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52e9a48...8f63ba8.

@nirmal-suthar (Contributor, Author)

[image: bm_metrics]

@nirmal-suthar (Contributor, Author)

@avik-pal I am not able to match normalize with Kaolin on the GPU. Could this be due to my Julia environment? In my environment, the CUDA.jl version is 1.2.1.

@avik-pal (Member) commented Aug 2, 2020

I think it might be a mistake on my end. Anyway, the difference is not much.

From the updated plots I feel we need a kernel for chamfer distance and an adjoint for it; that should allow us to beat the Kaolin numbers. Even for the Laplacian loss we need that adjoint. Since these cases involve an unreasonably high number of vertices, let's not block this PR on that and handle it elsewhere.

@nirmal-suthar (Contributor, Author)

Posting some points before merging this PR:

  • CUSPARSE is currently broken for Float32, and the CUSPARSE matrix takes much longer, so laplacian_loss uses a plain sparse array even on the GPU.
  • laplacian_loss in Kaolin is slightly different from the one in Flux3D, so for benchmarking in Kaolin I am using a simple custom laplacian_loss that matches Flux3D's (see the sketch below).
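For reference, a rough sketch of what such a simple Laplacian loss can look like (not the exact Flux3D or benchmark code; verts is assumed to be 3×N and each undirected edge to be listed once):

```julia
using SparseArrays, Statistics

# Rough sketch only: uniform Laplacian loss built from a sparse adjacency
# matrix, penalizing each vertex's offset from the mean of its neighbours.
function simple_laplacian_loss(verts, edges)
    N = size(verts, 2)
    src = [e[1] for e in edges]
    dst = [e[2] for e in edges]
    A = sparse(vcat(src, dst), vcat(dst, src), ones(2 * length(edges)), N, N)  # symmetric adjacency
    deg = vec(sum(A, dims = 2))                                                # vertex degrees
    nbr_mean = (verts * A) ./ reshape(max.(deg, 1.0), 1, N)                    # mean of neighbours
    return mean(sum(abs2, nbr_mean .- verts, dims = 1))
end
```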

@avik-pal (Member) commented Aug 2, 2020

In the benchmarks directory, create a README and mention these points.

Additionally, if the CUSPARSE issue is not already known, open an issue in CUDA.jl.

@nirmal-suthar (Contributor, Author)

There was an existing issue
CUDA.jl/issues/322

@nirmal-suthar merged commit 2560f1c into master on Aug 2, 2020
@avik-pal deleted the ns/benchmarks branch on August 2, 2020 20:19
This pull request closes: Benchmarks for the Mesh Representation