
N dimensional conv & pool #31


Closed · wants to merge 14 commits

Conversation

tejank10 (Contributor) commented Mar 4, 2018

Convolutions and pooling for N-dimensional images.

tejank10 (Contributor, Author) commented Mar 4, 2018

@MikeInnes Please review

tejank10 changed the title from "N dimensional conv" to "N dimensional conv & pool" on Mar 8, 2018
dfdx (Contributor) commented Mar 10, 2018

Are there any benchmarks comparing it with the C++ version?

Also, I'm not sure about making padding a required argument; can't we just use something like pad = tuple(zeros(Int, N)...) to provide a default value for an N-dimensional array?
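
To make the suggestion concrete, here is a minimal sketch of that keyword-default pattern (the function name and signature are illustrative, not the actual NNlib API):

    # Illustrative only: default the padding to N zeros so callers of an
    # N-dimensional convolution don't have to pass it explicitly.
    function conv_nd(x::AbstractArray{T,N}, w::AbstractArray{T,N};
                     pad::NTuple{N,Int} = tuple(zeros(Int, N)...)) where {T,N}
        # convolution body elided; only the keyword default matters here
        return pad
    end

As an aside, ntuple(_ -> 0, N) builds the same all-zero tuple without allocating an intermediate array.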

tejank10 (Contributor, Author) commented Mar 11, 2018

That's a good idea :) I have implemented it now.

Pardon me if this is a noob question, but is there a standard way to benchmark? Last time I was asked to benchmark the conv2d and conv3d implementations in Julia against C++, I used the existing implementations in NNlib.jl (which were in C++ at the time). Do I need to write conv_nd in C++ as well?

dfdx (Contributor) commented Mar 11, 2018

I'd test the ND convolution in Julia against the existing 2D and 3D convolutions in C++. In fact, there should already be benchmarks comparing the Julia and C++ implementations of the 2D/3D convolutions, but I lost track of NNlib for a while and missed when it was reimplemented in pure Julia. Previously we used benchmarks from Mocha.jl; it should be easy to adapt them for the new implementations.

If C++ wins, that's not necessarily a showstopper, but we need to know how much we lose with a pure-Julia, generic implementation. Also note that cuDNN, perhaps the primary provider of fast convolutions on GPU, only supports 2D and 3D convolutions, so we need to position generic ND convolutions carefully so as not to confuse deep learning people.
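
For reference, a sketch of such a comparison with BenchmarkTools.jl (array sizes are arbitrary, and conv_nd stands in for whatever the generic ND entry point is named; the TrialEstimate/TrialRatio values quoted below are exactly these BenchmarkTools types):

    using BenchmarkTools, NNlib

    x = rand(Float64, 28, 28, 1, 8)   # width × height × channels × batch
    w = rand(Float64, 3, 3, 1, 4)     # 3×3 kernel, 1 input / 4 output channels

    t2d = @benchmark conv($x, $w)          # specialized 2D path
    # tnd = @benchmark conv_nd($x, $w)     # hypothetical generic ND path

    # The ratio of median estimates gives the TrialRatio figures:
    # ratio(median(tnd), median(t2d))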

tejank10 (Contributor, Author) commented:
Thanks a lot for your input; it helped me a lot in benchmarking conv.
Here are the results:

    Julia conv2d                   TrialEstimate(529.985 μs)
    C conv2d                       TrialEstimate(535.165 μs)
    Julia conv3d                   TrialEstimate(41.316 ms)
    C conv3d                       TrialEstimate(26.435 ms)
    Julia conv_nd 2d               TrialEstimate(761.929 ms)
    Julia conv_nd 3d               TrialEstimate(51.210 s)
    Conv2d Julia vs Conv2d C:      TrialRatio(99.03%)
    Conv3d Julia vs Conv3d C:      TrialRatio(156.29%)
    Conv_nd 2d Julia vs Conv2d C:  TrialRatio(142372.71%)
    Conv_nd 3d Julia vs Conv3d C:  TrialRatio(193717.42%)

Well, the results turn out to be poor for the generic conv. I suspect the main culprit is the extra loop we run to find the indices.
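
For context, that index-finding overhead looks roughly like this (an illustration of the pattern, not the actual im2col_nd! code): decomposing each flat offset into N coordinates costs a div/rem pair per dimension plus an allocation, paid once per copied element, whereas the specialized 2D/3D kernels hard-code the nested loops.

    # Illustrative only: column-major linear index -> N-D coordinates.
    function linear_to_nd(idx::Int, dims::NTuple{N,Int}) where {N}
        coords = Vector{Int}(undef, N)   # allocation in the hot loop
        rest = idx - 1
        for d in 1:N
            coords[d] = rest % dims[d] + 1   # coordinate along dimension d
            rest = rest ÷ dims[d]
        end
        return coords
    end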

dfdx (Contributor) commented Mar 14, 2018

Hm, a 200x slowdown looks a bit too high. Most likely there are just a couple of very slow operations making the whole thing perform poorly. Can you profile it? It should be pretty easy to identify the bottleneck.

Also, I believe GitHub still doesn't deliver notifications for inline comments, so I'll copy it here: in conv.jl on line 161 (and in all similar places) you use pad=tuple(zeros(Int, N)), which returns a tuple with a single element of type Array, e.g. ([0, 0, 0],). Instead, you can use pad=tuple(zeros(Int, N)...,) (note the dots before the trailing comma) to unpack the array into a tuple and get (0, 0, 0).
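
For reference, the difference at the REPL:

    julia> N = 3;

    julia> tuple(zeros(Int, N))      # a 1-tuple whose element is an Array
    ([0, 0, 0],)

    julia> tuple(zeros(Int, N)...)   # splatted: a tuple of N integers
    (0, 0, 0)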

tejank10 (Contributor, Author) commented Mar 17, 2018

Thanks for the profiling link, and sorry for the late reply.
I have profiled im2col_nd!. Here are the results; the lines below are those pertaining directly to im2col_nd!:

      1    ./broadcast.jl:192; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      13   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:153; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      97   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:154; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      19   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:155; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      2    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:158; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      5    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:161; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      295  .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:165; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      1    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:167; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      213  .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:168; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      7    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:169; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      20   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:171; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      55   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:173; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      4    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:176; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      2    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:177; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      10   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:178; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      10   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:179; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      695  .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:181; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      3    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:182; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...

Line 181, where the range of dim_pad is checked, accounts for the most samples.
Regarding the default arguments: thanks for pointing that out, my bad! I'll fix it soon.
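
For illustration (hypothetical, since the conv.jl lines themselves are not shown here), a per-dimension pad test of the kind described amounts to two integer comparisons; the usual fix is to hoist it out of the innermost loop, or to split the loop into pad/interior/pad regions so the interior copies run check-free:

    # Coordinate i of the padded input lies in the pad region iff it falls
    # before or after the original extent sz (with pad p on each side).
    in_pad(i::Int, p::Int, sz::Int) = (i <= p) | (i > sz + p)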

tejank10 (Contributor, Author) commented Mar 17, 2018

Optimized the code to obtain a 34x speedup over the previous version.
Here are the updated benchmark results:

    Julia conv2d                   TrialEstimate(443.947 μs)
    C conv2d                       TrialEstimate(364.420 μs)
    Julia conv3d                   TrialEstimate(32.744 ms)
    C conv3d                       TrialEstimate(23.445 ms)
    Julia conv_nd 2d               TrialEstimate(11.902 ms)
    Julia conv_nd 3d               TrialEstimate(1.312 s)
    Conv2d Julia vs Conv2d C:      TrialRatio(121.82%)
    Conv3d Julia vs Conv3d C:      TrialRatio(139.66%)
    Conv_nd 2d Julia vs Conv2d C:  TrialRatio(3266.07%)
    Conv_nd 3d Julia vs Conv3d C:  TrialRatio(5595.04%)

dfdx (Contributor) commented Mar 20, 2018

There are rather few samples in that profile; let's run it in a loop for, say, 1000 iterations. That should also shrink the share of time spent on inference, which currently looks like a major runtime cost.

Yet we are still dozens of times slower than the specialized 2D and 3D convolutions, so it makes sense to dispatch to those in the common cases and use the fully generic ND convolution only for arrays of higher dimension. @MikeInnes does that sound like a reasonable approach for NNlib/Flux?
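
A sketch of that dispatch idea (the kernel names conv2d/conv3d/conv_nd are illustrative): Julia picks the most specific method, so 4-D and 5-D inputs route to the specialized kernels and everything else falls through to the generic path.

    conv(x::AbstractArray{T,4}, w::AbstractArray{T,4}; kw...) where {T} =
        conv2d(x, w; kw...)            # specialized 2D kernel
    conv(x::AbstractArray{T,5}, w::AbstractArray{T,5}; kw...) where {T} =
        conv3d(x, w; kw...)            # specialized 3D kernel
    conv(x::AbstractArray{T,N}, w::AbstractArray{T,N}; kw...) where {T,N} =
        conv_nd(x, w; kw...)           # generic N-dimensional fallback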

tejank10 (Contributor, Author) commented:
Here is the profiling result.
Totally agreed with you on the second point 👍

staticfloat (Contributor) commented:
@tejank10 Not to add more work for you here, but it would be really useful to add a dilation factor, to skip input pixels and enable dilated (or atrous) convolutions.

I'm imagining we could have a parameter called dilation that is a tuple of integers, similar to pad, such that if I had a 2D convolution and added dilation = (1, 1), I would get something like this:

[figure: illustration of a dilated 2D convolution]
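
A sketch of how dilation enters the index arithmetic, in 1-D for clarity (hypothetical code; note that conventions differ, and here dilation = 1 means an ordinary convolution, whereas the dilation = (1, 1) above counts skipped pixels):

    # 1-D "valid" convolution with a dilation factor: consecutive kernel
    # taps sample the input `dilation` positions apart.
    function dilated_conv1d(x::Vector{T}, w::Vector{T}; dilation::Int = 1) where {T}
        k = length(w)
        span = (k - 1) * dilation + 1            # input extent the kernel covers
        out = Vector{T}(undef, length(x) - span + 1)
        for i in eachindex(out)
            acc = zero(T)
            for j in 1:k
                acc += w[j] * x[i + (j - 1) * dilation]
            end
            out[i] = acc
        end
        return out
    end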

jekbradbury (Contributor) commented:
There are merge markers left in the most recent commit.

staticfloat (Contributor) commented:
This is going to need to be completely reworked now that #94 has landed. It shouldn't be too difficult, but may take some work. For now, I'm going to close this. Thanks for all your work @tejank10!

ToucheSir pushed a commit that referenced this pull request on Feb 13, 2023: Add CUDA kernels for grid sampling