
N dimensional conv & pool #31


Closed · wants to merge 14 commits

Conversation

tejank10 (Contributor) commented Mar 4, 2018

Convolutions and pooling for N-dimensional images.

tejank10 (Contributor, Author) commented Mar 4, 2018

@MikeInnes Please review

tejank10 changed the title from "N dimensional conv" to "N dimensional conv & pool" on Mar 8, 2018
dfdx (Contributor) commented Mar 10, 2018

Are there any benchmarks comparing it with the C++ version?

Also, I'm not sure about making padding a required argument; can't we just use something like pad = tuple(zeros(Int, N)...) to provide a default value for an N-dimensional array?
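
To make the suggestion concrete, here is a minimal sketch of that keyword-default pattern (the function name and signature are illustrative, not the actual NNlib API):

    # Illustrative only: default the padding to N zeros so callers of an
    # N-dimensional convolution don't have to pass it explicitly.
    function conv_nd(x::AbstractArray{T,N}, w::AbstractArray{T,N};
                     pad::NTuple{N,Int} = tuple(zeros(Int, N)...)) where {T,N}
        # convolution body elided; only the keyword default matters here
        return pad
    end

As an aside, ntuple(_ -> 0, N) builds the same all-zero tuple without allocating an intermediate array.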

tejank10 (Contributor, Author) commented Mar 11, 2018

That's a good idea :) I have implemented it now.

Pardon me if this is a noob question, but is there a standard way to benchmark? Last time I was asked to benchmark the conv2d and conv3d implementations in Julia against C++, I used the existing implementations in NNlib.jl (which were in C++ at the time). Do I need to write conv_nd in C++ as well?

dfdx (Contributor) commented Mar 11, 2018

I'd test the ND convolution in Julia against the existing 2D and 3D convolutions in C++. In fact, there should already be benchmarks comparing the Julia and C++ implementations of the 2D/3D convolutions, but I lost track of NNlib for a while and missed when it was reimplemented in pure Julia. Previously we used benchmarks from Mocha.jl; it should be easy to adapt them for the new implementations.

If C++ wins, that's not necessarily a showstopper, but we need to know how much we lose with a pure-Julia, generic implementation. Also note that cuDNN, perhaps the primary provider of fast convolutions on GPU, only supports 2D and 3D convolutions, so we need to position generic ND convolutions carefully so as not to confuse deep learning people.
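
For reference, a sketch of such a comparison with BenchmarkTools.jl (array sizes are arbitrary, and conv_nd stands in for whatever the generic ND entry point is named; the TrialEstimate/TrialRatio values quoted below are exactly these BenchmarkTools types):

    using BenchmarkTools, NNlib

    x = rand(Float64, 28, 28, 1, 8)   # width × height × channels × batch
    w = rand(Float64, 3, 3, 1, 4)     # 3×3 kernel, 1 input / 4 output channels

    t2d = @benchmark conv($x, $w)          # specialized 2D path
    # tnd = @benchmark conv_nd($x, $w)     # hypothetical generic ND path

    # The ratio of median estimates gives the TrialRatio figures:
    # ratio(median(tnd), median(t2d))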

tejank10 (Contributor, Author) commented:
Thanks a lot for your input; it helped me a lot in benchmarking conv.
Here are the results:

    Julia conv2d                   TrialEstimate(529.985 μs)
    C conv2d                       TrialEstimate(535.165 μs)
    Julia conv3d                   TrialEstimate(41.316 ms)
    C conv3d                       TrialEstimate(26.435 ms)
    Julia conv_nd 2d               TrialEstimate(761.929 ms)
    Julia conv_nd 3d               TrialEstimate(51.210 s)
    Conv2d Julia vs Conv2d C:      TrialRatio(99.03%)
    Conv3d Julia vs Conv3d C:      TrialRatio(156.29%)
    Conv_nd 2d Julia vs Conv2d C:  TrialRatio(142372.71%)
    Conv_nd 3d Julia vs Conv3d C:  TrialRatio(193717.42%)

Well, the results turn out to be poor for the generic conv. I suspect the main culprit is the extra loop we run to find the indices.
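
For context, that index-finding overhead looks roughly like this (an illustration of the pattern, not the actual im2col_nd! code): decomposing each flat offset into N coordinates costs a div/rem pair per dimension plus an allocation, paid once per copied element, whereas the specialized 2D/3D kernels hard-code the nested loops.

    # Illustrative only: column-major linear index -> N-D coordinates.
    function linear_to_nd(idx::Int, dims::NTuple{N,Int}) where {N}
        coords = Vector{Int}(undef, N)   # allocation in the hot loop
        rest = idx - 1
        for d in 1:N
            coords[d] = rest % dims[d] + 1   # coordinate along dimension d
            rest = rest ÷ dims[d]
        end
        return coords
    end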

dfdx (Contributor) commented Mar 14, 2018

Hm, a 200x slowdown looks a bit too high. Most likely there are just a couple of very slow operations making the whole thing perform poorly. Can you profile it? It should be pretty easy to identify the bottleneck.

Also, I believe GitHub still doesn't deliver notifications for inline comments, so I'll copy it here: in conv.jl on line 161 (and in all similar places) you use pad=tuple(zeros(Int, N)), which returns a tuple with a single element of type Array, e.g. ([0, 0, 0],). Instead, you can use pad=tuple(zeros(Int, N)...,) (note the dots before the trailing comma) to unpack the array into a tuple and get (0, 0, 0).
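
For reference, the difference at the REPL:

    julia> N = 3;

    julia> tuple(zeros(Int, N))      # a 1-tuple whose element is an Array
    ([0, 0, 0],)

    julia> tuple(zeros(Int, N)...)   # splatted: a tuple of N integers
    (0, 0, 0)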

tejank10 (Contributor, Author) commented Mar 17, 2018

Thanks for the profiling link, and sorry for the late reply.
I have profiled im2col_nd!. Here are the results; the lines below are those pertaining directly to im2col_nd!:

      1    ./broadcast.jl:192; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      13   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:153; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      97   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:154; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      19   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:155; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      2    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:158; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      5    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:161; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      295  .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:165; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      1    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:167; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      213  .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:168; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      7    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:169; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      20   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:171; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      55   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:173; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      4    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:176; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      2    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:177; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      10   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:178; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      10   .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:179; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      695  .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:181; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...
      3    .../tejank10/Documents/NNlib.jl/src/impl/conv.jl:182; im2col_nd!(::Array{Float64,3}, ::Array{Float64,2}, ::Tuple{Int64,Int...

Line 181, where the range of dim_pad is checked, accounts for the most samples.
Regarding the default arguments: thanks for pointing that out, my bad! I'll fix it soon.
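
For illustration (hypothetical, since the conv.jl lines themselves are not shown here), a per-dimension pad test of the kind described amounts to two integer comparisons; the usual fix is to hoist it out of the innermost loop, or to split the loop into pad/interior/pad regions so the interior copies run check-free:

    # Coordinate i of the padded input lies in the pad region iff it falls
    # before or after the original extent sz (with pad p on each side).
    in_pad(i::Int, p::Int, sz::Int) = (i <= p) | (i > sz + p)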

tejank10 (Contributor, Author) commented Mar 17, 2018

Optimized the code to obtain a 34x speedup over the previous version.
Here are the updated benchmark results:

    Julia conv2d                   TrialEstimate(443.947 μs)
    C conv2d                       TrialEstimate(364.420 μs)
    Julia conv3d                   TrialEstimate(32.744 ms)
    C conv3d                       TrialEstimate(23.445 ms)
    Julia conv_nd 2d               TrialEstimate(11.902 ms)
    Julia conv_nd 3d               TrialEstimate(1.312 s)
    Conv2d Julia vs Conv2d C:      TrialRatio(121.82%)
    Conv3d Julia vs Conv3d C:      TrialRatio(139.66%)
    Conv_nd 2d Julia vs Conv2d C:  TrialRatio(3266.07%)
    Conv_nd 3d Julia vs Conv3d C:  TrialRatio(5595.04%)

dfdx (Contributor) commented Mar 20, 2018

There are rather few samples in that profile; let's run it in a loop for, say, 1000 iterations. That should also shrink the share of time spent on inference, which currently looks like a major runtime cost.

Yet we are still dozens of times slower than the specialized 2D and 3D convolutions, so it makes sense to dispatch to those in the common cases and use the fully generic ND convolution only for arrays of higher dimension. @MikeInnes does that sound like a reasonable approach for NNlib/Flux?
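
A sketch of that dispatch idea (the kernel names conv2d/conv3d/conv_nd are illustrative): Julia picks the most specific method, so 4-D and 5-D inputs route to the specialized kernels and everything else falls through to the generic path.

    conv(x::AbstractArray{T,4}, w::AbstractArray{T,4}; kw...) where {T} =
        conv2d(x, w; kw...)            # specialized 2D kernel
    conv(x::AbstractArray{T,5}, w::AbstractArray{T,5}; kw...) where {T} =
        conv3d(x, w; kw...)            # specialized 3D kernel
    conv(x::AbstractArray{T,N}, w::AbstractArray{T,N}; kw...) where {T,N} =
        conv_nd(x, w; kw...)           # generic N-dimensional fallback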

tejank10 (Contributor, Author) commented:
Here is the profiling result.
Totally agreed with you on the second point 👍

staticfloat (Contributor) commented:
@tejank10 Not to add more work for you here, but it would be really useful to add a dilation factor, to skip input pixels and enable dilated (or atrous) convolutions.

I'm imagining we could have a parameter called dilation that is a tuple of integers, similar to pad, such that if I had a 2D convolution and added dilation = (1, 1), I would get something like this:

[figure: illustration of a dilated 2D convolution]
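
A sketch of how dilation enters the index arithmetic, in 1-D for clarity (hypothetical code; note that conventions differ, and here dilation = 1 means an ordinary convolution, whereas the dilation = (1, 1) above counts skipped pixels):

    # 1-D "valid" convolution with a dilation factor: consecutive kernel
    # taps sample the input `dilation` positions apart.
    function dilated_conv1d(x::Vector{T}, w::Vector{T}; dilation::Int = 1) where {T}
        k = length(w)
        span = (k - 1) * dilation + 1            # input extent the kernel covers
        out = Vector{T}(undef, length(x) - span + 1)
        for i in eachindex(out)
            acc = zero(T)
            for j in 1:k
                acc += w[j] * x[i + (j - 1) * dilation]
            end
            out[i] = acc
        end
        return out
    end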

jekbradbury (Contributor) commented:
There are merge markers left in the most recent commit.

staticfloat (Contributor) commented:
This is going to need to be completely reworked now that #94 has landed. It shouldn't be too difficult, but may take some work. For now, I'm going to close this. Thanks for all your work @tejank10!

ToucheSir pushed a commit that referenced this pull request on Feb 13, 2023: Add CUDA kernels for grid sampling