N dimensional conv & pool #31
Conversation
@MikeInnes Please review.
Are there any benchmarks comparing it with the C++ version? Also, I'm not sure about making padding a required argument; can't we just use something like a sensible default?
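For illustration, here is a minimal 1-D sketch of that suggestion: `pad` becomes a keyword argument with a zero default, so callers can omit it. The function name `conv1d` and its signature are hypothetical, not NNlib's actual API.

```julia
# Hypothetical sketch: padding as a keyword argument with a default,
# rather than a required positional argument.
function conv1d(x::Vector{T}, w::Vector{T}; pad::Int = 0) where {T}
    xp = [zeros(T, pad); x; zeros(T, pad)]              # zero-pad both ends
    out = Vector{T}(undef, length(xp) - length(w) + 1)  # valid output size
    for i in eachindex(out)
        out[i] = sum(xp[i + k - 1] * w[k] for k in eachindex(w))
    end
    return out
end

conv1d([1.0, 2.0, 3.0], [1.0, 1.0])            # → [3.0, 5.0]
conv1d([1.0, 2.0, 3.0], [1.0, 1.0]; pad = 1)   # → [1.0, 3.0, 5.0, 3.0]
```

The same default-keyword pattern extends to the N-dimensional case with a tuple of per-dimension pads.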
That's a good idea :). I have implemented that now. Pardon me if it's a noob question, but is there any standard way to benchmark? The last time I was asked to benchmark
I'd test ND convolution in Julia against the existing 2D and 3D convolutions in C++. In fact, there should have been benchmarks comparing the 2D/3D convolutions between the Julia and C++ implementations, but I lost track of NNlib for a while and missed when it was reimplemented in pure Julia. Previously we used benchmarks from Mocha.jl; it should be easy to adapt them for the new implementations. If C++ wins, that's not necessarily a show stopper, but we need to know how much we lose with a pure-Julia, generic implementation. Also note that cuDNN, perhaps the primary provider of fast convolutions on GPU, only supports 2D and 3D convolutions, so we need to position generic ND convolutions carefully to avoid confusing deep learning people.
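A benchmark along those lines might be sketched as below. This is a non-runnable outline: `conv_nd` and `conv2d` are placeholders for the generic and specialized implementations under test, and the input shapes are arbitrary.

```julia
using BenchmarkTools  # registered Julia benchmarking package

# Placeholder harness: time the generic N-D convolution against a
# specialized 2-D one on the same 4-D input (W × H × channels × batch).
x = rand(Float32, 28, 28, 3, 16)
w = rand(Float32, 3, 3, 3, 8)

t_nd = @belapsed conv_nd($x, $w)   # generic implementation (placeholder)
t_2d = @belapsed conv2d($x, $w)    # specialized implementation (placeholder)
println("generic ND / specialized 2D time ratio: ", t_nd / t_2d)
```

Interpolating the arguments with `$` keeps BenchmarkTools from timing global-variable access along with the convolution itself.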
Thanks a lot for your input, it helped me a lot in benchmarking conv.
Well, the results turned out to be poor for the generic conv. I guess the main culprit is the extra loop we run to find the indices.
Hm, a 200x slowdown looks a bit too high. Most likely there are just a couple of very slow operations that make the whole thing perform poorly. Can you profile it? It should be pretty easy to identify the bottleneck. Also, I believe GitHub still doesn't deliver notifications for inline comments, so I'll copy it here: in
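One way to profile this is with Julia's built-in `Profile` stdlib: warm the function up once so compilation doesn't dominate the samples, then profile many iterations. The workload `work` here is a stand-in for the actual convolution call being investigated.

```julia
using Profile

# Stand-in workload; replace with the conv call being profiled.
work(x) = sum(abs2, x)
x = rand(Float32, 10^6)

work(x)            # warm up so compilation isn't in the profile
Profile.clear()    # discard any previously collected samples
@profile for _ in 1:1000
    work(x)
end
Profile.print(format = :flat, sortedby = :count)  # hottest lines first
```

The flat view sorted by sample count makes the bottleneck line stand out immediately; the default tree view is better for seeing the call path that leads to it.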
Thanks for the profiling link, and sorry for the late reply.
On line 181, the range of
Optimized the code to obtain a 34x speedup over the previous version.
There are only a few samples in the profile; let's run it in a loop for, say, 1000 iterations. That should also shrink the share of time spent on inference, which currently looks like a major runtime cost. Still, we are dozens of times slower than the specialized 2D and 3D convolutions, so it makes sense to dispatch to them in the common cases and use the fully generic ND convolution only for arrays with higher dimensions. @MikeInnes does that sound like a reasonable approach for NNlib/Flux?
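The dispatch idea above falls out naturally from Julia's multiple dispatch: 4-D and 5-D arrays (2-D and 3-D spatial dimensions) hit specialized methods, and everything else falls through to the generic path. All function names here are illustrative stand-ins, not NNlib's actual methods.

```julia
# Stand-in implementations; real code would call the fast kernels.
conv2d_fast(x, w)     = :specialized_2d
conv3d_fast(x, w)     = :specialized_3d
conv_nd_generic(x, w) = :generic_nd

# Dispatch on array dimensionality: the most specific method wins.
conv(x::AbstractArray{T,4}, w::AbstractArray{T,4}) where {T}   = conv2d_fast(x, w)
conv(x::AbstractArray{T,5}, w::AbstractArray{T,5}) where {T}   = conv3d_fast(x, w)
conv(x::AbstractArray{T,N}, w::AbstractArray{T,N}) where {T,N} = conv_nd_generic(x, w)

conv(rand(3, 3, 1, 1), rand(2, 2, 1, 1))                # → :specialized_2d
conv(rand(3, 3, 3, 1, 1, 1), rand(2, 2, 2, 1, 1, 1))    # → :generic_nd
```

Users keep calling a single `conv` entry point, and the specialized implementations are picked automatically whenever they apply.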
Here is the profiling result.
@tejank10 Not to add more work for you here, but it would be really useful to add a
I'm imagining we could have a parameter called
There are merge markers left in the most recent commit. |
Add CUDA kernels for grid sampling
Convolutions and pooling for N-dimensional images