Add NNPACK support #67

avik-pal · 2018-09-19T14:56:38Z

Only supports Linux platforms for now.
~~Convolutions are not functional~~
~~BenchmarkTools freezes when trying to benchmark with NNPACK_CPU_THREADS > 1~~

Speed Improvements:

x = rand(100, 100, 10, 100)

# NNlib maxpool
@time maxpool(x, (2, 2)) # 0.32 s

x = Float32.(x)
# NNPACK maxpool with NNPACK_CPU_THREADS = 4
@time maxpool(x, (2, 2)) # 0.0055 s

~~NOTE: Travis test fails are due to unsupported hardware~~

avik-pal · 2018-09-20T05:53:25Z

@MikeInnes
Convolution is fixed, but the results are for cross-correlation, so #53 needs to be merged.
I also think we should change the default to cross-correlation to maintain consistency.

Some notes:

1. If maxpool or conv is called with parameters that don't add up properly, the results are essentially random and may cause a segfault.
2. We need to figure out how to run the Travis tests for NNPACK, as we keep getting unsupported hardware.

MikeInnes · 2018-10-04T13:39:43Z

For now can we just flip the weight kernel? Should be quite cheap to do.

On (1) can we add a check that throws an error?
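A check of the kind asked for in (1) could look like the sketch below. It is only an illustration: `checked_output_size` is a hypothetical helper, not a function from this PR, and the divisibility rule is an assumption about what "parameters that add up" means for one spatial dimension. The point is to fail with a Julia exception before any buffer is handed to the C library.

```julia
# Hypothetical helper (not part of NNlib): validate one spatial dimension
# before calling into a C kernel, and return the expected output size.
# Assumption: the parameters "add up" when
#     in_size + pad_lo + pad_hi - kernel
# is non-negative and divisible by the stride.
function checked_output_size(in_size::Integer, kernel::Integer, stride::Integer;
                             pad_lo::Integer=0, pad_hi::Integer=0)
    stride > 0 || throw(ArgumentError("stride must be positive, got $stride"))
    padded = in_size + pad_lo + pad_hi
    span = padded - kernel
    if span < 0
        throw(DimensionMismatch("kernel size $kernel exceeds padded input size $padded"))
    elseif span % stride != 0
        throw(DimensionMismatch("padded input minus kernel ($span) is not divisible by stride $stride"))
    end
    return div(span, stride) + 1
end
```

Calling this once per spatial dimension before allocating the output turns a potential segfault into an ordinary `DimensionMismatch`.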

On (2) perhaps we can test NNlib on one of the GPU CI machines. Ideally we should not load NNPACK support if the hardware is not supported though.

avik-pal · 2018-10-04T15:44:51Z

I will add a check for that.
Is there an easy way to check the hardware, similar to is_linux() for the OS?

MikeInnes · 2018-10-05T10:55:24Z

NNPACK should have such a check itself, I think. You can load the library, but query that check before defining the actual conv wrappers.
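One way to read this suggestion is sketched below. NNPACK's C API has an `nnp_initialize` function that returns a status code; the sketch takes the probe as a plain function argument so it can run without the library. The names `nnpack_supported` and `select_backend` are hypothetical, and the success value of 0 is an assumption taken from NNPACK's header.

```julia
# Assumption: NNPACK reports success as status code 0 (see nnpack.h).
const NNP_STATUS_SUCCESS = 0

# `init` is any zero-argument function returning an NNPACK status code.
# With the real library loaded it could be something like
#     () -> ccall((:nnp_initialize, libnnpack), Cint, ())
# where `libnnpack` stands for the library path found at build time.
nnpack_supported(init) = init() == NNP_STATUS_SUCCESS

# Hypothetical loader logic: pick the NNPACK-backed wrappers only when the
# probe succeeds, otherwise keep the pure-Julia implementation.
select_backend(init) = nnpack_supported(init) ? :nnpack : :fallback
```

In a package, the result of the probe would decide whether the file defining the NNPACK wrappers is included at all.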

avik-pal · 2018-10-05T17:04:35Z

Is the check added in the previous commit fine?

avik-pal · 2018-10-05T17:52:59Z

I have flipped the weights for now so that it gives the results for convolutions
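For readers unfamiliar with the trick: a convolution equals a cross-correlation with the kernel reversed along every spatial dimension. The sketch below shows this on a single-channel 2-D example. `xcorr2d`, `flipkernel`, and `conv2d` are illustrative names, not the functions used in this PR.

```julia
# Naive "valid" 2-D cross-correlation of a single-channel image with a kernel.
function xcorr2d(x::AbstractMatrix, w::AbstractMatrix)
    kh, kw = size(w)
    oh, ow = size(x, 1) - kh + 1, size(x, 2) - kw + 1
    y = zeros(promote_type(eltype(x), eltype(w)), oh, ow)
    for j in 1:ow, i in 1:oh
        y[i, j] = sum(x[i:i+kh-1, j:j+kw-1] .* w)
    end
    return y
end

# Reverse the kernel along both spatial dimensions.
flipkernel(w::AbstractMatrix) = w[end:-1:1, end:-1:1]

# True convolution expressed as cross-correlation with the flipped kernel.
conv2d(x, w) = xcorr2d(x, flipkernel(w))
```

For a 4-D weight array in width, height, channel, channel layout, the same flip would be `w[end:-1:1, end:-1:1, :, :]`, which copies the weights once per call.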

deps/build.jl

src/nnpack/libnnpack.jl

src/nnpack/nnlib.jl

src/nnpack/NNPACK.jl

avik-pal · 2018-10-08T15:15:17Z

Will get these fixed soon

src/nnpack/nnlib.jl

avik-pal · 2018-10-15T08:33:32Z

Why can't Travis get BinaryProvider even though I have listed it in the REQUIRE file?

MikeInnes · 2018-10-15T11:32:14Z

Not sure. @staticfloat any ideas?

avik-pal · 2019-04-11T09:38:28Z

So do we need to set up this dict at the time when the user is installing NNlib, or do we run the benchmarks and set these as the defaults for now? In case of the default, should I use 4 as the max threads?

src/NNlib.jl

staticfloat · 2019-04-12T21:44:17Z

> So do we need to set up this dict at the time when the user is installing NNlib

For now, let's just hardcode it. We may eventually have a quick little micro-benchmark that takes all installed backends, compares them, and chooses the fastest at package install time, but for now I think just having something intelligent hardcoded is much better than nothing.

> In case of the default, should I use 4 as the max threads?

Sure, let's start with that.
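The install-time micro-benchmark mentioned above could be as small as the following sketch. It is not part of this PR: `fastest_backend` is a hypothetical name, and a real version would repeat the measurement and use representative input sizes.

```julia
# Hypothetical sketch: time each candidate backend on the same arguments and
# return the name of the fastest one.
function fastest_backend(backends::AbstractDict, args...)
    best_name, best_time = nothing, Inf
    for (name, f) in backends
        f(args...)                  # warm-up run so compilation is not timed
        t = @elapsed f(args...)     # single timed run, in seconds
        if t < best_time
            best_name, best_time = name, t
        end
    end
    return best_name
end
```

The result could be written to a small file by the build script and read back when the package loads; until then, a hardcoded choice plays the same role.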

staticfloat · 2019-04-13T06:32:18Z

Hmmm, we're also going to need to do something to guard against invalid stride, dilation, etc... kinds of parameters. I was just trying this out, and stride=2 causes Julia to segfault because we allocate something smaller than what NNPACK is expecting.

avik-pal · 2019-04-13T06:46:18Z

> we're also going to need to do something to guard against invalid stride, dilation, etc...

Yes, we need to handle that. Do we put the check in conv_nnpack, or in conv, where we check the validity and then call conv_nnpack, similar to the design we had prior to the overhaul PR?

staticfloat · 2019-04-13T06:51:23Z

I think we define this by using type parameters:

conv(x, w, cdims::DenseConvDims{2,(1,1),P,(1,1),F}) where {P,F} = conv_nnpack(x, w, cdims)

That will only redirect conv() from conv_im2col() -> conv_nnpack() if the number of spatial dimensions (N) is 2, the stride is (1,1), and the dilation is (1,1).
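The dispatch idea can be tried in isolation with a toy type. In the sketch below, `ToyConvDims` is a hypothetical stand-in whose parameter order mirrors the line above (N, stride, padding, dilation, flip flag); it is not NNlib's actual `DenseConvDims` definition.

```julia
# Hypothetical stand-in: spatial dims N, stride S, padding P, dilation D and
# flip flag F all live in the type, so methods can dispatch on their values.
struct ToyConvDims{N,S,P,D,F} end

# Generic fallback for every configuration.
backend(::ToyConvDims) = :im2col

# More specific method: 2 spatial dims, unit stride, unit dilation.
# Padding P and flip flag F stay free.
backend(::ToyConvDims{2,(1,1),P,(1,1),F}) where {P,F} = :nnpack

# Unit stride and dilation pick the specialised method...
@assert backend(ToyConvDims{2,(1,1),(0,0,0,0),(1,1),false}()) == :nnpack
# ...while stride 2 falls back to the generic one.
@assert backend(ToyConvDims{2,(2,2),(0,0,0,0),(1,1),false}()) == :im2col
```

Because the choice is made by method dispatch, unsupported configurations never reach the NNPACK wrapper and no runtime check is needed on that path.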

avik-pal · 2019-04-13T07:43:55Z

~~It will actually work for all cases where (im_size + pad_1 + pad_2 - kernel) is divisible by stride. How do we handle this the way you mention?~~

staticfloat · 2019-04-13T07:45:11Z

Are you sure? Looking at the code, it doesn't look to me like NNPACK does stride at all; there's no stride parameter in the convolution routines.

avik-pal · 2019-04-13T08:05:17Z

Ah yes, it doesn't support stride for convolution. But it does for maxpool.

staticfloat · 2019-04-23T23:33:25Z

@avik-pal what's the status on this? Is this ready for general testing? I think I'd like to release v0.6.0 of NNlib first, then merge this in right away afterwards, for those of us living on the bleeding edge to try out and have fun with. :)

avik-pal · 2019-04-24T03:47:30Z

The heuristics for choosing the operation are not yet implemented. I will try to get that done by next week. Apart from this, it should be OK for general testing.

avik-pal · 2019-04-30T15:32:57Z

I have added some basic heuristics. They are not ideal but are much faster than what we currently have in NNlib. I am also capping the max threads at 8, whatever larger number the user enters. Anything above 8 generally worsens the performance.
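The thread cap can be pictured with the sketch below. It is an illustration rather than the code in this PR: `nnpack_threads` is a hypothetical helper, and reading the count from the `NNPACK_CPU_THREADS` environment variable with a default of 4 is an assumption based on the earlier discussion.

```julia
# Upper bound discussed above; larger requests are silently reduced to this.
const MAX_NNPACK_THREADS = 8

# Hypothetical helper: read the requested thread count from an environment-like
# dictionary, fall back to `default` when it is unset or not an integer, and
# clamp the result to the range 1:MAX_NNPACK_THREADS.
function nnpack_threads(env=ENV; default::Integer=4)
    requested = tryparse(Int, get(env, "NNPACK_CPU_THREADS", string(default)))
    requested === nothing && return default
    return clamp(requested, 1, MAX_NNPACK_THREADS)
end
```

Passing a plain `Dict` instead of `ENV` makes the helper easy to exercise without touching the process environment.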

I have added the new build file, but I do not have access to those systems, so someone will have to verify that this PR works there.

I am working on fixing the tests; the failures seem to be a numerical issue.

(It would also be good to have the CI test NNPACK, but Travis is quite unreliable, and in most cases we end up on a system with unsupported hardware.)

avik-pal · 2019-04-30T15:48:18Z

Indeed, the test failures are due to numerical accuracy. Should I change the equalities to isapprox?

staticfloat · 2019-04-30T15:55:03Z

> Should I change the equalities to isapprox?

Yes, please do, I will test on a couple of systems over here once you do.
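As a small illustration of why exact equality is too strict here: results computed in Float32 differ from a Float64 reference by rounding error, so a tolerance-based comparison is the usual fix. The tolerance below is an assumption chosen for single precision, not the value used in the PR's tests.

```julia
# Classic scalar case: exact comparison fails, approximate comparison passes.
@assert !(0.1 + 0.2 == 0.3)
@assert isapprox(0.1 + 0.2, 0.3)

# Float64 reference values and the same values after a round trip through Float32.
ref = [0.1, 0.2, 0.3] .* 3
single = Float64.(Float32.(ref))

exact_match = ref == single                                # strict comparison
close_enough = all(isapprox.(ref, single; rtol=1e-6))      # element-wise tolerance
```

In a test file this would typically read `@test y_nnpack ≈ y_reference rtol=1e-6`, with hypothetical variable names for the two results.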

avik-pal · 2019-04-30T19:07:52Z

@staticfloat tests are passing locally for me.

staticfloat · 2019-04-30T21:14:31Z

It's working fine on my MacBook Pro and my Linux server. This is great work, Avik. I think we're ready to merge!

Avik Pal added 4 commits September 19, 2018 20:25

Add NNPACK support

1aed8d9

Add dependencies

7399ef7

simplify

ffc6001

Fix conv

6cd43f6

Avik Pal added 5 commits September 20, 2018 11:45

Add tests

9d23d6e

Minor fixes for Metalhead

605f83e

Mistype

721b7cf

Mistype

3a6a7be

Remove dilation kw

8ff6931

MikeInnes mentioned this pull request Oct 4, 2018

Reflect mode parameter from NNlib #53

Closed

Check Hardware support

644ab7c

Avik Pal added 2 commits October 5, 2018 22:53

Add proper dimension checks

264e79c

Minor Fixes

d3f0ed4