Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NNPACK support #67

Merged
merged 62 commits into from
Apr 30, 2019
Merged

Add NNPACK support #67

merged 62 commits into from
Apr 30, 2019

Conversation

avik-pal
Copy link
Member

@avik-pal avik-pal commented Sep 19, 2018

  1. Only supports Linux platforms for now.
  2. Convolutions are not functional
  3. BenchmarkTools freezes when trying to benchmark with NNPACK_CPU_THREADS > 1

Speed Improvements:

x = rand(100,100,10,100)

#NNlib maxpool
@time maxpool(x,(2,2)) # 0.32 s

x = Float32.(x)
#NNPACK maxpool with NNPACK_CPU_THREADS = 4
@time maxpool(x,(2,2)) # 0.0055 s

NOTE: Travis test fails are due to unsupported hardware

@avik-pal
Copy link
Member Author

avik-pal commented Sep 20, 2018

@MikeInnes
Convolution is fixed. But the results are for cross correlation. So #53 needs to be merged.
Also I think we should change the default to cross correlation to maintain consistency

Some notes:

  1. If maxpool or conv is called with parameters that don't add up properly, the results are quite random and might result in segfault.
  2. Need to figure out how to run travis tests for NNPACK as we get unsupported hardware

@MikeInnes
Copy link
Member

For now can we just flip the weight kernel? Should be quite cheap to do.

On (1) can we add a check that throws an error?

On (2) perhaps we can test NNlib on one of the GPU CI machines. Ideally we should not load NNPACK support if the hardware is not supported though.

@avik-pal
Copy link
Member Author

avik-pal commented Oct 4, 2018

I will add a check for that.
Is there an easy way to check the hardware, I mean similar to is_linux() in case of os?

@MikeInnes
Copy link
Member

nnpack should have such a check itself, I think. You can load the library but query that before defining the actual conv wrappers.

@avik-pal
Copy link
Member Author

avik-pal commented Oct 5, 2018

Is the check added in the previous commit fine?

@avik-pal
Copy link
Member Author

avik-pal commented Oct 5, 2018

I have flipped the weights for now so that it gives the results for convolutions

deps/build.jl Outdated Show resolved Hide resolved
src/nnpack/nnlib.jl Outdated Show resolved Hide resolved
src/nnpack/nnlib.jl Outdated Show resolved Hide resolved
src/nnpack/NNPACK.jl Outdated Show resolved Hide resolved
@avik-pal
Copy link
Member Author

avik-pal commented Oct 8, 2018

Will get these fixed soon

src/nnpack/nnlib.jl Outdated Show resolved Hide resolved
@avik-pal
Copy link
Member Author

Why can't travis get BinaryProvider even though I have listed it in the REQUIRE file?

@MikeInnes
Copy link
Member

Not sure. @staticfloat any ideas?

@avik-pal
Copy link
Member Author

So do we need to setup this dict at the time when the user is installing NNlib, or do we run the benchmarks and set these as the defaults for now. In case of the default what should I use 4 as the max threads?

src/NNlib.jl Outdated Show resolved Hide resolved
@staticfloat
Copy link
Contributor

So do we need to setup this dict at the time when the user is installing NNlib

For now, let's just hardcode it. We may eventually have a quick little micro-benchmark that takes all installed backends, compares them, and chooses the fastest at package install time, but for now I think just having something intelligent hardcoded is much better than nothing.

In case of the default what should I use 4 as the max threads?

Sure, let's start with that.

@staticfloat
Copy link
Contributor

Hmmm, we're also going to need to do something to guard against invalid stride, dilation, etc... kinds of parameters. I was just trying this out, and stride=2 causes Julia to segfault because we allocate something smaller than what NNPACK is expecting.

@avik-pal
Copy link
Member Author

we're also going to need to do something to guard against invalid stride, dilation, etc...

Yes we need to handle that. Do we put the check in conv_nnpack or in conv where we check the validity and then call conv_nnpack. Similar to the design we had prior to the overhaul PR.

@staticfloat
Copy link
Contributor

staticfloat commented Apr 13, 2019

I think we define this by using type parameters:

conv(x, w, cdims::DenseConvDims{2,(1,1),P,(1,1),F}) where {P,F} = conv_nnpack(x, w, cdims)

That will only redirect conv() from conv_im2col() -> conv_nnpack() if number of spatial dimensions (N) is 2, stride is (1,1) and dilation is (1,1).

@avik-pal
Copy link
Member Author

avik-pal commented Apr 13, 2019

It will actually work for all cases where (im_size + pad_1 + pad_2 - kernel is divisible by stride how do we handle this like u mention?

@staticfloat
Copy link
Contributor

Are you sure? Looking at the code, it doesn't look to me like NNPACK does stride at all; there's no stride parameter in the convolution routines.

@avik-pal
Copy link
Member Author

Ah yes, it doesn't support stride for convolution. But it does for maxpool.

@staticfloat
Copy link
Contributor

@avik-pal what's the status on this? Is this ready for general testing? I think I'd like to release v0.6.0 of NNlib first, then merge this in right away afterwards, for those of us living on the bleeding edge to try out and have fun with. :)

@avik-pal
Copy link
Member Author

The heuristics part of choosing the operation is not yet implemented. I will have to try to get it done by next week probably. Apart from this, it should be ok for general testing.

@avik-pal
Copy link
Member Author

avik-pal commented Apr 30, 2019

I have added some basic heuristics. They are not ideal but is much faster than what we currently have in NNlib. Also I am bounding the max threads to be 8 irrespective of what number above 8 the user enters. Anything above 8 generally worsens the performance.

I have added the new build file. But I do not have access to those systems. So someone would have to verify if this PR works there.

I am working on fixing the tests they seem to be numerical issue.

(Also it would be good to have the CI test NNPACK but Travis is quite unreliable and most of the cases we end up with a system of unsupported hardware).

@avik-pal
Copy link
Member Author

Indeed the test fails are due to the numerical accuracies. Should I change the equalities to isapprox?

@avik-pal avik-pal changed the title [WIP] Add NNPACK support Add NNPACK support Apr 30, 2019
@staticfloat
Copy link
Contributor

Should I change the equalities to isapprox?

Yes, please do, I will test on a couple of systems over here once you do.

@avik-pal
Copy link
Member Author

@staticfloat tests are passing locally for me.

@staticfloat
Copy link
Contributor

It's working fine on my Macbook Pro and my Linux server. This is great work Avik, I think we're ready to merge!

@staticfloat staticfloat merged commit ec79173 into FluxML:master Apr 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants