-
-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NNPACK support #67
Conversation
@MikeInnes Some notes:
|
For now can we just flip the weight kernel? Should be quite cheap to do. On (1) can we add a check that throws an error? On (2) perhaps we can test NNlib on one of the GPU CI machines. Ideally we should not load NNPACK support if the hardware is not supported though. |
I will add a check for that. |
nnpack should have such a check itself, I think. You can load the library but query that before defining the actual |
Is the check added in the previous commit fine? |
I have flipped the weights for now so that it gives the results for convolutions |
Will get these fixed soon |
Why can't travis get |
Not sure. @staticfloat any ideas? |
So do we need to setup this dict at the time when the user is installing NNlib, or do we run the benchmarks and set these as the defaults for now. In case of the default what should I use 4 as the max threads? |
For now, let's just hardcode it. We may eventually have a quick little micro-benchmark that takes all installed backends, compares them, and chooses the fastest at package install time, but for now I think just having something intelligent hardcoded is much better than nothing.
Sure, let's start with that. |
Hmmm, we're also going to need to do something to guard against invalid |
Yes we need to handle that. Do we put the check in |
I think we define this by using type parameters:
That will only redirect |
|
Are you sure? Looking at the code, it doesn't look to me like NNPACK does stride at all; there's no stride parameter in the convolution routines. |
Ah yes, it doesn't support stride for convolution. But it does for maxpool. |
@avik-pal what's the status on this? Is this ready for general testing? I think I'd like to release |
The heuristics part of choosing the operation is not yet implemented. I will have to try to get it done by next week probably. Apart from this, it should be ok for general testing. |
I have added some basic heuristics. They are not ideal but is much faster than what we currently have in NNlib. Also I am bounding the max threads to be 8 irrespective of what number above 8 the user enters. Anything above 8 generally worsens the performance. I have added the new build file. But I do not have access to those systems. So someone would have to verify if this PR works there. I am working on fixing the tests they seem to be numerical issue. (Also it would be good to have the CI test NNPACK but Travis is quite unreliable and most of the cases we end up with a system of unsupported hardware). |
Indeed the test fails are due to the numerical accuracies. Should I change the equalities to |
Yes, please do, I will test on a couple of systems over here once you do. |
@staticfloat tests are passing locally for me. |
It's working fine on my Macbook Pro and my Linux server. This is great work Avik, I think we're ready to merge! |
Convolutions are not functionalBenchmarkTools freezes when trying to benchmark withNNPACK_CPU_THREADS
> 1Speed Improvements:
NOTE: Travis test fails are due to unsupported hardware