
Add NNPACK support #67

Merged
62 commits, merged Apr 30, 2019
Changes from 1 commit
Commits (62)
1aed8d9
Add NNPACK support
avik-pal Sep 19, 2018
7399ef7
Add dependencies
avik-pal Sep 19, 2018
ffc6001
simplify
avik-pal Sep 19, 2018
6cd43f6
Fix conv
avik-pal Sep 20, 2018
9d23d6e
Add tests
avik-pal Sep 20, 2018
605f83e
Minor fixes for Metalhead
avik-pal Sep 20, 2018
721b7cf
Mistype
avik-pal Sep 20, 2018
3a6a7be
Mistype
avik-pal Sep 20, 2018
8ff6931
Remove dilation kw
avik-pal Sep 20, 2018
644ab7c
Check Hardware support
avik-pal Oct 5, 2018
264e79c
Add proper dimension checks
avik-pal Oct 5, 2018
d3f0ed4
Minor Fixes
avik-pal Oct 5, 2018
b255f67
Create shared threadpool and make suggested changes
avik-pal Oct 15, 2018
af05ad7
Remove version
avik-pal Oct 15, 2018
18f978d
Merge branch 'master' of https://github.com/FluxML/NNlib.jl into NNPACK
avik-pal Oct 15, 2018
9b6b7b0
Add BinaryProvider to Project and Manifest and switch to Sys.CPU_THRE…
avik-pal Oct 15, 2018
1ef5b74
Minor Changes
avik-pal Oct 15, 2018
7dbdf06
Wrong function call fix
avik-pal Oct 15, 2018
1c96773
Remove argument to change nthreads
avik-pal Oct 29, 2018
16ede9a
Make the API consistent with master
avik-pal Oct 29, 2018
8f92d35
Default to NNPACK for Float64 as well
avik-pal Oct 29, 2018
3628b5f
Minor patches
avik-pal Oct 29, 2018
da90b86
Convert to Float64 while returning
avik-pal Oct 29, 2018
137c735
Add crosscor functions
avik-pal Oct 29, 2018
524f35e
Minor Patch
avik-pal Oct 29, 2018
fd2c602
Flip weights while returning
avik-pal Oct 29, 2018
e5e3964
Fixes for proper type
avik-pal Nov 1, 2018
2efbb74
Incorrect force push
avik-pal Nov 1, 2018
da3df3c
Remove short-form
avik-pal Nov 1, 2018
8f8081a
Remove common code
avik-pal Nov 19, 2018
cf02a05
Support maxpool fallback
avik-pal Nov 19, 2018
fba6d4a
conv fallbacks
avik-pal Nov 19, 2018
2495c1b
Tests pass
avik-pal Nov 19, 2018
69b9950
travis test on julia1.0
avik-pal Nov 22, 2018
25dcd3c
threadpool ref
MikeInnes Nov 28, 2018
8b91c9b
rm threadpool arg
MikeInnes Nov 28, 2018
ac3c880
rm float64 methods
MikeInnes Nov 28, 2018
1fa88c0
rm softmax method
MikeInnes Nov 28, 2018
efb32ce
pull out flipweight
MikeInnes Nov 28, 2018
77254d8
float64 fixup
MikeInnes Nov 28, 2018
f283696
grad filter fixup
MikeInnes Nov 28, 2018
838e88a
Remove AbstractArray
avik-pal Dec 1, 2018
08d514f
Modify libnnpack signatures
avik-pal Dec 1, 2018
ef4f22e
Typo
avik-pal Dec 1, 2018
6b50e92
Rebase PR
avik-pal Feb 20, 2019
97f11a8
Add option to disable NNPACK
avik-pal Feb 20, 2019
3c796b7
Comment out
avik-pal Feb 20, 2019
6025166
Typo fix
avik-pal Feb 20, 2019
e2947d8
Clean the code
avik-pal Feb 20, 2019
220c442
Update NNPACK interface to conform to the new NNlib
avik-pal Apr 6, 2019
43f759a
Add performance tests
avik-pal Apr 6, 2019
2591d21
Minor changes as per review
avik-pal Apr 6, 2019
8ced0c0
Lay down the structure for runtime performance check
avik-pal Apr 11, 2019
bc19012
Fix merge conflicts with master
avik-pal Apr 11, 2019
5b35148
Some heuristics to choose threadpool for pooling
avik-pal Apr 11, 2019
47560f1
Add some splatting to fix tests
staticfloat Apr 13, 2019
a30c7a7
Expose usage of NNPACK conv and maxpool operations
avik-pal Apr 13, 2019
018915c
Merge branch 'NNPACK' of github.com:avik-pal/NNlib.jl into NNPACK
avik-pal Apr 13, 2019
c4e8573
NNPACK convolution does not support stride
avik-pal Apr 13, 2019
94d1e00
Add basic heuristics
avik-pal Apr 30, 2019
e4232d7
Support builds for Mac and few other systems
avik-pal Apr 30, 2019
ee86fbb
Fix numerical errors in the tests
avik-pal Apr 30, 2019
2 changes: 0 additions & 2 deletions src/NNlib.jl
@@ -24,8 +24,6 @@ include("impl/depthwiseconv_im2col.jl")
# Direct implementations of pooling
include("impl/pooling_direct.jl")

to = TimerOutput()

if Sys.islinux()
    include("nnpack/NNPACK.jl")
else
5 changes: 4 additions & 1 deletion src/nnpack/NNPACK.jl
@@ -50,7 +50,10 @@ end
try
    global NNPACK_CPU_THREADS = parse(UInt64, ENV["NNPACK_CPU_THREADS"])
catch
    # Sys.CPU_THREADS would be a better default when tuning the benchmark suite on a
    # particular machine, but the runtime threadpools are capped at 4 threads, so any
    # larger value would be ignored anyway.
    global NNPACK_CPU_THREADS = UInt64(4)
end
allocate_threadpool()
end
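For reference, the fallback logic above can be exercised on its own. The sketch below is a hypothetical, self-contained mirror of the same pattern; the name `nnpack_thread_count` and the injectable `env` argument are illustrative and not part of the PR:

```julia
# Mirrors the env-var handling above: NNPACK_CPU_THREADS overrides the thread
# count; a missing or unparseable value falls back to 4, matching the cap on
# the runtime threadpools.
function nnpack_thread_count(env::AbstractDict = ENV)
    try
        return parse(UInt64, env["NNPACK_CPU_THREADS"])
    catch
        return UInt64(4)
    end
end
```

For example, `nnpack_thread_count(Dict("NNPACK_CPU_THREADS" => "8"))` yields `UInt64(8)`, while an empty environment (or a value like `"oops"`) yields `UInt64(4)`.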
24 changes: 24 additions & 0 deletions src/nnpack/interface.jl
@@ -19,9 +19,33 @@ for (front_name, backend) in (
end


function conv_nnpack(x::Array{T1, 4}, w::Array{T2, 4}, cdims::ConvDims; kwargs...) where {T1, T2}
    # Splat the spatial dims so `similar` receives separate dimension arguments.
    y = similar(x, output_size(cdims)..., channels_out(cdims), size(x, 4))
    return conv_nnpack!(y, x, w, cdims; kwargs...)
end


function ∇conv_data(dy::Array{T1, 4}, w::Array{T2, 4}, cdims::ConvDims; kwargs...) where {T1, T2}
    dx = similar(dy, input_size(cdims)..., channels_in(cdims), size(dy, 4))
    return ∇conv_data!(dx, dy, w, cdims; kwargs...)
end


function ∇conv_filter(x::Array{T1, 4}, dy::Array{T2, 4}, cdims::ConvDims; kwargs...) where {T1, T2}
    dw = similar(x, kernel_size(cdims)..., channels_in(cdims), channels_out(cdims))
    return ∇conv_filter!(dw, x, dy, cdims; kwargs...)
end


function maxpool_nnpack!(y::Array{T1, 4}, x::Array{T2, 4}, pdims::PoolDims;
                         kwargs...) where {T1, T2}
    @warn "Automatically converting input tensor of size $(size(x)) to Float32" maxlog=1
    # NNPACK only operates on Float32; convert back to the caller's element type.
    T1.(maxpool_nnpack!(Float32.(y), Float32.(x), pdims; kwargs...))
end


function maxpool_nnpack(x::Array{T, 4}, pdims::PoolDims; kwargs...) where {T}
    y = similar(x, output_size(pdims)..., channels_out(pdims), size(x, 4))
    return maxpool_nnpack!(y, x, pdims; kwargs...)
end
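The allocations in this file lean on the `ConvDims`/`PoolDims` accessors (`output_size`, `channels_out`, and so on) for shape bookkeeping. As a rough sketch of the arithmetic behind them, assuming standard convolution output-size formulas (the helper names below are hypothetical, not NNlib API):

```julia
# Per spatial dimension: out = (in + 2*pad - dilation*(kernel-1) - 1) ÷ stride + 1.
conv_out_dim(in::Int, k::Int; pad::Int = 0, stride::Int = 1, dilation::Int = 1) =
    (in + 2pad - dilation * (k - 1) - 1) ÷ stride + 1

# A WHCN output tensor would then be allocated with shape
# (out_w, out_h, channels_out, batch):
out_shape(x_size::NTuple{4,Int}, k::NTuple{2,Int}, c_out::Int) =
    (conv_out_dim(x_size[1], k[1]), conv_out_dim(x_size[2], k[2]), c_out, x_size[4])
```

For instance, a 32×32×3 input with batch size 16 and a 3×3 kernel producing 8 output channels gives `out_shape((32, 32, 3, 16), (3, 3), 8) == (30, 30, 8, 16)` — the shape `conv_nnpack` would pass to `similar`.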
10 changes: 10 additions & 0 deletions src/nnpack/performance.jl
@@ -3,5 +3,15 @@ function select_threadpool(cdims::DenseConvDims, batch_size::Int)
end

function select_threadpool(pdims::PoolDims, batch_size::Int)
    inp_size = input_size(pdims)[1]
    # Every branch currently maps to the 4-thread pool; the branches are kept
    # separate so each threshold can be tuned independently later.
    if batch_size >= 32
        return shared_threadpool_dict[4][]
    elseif batch_size >= 16 && inp_size >= 64
        return shared_threadpool_dict[4][]
    elseif inp_size >= 128
        return shared_threadpool_dict[4][]
    elseif inp_size * batch_size >= 256
        return shared_threadpool_dict[4][]
    end
    # Small workloads run single-threaded: C_NULL tells NNPACK not to use a pool.
    return C_NULL
end
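The decision logic above can be summarized as a pure function of the two inputs. This is a minimal sketch for illustration, assuming the thresholds in the code above; it returns the pool size instead of a threadpool handle, and the name `pool_threads` is not part of the PR:

```julia
# Pure mirror of the pooling heuristic: 4-thread pool for workloads that look
# large enough (big batches, large spatial extents, or a large product of the
# two), `nothing` (the C_NULL / single-threaded case) otherwise.
function pool_threads(inp_size::Int, batch_size::Int)
    if batch_size >= 32 ||
       (batch_size >= 16 && inp_size >= 64) ||
       inp_size >= 128 ||
       inp_size * batch_size >= 256
        return 4
    end
    return nothing
end
```

So `pool_threads(64, 16)` picks the 4-thread pool, while a tiny workload such as `pool_threads(8, 2)` stays single-threaded.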