fmassa
Contributor

fmassa commented Oct 11, 2015

This version of Normalize is almost 40x faster on CPU and doesn't use extra memory. It should
be faster on GPU as well, but I haven't benchmarked it.

  • Instead of creating a temporary matrix b1*b2 of size (n,d,d), the computations are rearranged so that this huge matrix, which was eating all the memory, is never created.
  • forward and backward now support varying dimensionality (which wasn't the case before because of the eye matrix).
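
To make the first bullet concrete, here is a minimal sketch of the idea. It is not the code from this PR. It is plain Python rather than Torch, handles a single row instead of an (n,d) batch, and assumes p = 2. The function names are hypothetical. The naive version builds the full d x d Jacobian before multiplying by the gradient, while the rearranged version computes the dot product first and never materializes that matrix.

```python
import math

def normalize_forward(x):
    """L2-normalize one row: y = x / ||x||_2. Returns (y, norm)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x], norm

def backward_naive(x, grad_output):
    """Build the full d x d Jacobian, then multiply: O(d^2) memory per row."""
    y, norm = normalize_forward(x)
    d = len(x)
    # J[i][j] = d y_i / d x_j = (delta_ij - y_i * y_j) / norm
    jac = [[((1.0 if i == j else 0.0) - y[i] * y[j]) / norm
            for j in range(d)] for i in range(d)]
    # grad_input_j = sum_i grad_output_i * J[i][j]
    return [sum(grad_output[i] * jac[i][j] for i in range(d))
            for j in range(d)]

def backward_rearranged(x, grad_output):
    """Same result using only O(d) memory: reduce y . grad_output first."""
    y, norm = normalize_forward(x)
    dot = sum(yi * gi for yi, gi in zip(y, grad_output))
    # (I - y y^T) g / norm  ==  (g - y * (y . g)) / norm
    return [(gi - yi * dot) / norm for yi, gi in zip(y, grad_output)]
```

Both functions return the same gradient. The difference is that the second one only ever holds vectors of length d, which is what removes the (n,d,d) temporary when the same trick is applied to a batch.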

Should be of interest to @ffmpbgrnn and @bamos.

@fmassa
Contributor Author

fmassa commented Oct 13, 2015

Just did a quick test on the GPU: backward seems to be much faster than the previous version as well.
For an input of dimensions (32,4096), this version is 100x faster than the old one (and the old version used 2 GB of temporary buffers).
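
As a sanity check on the 2 GB figure, the size of an (n,d,d) temporary can be worked out directly. This is only a back-of-the-envelope sketch: it assumes 4-byte (32-bit) floats on the GPU and a single such buffer, neither of which is stated in the thread.

```python
# Size of one (n, d, d) temporary for an input of shape (32, 4096).
n, d = 32, 4096
bytes_per_float = 4  # assumption: 32-bit floats

temp_bytes = n * d * d * bytes_per_float
temp_gib = temp_bytes / 2**30  # bytes -> GiB

print(temp_bytes, temp_gib)
```

Under those assumptions the buffer comes to 2^31 bytes, i.e. 2 GiB, which is consistent with the number quoted above.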

@bamos
Contributor

bamos commented Oct 13, 2015

Hi, I ran @ffmpbgrnn's test from #341 for some simple profiling of this PR on a Tesla K40 GPU and a 3.70 GHz CPU. The speedup is amazing!

require 'nn'
require 'cutorch'
require 'cunn'

local module = nn.Normalize(2):cuda()
module:fastMode(false)
local input = torch.rand(64, 2400):cuda()
local t = torch.Timer()
for i = 1, 100 do
    module:forward(input)
    module:backward(input, input)
    print(i)
end
print(t:time().real/100)

Current Master

  • Commit: b80bda2
  • md5sum of Normalize.lua: 4b1f217a796f00ff8cfe575da8e4a409
  • CPU mean execution time: 3.82 s
  • K40 mean execution time: 0.108 s

This PR

  • md5sum of Normalize.lua: a9c7d3f642c08de361369be201234fca
  • CPU mean execution time: 0.00634 s
  • K40 mean execution time: 0.000624 s
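
For reference, the speedups implied by the two sets of timings above can be computed as follows. This is just arithmetic on the reported numbers; the variable names are illustrative.

```python
# Mean time per forward+backward iteration, in seconds, as reported above.
cpu_master, cpu_pr = 3.82, 0.00634
gpu_master, gpu_pr = 0.108, 0.000624

cpu_speedup = cpu_master / cpu_pr
gpu_speedup = gpu_master / gpu_pr

print(round(cpu_speedup), round(gpu_speedup))
```

That works out to roughly 600x on the CPU and roughly 170x on the K40 for this (64, 2400) input.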

@soumith
Member

soumith commented Oct 13, 2015

This is super awesome. If the unit tests pass and it's exactly the same implementation as before, why not!

soumith added a commit that referenced this pull request Oct 13, 2015
Speedup and reduce memory usage in Normalize
@soumith soumith merged commit 0c6c2f4 into torch:master Oct 13, 2015
bamos added a commit to cmusatyalab/openface that referenced this pull request Oct 13, 2015
@fmassa fmassa deleted the normalize_opt branch October 13, 2015 20:37
SuperAI520 added a commit to SuperAI520/Open-Face-Recognition that referenced this pull request Dec 2, 2024