Create performance tips docs section #615

# Performance Tips

All the usual [Julia performance tips](https://docs.julialang.org/en/v1/manual/performance-tips/) apply.
As always, [profiling your code](https://docs.julialang.org/en/v1/manual/profile/#Profiling-1) is a useful way of finding bottlenecks.
Below follow some Flux-specific tips and reminders.

## Don't use more precision than you need

Flux works great with all kinds of number types.
But often you do not need to be working with, say, `Float64` (let alone `BigFloat`).
Switching to `Float32` can give you a significant speed-up:
the memory usage is halved, so allocations are cheaper and you simply use less memory,
and `Float32` operations also tend to be faster, since more values fit in a SIMD lane and GPUs strongly favour `Float32`.
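
As a minimal sketch (hypothetical array sizes), converting your input data once up front is usually enough:

```julia
x64 = rand(100, 1000)            # Float64 by default
x32 = Float32.(x64)              # element-wise conversion to Float32
sizeof(x32) == sizeof(x64) ÷ 2   # true: half the memory
```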

## Make sure your custom activation functions preserve the type of their inputs

Not only should your activation functions be [type-stable](https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions-1),
they should also preserve the type of their inputs.

A very artificial example, using an activation function like

```julia
my_tanh(x) = Float64(tanh(x))
```

will result in performance on `Float32` input orders of magnitude slower than the normal `tanh` would,
because it forces slow, mixed-type multiplications in the dense layers.

This means that if you change your data, say from `Float64` to `Float32` (which should give a speed-up: see above),
you will instead see a large slow-down.
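
To see why this hurts, here is a hedged sketch (hypothetical sizes, reusing the `my_tanh` definition above): a layer whose weights are `Float32` applied to promoted `Float64` activations falls off the fast, uniform-type BLAS path.

```julia
W = rand(Float32, 100, 100)            # layer weights stored as Float32
h_bad  = my_tanh.(rand(Float32, 100))  # Vector{Float64}: the input type was promoted
h_good = tanh.(rand(Float32, 100))     # Vector{Float32}: the input type is preserved

W * h_bad   # mixed Float32/Float64 multiply, falls back to the slow generic path
W * h_good  # all-Float32 multiply, hits the fast BLAS path
```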

This can occur sneakily, because type-promotion can also be triggered by interacting with numeric literals.
E.g. the following will run into the same problem as above:

```julia
leaky_tanh(x) = 0.01x + tanh(x)
```

While one could change the activation function (e.g. to use `0.01f0x`) to avoid this whenever the input type changes,
the idiomatic (and safe) way is to use `oftype`:

```julia
leaky_tanh(x) = oftype(x/1, 0.01)*x + tanh(x)
```
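
As a quick check (a small sketch reusing the definitions above), you can confirm which version preserves a `Float32` input:

```julia
typeof(my_tanh(1f0))      # Float64: the input type was promoted
typeof(leaky_tanh(1f0))   # Float32: the input type is preserved (with the oftype definition)
```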

## Evaluate batches as Matrices of features, rather than sequences of Vector features

While it can sometimes be tempting to process your observations (feature vectors) one at a time,
e.g.

```julia
function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})
    sum(zip(xs, ys)) do (x, y_target)
        y_pred = model(x)  # evaluate the model on a single observation
        return loss(y_pred, y_target)
    end
end
```

it is much faster to concatenate them into a matrix,
as this will hit BLAS matrix-matrix multiplication, which is much faster than the equivalent sequence of matrix-vector multiplications,
even though it means allocating new memory to store them contiguously.

```julia
x_batch = reduce(hcat, xs)
y_batch = reduce(hcat, ys)
...
function loss_total(x_batch::Matrix, y_batch::Matrix)
    y_preds = model(x_batch)
    sum(loss.(y_preds, y_batch))
end
```

When doing this kind of concatenation, use `reduce(hcat, xs)` rather than `hcat(xs...)`.
This avoids the splatting penalty and hits the optimised `reduce` method.
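
For illustration (hypothetical sizes), building a batch from a vector of feature vectors:

```julia
xs = [rand(Float32, 10) for _ in 1:1000]  # 1000 feature vectors of length 10

x_batch = reduce(hcat, xs)  # 10×1000 Matrix{Float32}, via the optimised reduce method
x_splat = hcat(xs...)       # same result, but splatting 1000 arguments is much slower to compile and run
```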