
Performance with views #40

Open
@mcabbott

I was hit by the following performance bug when using this package together with MLUtils:

julia> let 
       x, _ = Flux.splitobs(Flux.onehotbatch(rand(1:99, 100), 1:100); at=1.0, shuffle=false)
       @show summary(x)
       emb = Flux.Embedding(100 => 100)
       a = @btime $emb($x)  # very slow fallback matmul
       println("OneHotMatrix")
       b = @btime $emb(parent($x))  # indexing
       x32 = x .+ 0f0
       @show summary(x32)
       c = @btime $emb.weight * $x32  # BLAS
       end;
summary(x) = "100×100 view(OneHotMatrix(::Vector{UInt32}), :, 1:100) with eltype Bool"
  min 590.041 μs, mean 659.717 μs (7 allocations, 62.50 KiB)
OneHotMatrix
  min 2.953 μs, mean 7.642 μs (2 allocations, 39.11 KiB)
summary(x32) = "100×100 Matrix{Float32}"
  min 6.583 μs, mean 10.608 μs (2 allocations, 39.11 KiB)
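
The gap between the first two timings comes from the amount of work done: multiplying by a one-hot matrix only selects columns of the weight matrix, so a generic matmul spends almost all of its multiply-adds on zeros. A minimal sketch in plain Julia checking that equivalence; onehot_dense is a hypothetical helper that builds a dense Bool matrix, not a Flux or OneHotArrays function.

```julia
using LinearAlgebra

# Hypothetical helper: dense Bool one-hot encoding of `labels` over classes 1:n
onehot_dense(labels::AbstractVector{Int}, n::Int) =
    [labels[j] == i for i in 1:n, j in eachindex(labels)]

d, n = 3, 6                   # embedding dimension, number of classes
W = rand(Float32, d, n)       # embedding weights, one column per class
labels = [2, 6, 1, 2]
X = onehot_dense(labels, n)   # n×4 Bool matrix with exactly one `true` per column

# Generic matmul: d * n * length(labels) multiply-adds, most of them against `false`
by_matmul = W * X
# Indexing: copy the d × length(labels) columns picked out by the labels
by_indexing = W[:, labels]
```

For the 100×100 case above, that is 10^6 multiply-adds against copying 10^4 numbers, which is consistent with the large gap in the timings, though the exact ratio also depends on how the fallback is implemented.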

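One way around this would be to include such views in OneHotLike, so that they reach the fast indexing path. Another would be to simply turn views into copies; that is effectively what happens if you reverse the order and split the label vector before one-hot encoding: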

julia> let
       tmp, _ = Flux.splitobs(rand(1:99, 100); at=1.0, shuffle=false)
       x = Flux.onehotbatch(tmp, 1:100)
       @show summary(x)
       emb = Flux.Embedding(100 => 100)
       @btime $emb($x)
       end;
summary(x) = "100×100 OneHotMatrix(::Vector{UInt32}) with eltype Bool"
  min 2.970 μs, mean 7.479 μs (2 allocations, 39.11 KiB)
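
The first suggestion, letting OneHotLike cover views, amounts to adding a method for column views that forwards to the sliced indices of the parent. A self-contained sketch of that idea, where ToyOneHot and embed are hypothetical stand-ins for OneHotMatrix and the Embedding layer; it does not use the real OneHotArrays internals or the actual definition of OneHotLike.

```julia
using LinearAlgebra

# Hypothetical stand-in for OneHotMatrix: column j has its single `true` at row indices[j]
struct ToyOneHot <: AbstractMatrix{Bool}
    indices::Vector{Int}
    nlabels::Int
end
Base.size(x::ToyOneHot) = (x.nlabels, length(x.indices))
Base.getindex(x::ToyOneHot, i::Int, j::Int) = x.indices[j] == i

# Column views such as view(x, :, 1:k), the kind of object the split returns
const ToyOneHotView = SubArray{Bool, 2, ToyOneHot, <:Tuple{Base.Slice, AbstractVector{Int}}}

# Hypothetical embedding lookup, dispatching on the input type
embed(W::AbstractMatrix, x::AbstractMatrix) = W * x        # generic fallback matmul
embed(W::AbstractMatrix, x::ToyOneHot) = W[:, x.indices]   # fast path: select columns
# The extension: a column view is still one-hot, with hot rows parent.indices[cols]
embed(W::AbstractMatrix, x::ToyOneHotView) =
    W[:, parent(x).indices[parentindices(x)[2]]]

x = ToyOneHot([3, 1, 4, 1, 5], 5)
W = rand(Float32, 2, 5)
v = view(x, :, 2:4)   # a SubArray; without the last method it would hit the fallback
y = embed(W, v)       # same result as W[:, [1, 4, 1]]
```

The view method only handles views that keep every row and select columns, which covers what a split along the last dimension produces; other kinds of views still fall back to the generic matmul.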

More immediately, MLUtils.splitobs could simply do what its docstring says and call getobs (which copies) instead of obsview (which returns a view):

help?> Flux.splitobs
  splitobs(data; at, shuffle=false) -> Tuple

  Split the data into multiple subsets proportional to the value(s) of at.

  If shuffle=true, randomly permute the observations before splitting.

  Supports any datatype implementing the numobs and getobs interfaces.
[...]

julia> Flux.getobs(ones(1,5), 1:2)  # what it says it does
1×2 Matrix{Float64}:
 1.0  1.0

julia> Flux.obsview(ones(1,5), 1:2)  # what it actually uses
1×2 view(::Matrix{Float64}, :, 1:2) with eltype Float64:
 1.0  1.0
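
The practical difference between the two calls can be sketched without MLUtils: a split that indexes (as getobs does for arrays) hands back whatever type the container's getindex returns, while a split that takes views always hands back a SubArray. ToyOneHot, split_view and split_copy are hypothetical names, and the assumption that selecting columns of a one-hot matrix returns another one-hot matrix is modeled here by an explicit getindex method rather than taken from OneHotArrays.

```julia
# Hypothetical one-hot stand-in (not the OneHotArrays type)
struct ToyOneHot <: AbstractMatrix{Bool}
    indices::Vector{Int}
    nlabels::Int
end
Base.size(x::ToyOneHot) = (x.nlabels, length(x.indices))
Base.getindex(x::ToyOneHot, i::Int, j::Int) = x.indices[j] == i
# Assumed behaviour: selecting whole columns returns another ToyOneHot
Base.getindex(x::ToyOneHot, ::Colon, cols::AbstractVector{Int}) =
    ToyOneHot(x.indices[cols], x.nlabels)

# Hypothetical two-way splits along the last dimension at fraction `at`
function split_view(x::AbstractMatrix, at::Real)   # obsview-like: lazy SubArrays
    k = floor(Int, at * size(x, 2))
    return view(x, :, 1:k), view(x, :, k+1:size(x, 2))
end
function split_copy(x::AbstractMatrix, at::Real)   # getobs-like: indexing copies
    k = floor(Int, at * size(x, 2))
    return x[:, 1:k], x[:, k+1:size(x, 2)]
end

x = ToyOneHot([3, 1, 4, 1, 5], 5)
a, _ = split_view(x, 0.6)   # a SubArray wrapping the ToyOneHot
b, _ = split_copy(x, 0.6)   # a ToyOneHot with indices [3, 1, 4]
```

Both halves hold the same values, but only the copied one keeps a type that specialized methods can dispatch on.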
