-
-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add gather #280
add gather #280
Conversation
6e461f9
to
b4ad236
Compare
@chengchingwen You may want to check it. |
1febf7b
to
b27a8dd
Compare
this should be ready to go |
Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2e08ee2
to
e14d0ac
Compare
I'm going to merge this as soon as tests pass so that @yuehhua can continue his work (adding AD rules, adding cuda support). |
I'd expect to do a review of the implementation before merging these operations into NNlib. |
I'll add fake comments in the new pr so that the implementation can be easily reviewed |
As I see it, the implementation is not following the TF semantics, is that right? I'd rather have reviews before merging. |
There seems to be a lot of allocating going on, we could alternatively let Julia figure out the indexing for us; function gather(arr, inds)
s = size(arr, 1)
g1 = similar(arr, 2s, size(inds, 1))
g2 = similar(g1)
for k ∈ 1:2, j ∈ axes(inds,1) # matrix case
ii = inds[j, k]
o1 = (k-1)*s
o2 = (2-k)*s
for i ∈ axes(arr,1)
g = arr[i,ii]
g1[i + o1, j] = g
g2[i + o2, j] = g
end
end
g1, g2
end So this version would not allocate quite as much, and we can use column copies Also leads to a sizeable perf difference for small cases (and maybe for large ones). This is only generating the first half, but we can make it do the other as well of course. julia> src = [3, 4, 5, 6, 7];
julia> index = [1 2 3 4;
4 2 1 3;
3 5 5 3][:,:,1:1]
julia> @btime gather($(src), $index);
2.905 μs (37 allocations: 1.11 KiB)
# this one
julia> @btime gather($(src'), $index);
131.193 ns (2 allocations: 256 bytes) |
I think it does follow TF semantics
yeah, ok, but as usual... please be quicker and avoid blocking other people work all over the place |
I don't understand what your implementation is doing (and why it is returning 2 things). julia> function gather2(arr, inds)
s = size(arr, 1)
g1 = similar(arr, 2s, size(inds, 1))
g2 = similar(g1)
for k ∈ 1:2, j ∈ axes(inds,1)
ii = inds[j, k]
o1 = (k-1)*s
o2 = (2-k)*s
for i ∈ axes(arr,1)
g = arr[i,ii]
g1[i + o1, j] = g
g2[i + o2, j] = g
end
end
g1, g2
end
julia> src = [3, 4, 5, 6, 7];
julia> index = [1 2 3 4;
4 2 1 3;
3 5 5 3][:,:,1:1];
julia> @btime gather($(src), $index);
60.446 ns (1 allocation: 176 bytes)
julia> @btime gather2($(src), $index);
ERROR: BoundsError: attempt to access 5-element Vector{Int64} at index [1, 4]
Stacktrace:
[1] getindex
@ ./array.jl:802 [inlined]
[2] gather2(arr::Vector{Int64}, inds::Array{Int64, 3})
@ Main ./REPL[3]:10
[3] var"##core#313"(src#310::Vector{Int64}, index#311::Array{Int64, 3})
@ Main ~/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:371 I see just 1 allocation for the implementation here, are you benchmarking the right thing? |
I mentioned in the snippet its for the matrix case, but can easily be extended. Yes - benchmarking the correct thing. Its returning two things because that's how I had needed it for a different operation - sorry should have removed it. You'd notice you need to transpose |
so I used the transpose, and on previous example get this julia> gather(src,index)
3×4×1 Array{Int64, 3}:
[:, :, 1] =
3 4 5 6
6 4 3 5
5 7 7 5
julia> gather2(src',index) #not sure how to interpret the output here
([3 6 5; 4 4 7], [4 4 7; 3 6 5])
julia> @btime gather($(src), $index);
60.218 ns (1 allocation: 176 bytes)
julia> @btime gather2($(src'), $index);
68.138 ns (2 allocations: 256 bytes)
`` |
For the general case - Julia indexing (it might be better for longer term use to benefit from Base improvements if the perf isn't off which I don't think it is) function gatherweave2(arr, inds) where {T,N}
g = arr[:, inds]
v = reshape(g, length(g),1) # vec(g)
reshape(v, size(inds)...)
end Huh, interesting, I definitely can't reproduce the 60 ns number - what Julia version are you on? More generally - I think this approach is fine. Be good to remove the |
yeah it's julia 1.6, probably they optimized the views.
it's used in a few places, would be inconvenient to type the union each time
I don't know why you have those reshape there, but I can try to use indexing in |
On the usual small example indexing and current gather perform the same julia> function gather3(src::AbstractArray{Tsrc, Nsrc},
idx::AbstractArray{<:Integer}) where {Tsrc, Nsrc}
colons = ntuple(i -> Colon(), Nsrc-1)
return src[colons..., idx]
end
julia> @btime gather3($(src), $index);
54.328 ns (1 allocation: 176 bytes)
julia> @btime gather($(src), $index);
54.847 ns (1 allocation: 176 bytes) I did a small tweak to |
We could potentially allow Cartesian indices in |
we could. It would mean to clutter the code a bit though since you can't splat Looking at the tensorflow implementation, |
bump |
@DhairyaLGandhi The suggestion is great! For performance, we can transform |
From an api pov, that's correct. But if we generalise it to use Cartesian, we can accept both and have a common implementation internally which would be cleaner too. |
Oh! Sure!😊 We can do that! |
easiest part of #255
@yuehhua