Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StructArrays.append!! #97

Merged
merged 5 commits into from
Dec 19, 2019
Merged

Add StructArrays.append!! #97

merged 5 commits into from
Dec 19, 2019

Conversation

tkf
Copy link
Member

@tkf tkf commented Dec 15, 2019

I'm trying to create a uniform interface (e.g., BangBang.append!!) for mutate-or-widen style manipulations of various container types. This is very useful for implementing push-oriented data manipulation API like Transducers.copy.

As StructArrays already has grow_to_structarray!, it would be nice if I can use it as-is. Are you OK with providing it as a public API? This PR adds documentation in README to explicitly document it as a public API.

@codecov-io
Copy link

codecov-io commented Dec 15, 2019

Codecov Report

Merging #97 into master will increase coverage by 0.32%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #97      +/-   ##
==========================================
+ Coverage   91.03%   91.36%   +0.32%     
==========================================
  Files           9        9              
  Lines         368      382      +14     
==========================================
+ Hits          335      349      +14     
  Misses         33       33
Impacted Files Coverage Δ
src/collect.jl 96.05% <100%> (+0.89%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d399b1...a1cb6f5. Read the comment docs.

@piever
Copy link
Collaborator

piever commented Dec 15, 2019

Thanks for the PR! In principle I see nothing wrong in making this specific function public, it has been unchanged for a while. I wonder whether you also need collect_to_structarray!, which is its counterpart in the case where the length of the iterator is known, or maybe that doesn't make sense for transducers? I don't know the framework well enough.

Somehow I wonder whether it's possible to flip the dependency story and let StructArrays use the BangBang machinery to implement the constructor from an iterable (Tables.jl could probably also use that, see https://github.com/JuliaData/Tables.jl/blob/master/src/fallbacks.jl#L146). At the moment I suspect it's not feasible as BangBang is a bit too heavy to be a dependency of StructArrays, but maybe there is a refactor that makes this possible.

@tkf
Copy link
Member Author

tkf commented Dec 15, 2019

I wonder whether you also need collect_to_structarray!, which is its counterpart in the case where the length of the iterator is known

Thanks for pointing this out. If you are OK with adding collect_to_structarray!(dest, itr) which dispatches to appropriate methods depending on Base.IteratorSize(itr), that would be great. I think I can cook it up.

Somehow I wonder whether it's possible to flip the dependency story and let StructArrays use the BangBang machinery to implement the constructor from an iterable

This is a best case scenario for me :) I think it'd be nice since you can have a uniform API that can also be used with immutable types like StructArray-of-StaticArray.

At the moment I suspect it's not feasible as BangBang is a bit too heavy to be a dependency of StructArrays, but maybe there is a refactor that makes this possible.

Yes, BangBang tries to replicate some non-trivial portion of Base API so it's a bit hard to turn it into a small package. I don't think it's impossible, but, since it's still a young package, I want to consolidate its API and implementation first before splitting it up.

@piever
Copy link
Collaborator

piever commented Dec 17, 2019

I was a bit sloppy, what I meant is that there is a collection mechanism for the two cases, known and unknown size. The problem is that the two do something different, one uses append!!, and the other setindex!!. The dispatch happens already in collect_structarray which is the non-mutating part, as the initialization of the destination array depends on whether the iterator size is known (in which case the length of the array should match it) or unknown (and then it starts with length 0 and I push to it).

Maybe the easier would be to expose to BangBang the equivalent of setindex!!. The problem is that here there is the extra optimization that if the eltype do not match in setindex!!(dest, i, val), and I need to widen the destination array, I know that I should only copy the entries of dest before index i, as the later ones are still undef. Is there a function for this in BangBang? Otherwise, how do you implement collection for a transducer of known length?

@tkf
Copy link
Member Author

tkf commented Dec 18, 2019

I guess I confused you or I'm still confused (or both). To clarify what I meant by collect_to_structarray!(dest, itr), I implemented it:

"""
`collect_to_structarray!(dest, itr) -> dest′`
Try to append `itr` into a vector `dest`. Widen element type of
`dest` if it cannot hold the elements of `itr`. That is to say,
```julia
vcat(dest, StructVector(itr)) == collect_to_structarray!(dest, itr)
```
holds. Note that `dest′` may or may not be the same object as `dest`.
The state of `dest` is unpredictable after `collect_to_structarray!`
is called (e.g., it may contain just half of the elements from `itr`).
"""
collect_to_structarray!(dest::AbstractVector, itr) =
_collect_or_grow!(dest, itr, Base.IteratorSize(itr))
function _collect_or_grow!(dest::AbstractVector, itr, ::Union{Base.HasShape, Base.HasLength})
n = length(itr) # itr may be stateful so do this first
fr = iterate(itr)
fr === nothing && return dest
el, st = fr
i = lastindex(dest) + 1
if iscompatible(el, dest)
resize!(dest, length(dest) + n)
dest[i] = el
return _collect_to_structarray!(dest, itr, i + 1, st)
else
new = widenstructarray(dest, i, el)
resize!(new, length(dest) + n)
@inbounds new[i] = el
return _collect_to_structarray!(new, itr, i + 1, st)
end
end
_collect_or_grow!(dest::AbstractVector, itr, ::Base.SizeUnknown) =
grow_to_structarray!(dest, itr)

Are you OK with adding this as a public API so that I can wrap it as BangBang.append!!?

I should only copy the entries of dest before index i

Thanks a lot for pointing this out! I was focusing providing BangBang.f!! for each Base.f! so I missed this point.

Otherwise, how do you implement collection for a transducer of known length?

I haven't implemented index-based collect in Transducers.jl as it's a bit tricky to implement. This is one of the thing that is easy in iterators. Also, it is only easily doable with map-like function. In this case, I think Base.map is the best approach. At the moment, Transducers.jl only uses grow_to-like approach.

@piever
Copy link
Collaborator

piever commented Dec 18, 2019

Ah, now I get it! I used that iterators have length to preallocate the whole thing at once (and skip append!altogether), but of course you are right, also the append! function benefits from knowing the length, as it can just resize! once.

I only have some minor nitpicking in terms of names (I may have induced the confusion, being unclear in a previous post). Could collect_to_structarray! and grow_to_structarray! be left as they are, and the novel function (that dispatches correctly) be simply called append!! (with the same documentation as collect_to_structarray! has in this PR)? It would be unexported but documented (so that it can stay), and when we figure out the dependency story, it'd be the same function as BangBang.append!!. collect_to_structarray! and grow_to_structarray! are ported from Base.collect_to! and Base.grow_to! with the same signature, and used by IndexedTables, so I'd prefer to not touch them.

Happy to merge after the renaming.

src/collect.jl Outdated Show resolved Hide resolved
src/collect.jl Outdated Show resolved Hide resolved
src/collect.jl Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@tkf
Copy link
Member Author

tkf commented Dec 19, 2019

Thanks for the review. Yes, StructArrays.append!! would be a perfect API for me. For collect_to*! and grow_to*!, it makes sense to stick with Base's naming convention and semantics.

tkf added a commit to JuliaFolds/BangBang.jl that referenced this pull request Dec 19, 2019
@tkf tkf changed the title Request: Expose grow_to_structarray! as a public API Add StructArrays.append!! Dec 19, 2019
@piever piever merged commit 1c447d6 into JuliaArrays:master Dec 19, 2019
@tkf
Copy link
Member Author

tkf commented May 21, 2020

I know that I should only copy the entries of dest before index i, as the later ones are still undef. Is there a function for this in BangBang?

FYI, I just added BangBang.collector which lets you write collect-like functions only using push!! but with performance close to (indistinguishable to?) manual implementation (compare :impl => "man" and :impl => "coll" under :comp => "map" JuliaFolds/BangBang.jl#143 (comment)). It requires Julia > 1.4 for good performance, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants