Description
The current design of `map`, `collect` and `similar` seems problematic for custom arrays which are associated with a custom element type, like `CategoricalArray` and its `CategoricalValue` type.
In CategoricalArrays.jl, I would like `map(f, ::CategoricalArray)` to return a `CategoricalArray` if `f` returns only `Union{CategoricalValue, Missing}` values. This is natural in particular so that `map(identity, ::CategoricalArray)` returns a `CategoricalArray`.
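To make the starting point concrete, here is a minimal sketch of the default behavior with a toy array type. `Tagged` and `TaggedVector` are hypothetical stand-ins for `CategoricalValue` and `CategoricalArray`, not the actual CategoricalArrays.jl types. With no `similar` or `map` methods defined, both `map` and `collect` fall back to a plain `Vector`:

```julia
# Hypothetical stand-ins for CategoricalValue / CategoricalArray.
struct Tagged
    code::Int
end

struct TaggedVector <: AbstractVector{Tagged}
    codes::Vector{Int}
end

# Minimal read-only AbstractArray interface.
Base.size(v::TaggedVector) = size(v.codes)
Base.getindex(v::TaggedVector, i::Int) = Tagged(v.codes[i])
Base.IndexStyle(::Type{TaggedVector}) = IndexLinear()

tv = TaggedVector([1, 2, 3])

# Without any custom `similar` method, the custom array type is lost.
map(identity, tv) isa Vector{Tagged}   # true
collect(tv) isa Vector{Tagged}         # true
```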
To achieve this, I could define `Base.map(f, ::CategoricalArray)`, but that would require duplicating tricky code from Base since it needs to handle eltype widening and so on. So I tried to define `similar` so that `collect`, which `map` uses under the hood, returns a `CategoricalArray` when appropriate. But `collect` uses `similar(1:1, T, axes(itr))`, so I have to override `similar(::AbstractRange, ::Type{<:Union{CategoricalValue, Missing}})`. For consistency I also have to define similar methods for `AbstractArray`, `Array`, `Vector` and `Matrix` (due to ambiguities).
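The kind of override described above can be sketched as follows, again with the hypothetical toy types `Tagged` and `TaggedVector` rather than the real CategoricalArrays.jl ones, and restricted to the one-dimensional case. The sketch only calls `similar` directly; whether `collect` and `map` actually go through this method depends on Base internals for the Julia version in use.

```julia
# Hypothetical stand-ins for CategoricalValue / CategoricalArray.
struct Tagged
    code::Int
end

struct TaggedVector <: AbstractVector{Tagged}
    codes::Vector{Int}
end

Base.size(v::TaggedVector) = size(v.codes)
Base.getindex(v::TaggedVector, i::Int) = Tagged(v.codes[i])
Base.setindex!(v::TaggedVector, x::Tagged, i::Int) = (v.codes[i] = x.code; v)
Base.IndexStyle(::Type{TaggedVector}) = IndexLinear()

# Dispatch on the element type only: the range argument carries no information.
# Restricted to 1-d in this sketch.
Base.similar(::AbstractRange, ::Type{Tagged}, dims::Dims{1}) =
    TaggedVector(Vector{Int}(undef, dims[1]))

# Other element types still get a plain Array.
similar(1:1, Int, axes(1:3)) isa Vector{Int}          # true
# The custom element type now selects the custom container.
similar(1:1, Tagged, axes(1:3)) isa TaggedVector      # true
```

Because the method dispatches on the element type alone, every caller that asks for `similar(::AbstractRange, Tagged, ...)` gets a `TaggedVector`, whether or not that is what the caller intended.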
Doing that has two consequences:
- The first is that `collect(::CategoricalArray)` always returns a `CategoricalArray`. This makes sense actually since `Array{CategoricalValue}` is an inefficient type. But that seems to go against the docstring for `collect` which says that it returns an `Array`.
- The second, more serious issue is that `getindex(::Array{<:CategoricalValue}, ::Array)` also returns a `CategoricalArray`. This doesn't sound correct.
This leads me to raise two questions/proposals:
- Should the `collect` docstring be made less strict? It sounds useful to be able to collect the contents of an iterator into the most natural/efficient array type. If one really wants an `Array`, better do `Array(itr)` -- otherwise `collect` is redundant.
- Should `collect` use a new system like `similar(AbstractArray, T, axes(itr))` instead of `similar(1:1, T, axes(itr))`? That would allow specifying that the most appropriate `AbstractArray{<:T}` is requested rather than `Array`. That would differ from `getindex` which really wants the type of the input array. This could be introduced without breakage by having it fall back to `similar(Array, ...)`.
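The second proposal could look roughly like the sketch below. It does not touch Base: `natural_similar` and `natural_collect` are hypothetical names standing in for `similar(AbstractArray, T, axes(itr))` and `collect`, and `Tagged`/`TaggedVector` are toy stand-ins for the CategoricalArrays.jl types. The simplified collect takes the element type from the first element, does no eltype widening, and fails on empty iterators.

```julia
# Hypothetical stand-ins for CategoricalValue / CategoricalArray.
struct Tagged
    code::Int
end

struct TaggedVector <: AbstractVector{Tagged}
    codes::Vector{Int}
end

Base.size(v::TaggedVector) = size(v.codes)
Base.getindex(v::TaggedVector, i::Int) = Tagged(v.codes[i])
Base.setindex!(v::TaggedVector, x::Tagged, i::Int) = (v.codes[i] = x.code; v)
Base.IndexStyle(::Type{TaggedVector}) = IndexLinear()

# Hypothetical hook: "the most appropriate container for element type T".
# Default: fall back to a plain Array, so nothing changes for existing types.
natural_similar(::Type{T}, axs::Tuple) where {T} = similar(Array{T}, axs)

# Opt-in for the custom element type (1-d only in this sketch).
natural_similar(::Type{Tagged}, axs::Tuple{Base.OneTo{Int}}) =
    TaggedVector(Vector{Int}(undef, length(axs[1])))

# Simplified collect built on the hook (no widening, non-empty input only).
function natural_collect(itr)
    dest = natural_similar(typeof(first(itr)), axes(itr))
    for (i, x) in zip(eachindex(dest), itr)
        dest[i] = x
    end
    return dest
end

natural_collect(Tagged(i) for i in 1:3) isa TaggedVector   # true
natural_collect(x^2 for x in 1:3) isa Vector{Int}          # true
```

Since the hook is separate from the `similar(::Array, ...)` methods that `getindex` relies on, indexing an `Array{Tagged}` would keep returning an `Array` under this design.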