map, collect and similar for custom arrays #36106

Open
@nalimilan

The current design of map, collect and similar seems problematic for custom arrays which are associated with a custom element type, like CategoricalArray and its CategoricalValue type.

In CategoricalArrays.jl, I would like map(f, ::CategoricalArray) to return a CategoricalArray if f returns only Union{CategoricalValue, Missing} values. This is natural in particular so that map(identity, ::CategoricalArray) returns a CategoricalArray.

To achieve this, I could define Base.map(f, ::CategoricalArray), but that would require duplicating tricky code from Base, since it needs to handle eltype widening and so on. So I tried instead to define similar so that collect, which map uses under the hood, returns a CategoricalArray when appropriate. But collect uses similar(1:1, T, axes(itr)), so I have to override similar(::AbstractRange, ::Type{<:Union{CategoricalValue, Missing}}). For consistency I also have to define similar methods for AbstractArray, Array, Vector and Matrix (the last three to resolve method ambiguities).
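To make the call pattern above concrete, here is a minimal sketch using only Base and no CategoricalArrays types. It shows what similar returns by default when a range is passed as the template, and the kind of eltype widening that a hand-written map method would have to reproduce. The variable names are illustrative only.

```julia
itr = [10, 20, 30]

# Same shape of call as quoted above: a range as the "template" array,
# plus the requested element type and the axes of the iterator.
dest = similar(1:1, Float64, axes(itr))
println(typeof(dest), " with size ", size(dest))

# Eltype widening: the function returns `missing` for the first element and
# Int for the others, so the result must hold both kinds of values.
widened = map(x -> x > 10 ? x : missing, itr)
println(eltype(widened))
```

With no custom methods defined, the range template simply yields a plain Vector, which is why a package has to override similar for ranges to change what collect allocates.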

Doing that has two consequences:

  • The first is that collect(::CategoricalArray) always returns a CategoricalArray. This makes sense actually since Array{CategoricalValue} is an inefficient type. But that seems to go against the docstring for collect which says that it returns an Array.
  • The second, more serious issue is that getindex(::Array{<:CategoricalValue}, ::Array) also returns a CategoricalArray. This doesn't sound correct.
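As background for the first point, the following sketch shows how collect and the Array constructor behave on two array-like inputs from Base. It only illustrates the default behavior the docstring describes; it does not involve any custom similar methods.

```julia
r = 1:3                       # a lazy range, not an Array
v = view([10, 20, 30], 1:2)   # a SubArray wrapping an Array

from_collect = (collect(r), collect(v))
from_array   = (Array(r), Array(v))

# For these AbstractArray inputs both spellings give a plain Array.
println(map(typeof, from_collect))
println(map(typeof, from_array))
```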

This leads me to raise two questions/proposals:

  1. Should the collect docstring be made less strict? It sounds useful to be able to collect the contents of an iterator into the most natural/efficient array type. If one really wants an Array, it is better to call Array(itr) -- otherwise collect is redundant.
  2. Should collect use a new system like similar(AbstractArray, T, axes(itr)) instead of similar(1:1, T, axes(itr))? That would make it possible to request the most appropriate AbstractArray{<:T} rather than an Array. It would differ from getindex, which really wants the type of the input array. This could be introduced without breakage by having it fall back to similar(Array, ...).
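The second proposal can be sketched outside of Base as follows. Everything here is hypothetical: Level and LevelVector are toy stand-ins for CategoricalValue and CategoricalArray, and natural_similar and natural_collect are invented names for the proposed hook and a collect-like caller. The sketch does not use similar(AbstractArray, ...) itself, to avoid adding methods to Base; it only illustrates the dispatch idea.

```julia
# Toy stand-ins for CategoricalValue / CategoricalArray (hypothetical types).
struct Level
    code::Int
end

struct LevelVector <: AbstractVector{Level}
    codes::Vector{Int}
end
Base.size(v::LevelVector) = size(v.codes)
Base.IndexStyle(::Type{LevelVector}) = IndexLinear()
Base.getindex(v::LevelVector, i::Int) = Level(v.codes[i])
Base.setindex!(v::LevelVector, x::Level, i::Int) = (v.codes[i] = x.code; v)

# Hypothetical hook, NOT part of Base: "the most natural array for eltype T".
# The fallback is a plain Array, so existing element types are unaffected.
natural_similar(::Type{T}, axs::Tuple) where {T} = similar(Array{T}, axs)

# The custom element type opts in to its dedicated container.
natural_similar(::Type{Level}, axs::Tuple{Base.OneTo{Int}}) =
    LevelVector(Vector{Int}(undef, length(axs[1])))

# A collect-like function (also hypothetical) that asks the hook instead of
# calling similar(1:1, T, axes(itr)).
natural_collect(A::AbstractArray) =
    copyto!(natural_similar(eltype(A), axes(A)), A)

ints   = natural_collect(1:3)                   # falls back to a Vector{Int}
levels = natural_collect(LevelVector([2, 1]))   # uses the dedicated container
plain  = Level[Level(1), Level(2)][[2, 1]]      # getindex: Base.similar untouched
```

Because the hook is separate from the similar methods that getindex relies on, indexing a plain Vector{Level} still returns a Vector{Level}, while the collect-like path is free to pick the dedicated container.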

Labels: arrays, design, speculative
