Skip to content

make Base.setindex() public #55129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

aplavin
Copy link
Contributor

@aplavin aplavin commented Jul 15, 2024

setindex() is a widely used well-documented function, it should be marked public, right?

Not sure what's the right way to make a function with multiple methods public, please correct me if I'm wrong here!

@giordano
Copy link
Contributor

I'm missing the point of this PR, as far as I can tell this is already public:

% julia +nightly -q
julia> Base.ispublic(Base, :setindex!)
true

@aplavin
Copy link
Contributor Author

aplavin commented Jul 15, 2024

Note the difference:

julia> Base.ispublic(Base, :setindex!)
true

julia> Base.ispublic(Base, :setindex)
false

@nsajko nsajko added collections Data structures holding multiple items, e.g. sets feature Indicates new feature / enhancement requests needs docs Documentation for this change is required labels Jul 17, 2024
@aplavin
Copy link
Contributor Author

aplavin commented Jul 21, 2024

Not sure what's the status here. Should I do some update of this PR or ...?

@DilumAluthge DilumAluthge requested a review from ararslan July 21, 2024 20:30
@nsajko
Copy link
Contributor

nsajko commented Jul 21, 2024

If setindex is to be made public, it should be documented, i.e., its doc string should be included into the docs. Thus the "needs docs" label.

@aplavin
Copy link
Contributor Author

aplavin commented Jul 22, 2024

I'm not familiar with documentation generation, could use some pointers on where to update things :)
Of course, assuming that this change is approved.

@nsajko

This comment was marked as outdated.

@nsajko
Copy link
Contributor

nsajko commented Jul 22, 2024

If the function is to be made public, we'd need to decide if it's just for tuples, i.e, will adding methods for other collection types be allowed. Perhaps we could instead define something like setindex_tuple(t::Tuple, v, i) = setindex(t, v, i) and then make only setindex_tuple public.

Adding new methods would negatively affect abstract return type inference in the extreme case where the type of the input collection is unknown (inferred as Any), quite possibly requiring the ecosystem to add some type asserts for performance.

@ararslan ararslan removed their request for review July 23, 2024 01:44
@ararslan
Copy link
Member

(Removing my review request as @nsajko has got this handled 😉)

@aplavin
Copy link
Contributor Author

aplavin commented Jul 24, 2024

Not sure what you exactly the issue with extending setindex and what exactly you suggest, @nsajko. Currently, setindex is defined (& documented) in Base for tuples and namedtuples. It's also used by lots of packages, with many more methods defined for setindex – eg, StaticArrays do that.

As the semantics of the function is pretty clear, and it's already extensively used in the wild (including new method definitions), what specific issues do you see with marking it public?


Perhaps we could instead define something like setindex_tuple(t::Tuple, v, i) = setindex(t, v, i) and then make only setindex_tuple public.

Tbh, I don't see a reason to do that. Defining separate functions for every type, instead of methods of the same functions, goes directly against Julia multiple dispatch design and practice.

@nsajko
Copy link
Contributor

nsajko commented Jul 24, 2024

I'm referring to the "world splitting" optimization, see the Base.Experimental.@max_methods doc string. I agree that Base.setindex should probably be available for all collection types. But, it's good to consider the possible downsides upfront, seeing as, in a fresh REPL session, Base.setindex has just three methods - below the max_methods default.

EDIT: I only mentioned this issue for completeness. IMO it's secondary to proper documentation.

@@ -151,6 +151,7 @@ Base.modifyproperty!
Base.setpropertyonce!
Base.propertynames
Base.hasproperty
Base.setindex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask, Is there a logic to this specific location, between hasproperty and getfield? Mostly just curious, I don't really have a better suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, perhaps the doc string would fit better into the Collections and Data Structures page? Under Indexable Collections I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties/fields and indices are kinda close conceptually, and they are exactly the same for tuples and namedtuples (the types setindex is defined for). So it seems to make sense to put it here.
But any location works for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it can't go under "Indexable collections" as-is, because of the "fully implemented by" list in that section. The list includes types which don't implement setindex...

@nsajko
Copy link
Contributor

nsajko commented Jul 24, 2024

FWIW, given that setindex is indeed already used in many packages for some reason, and its interface seems OK, I now agree that it should probably be made public.

@aplavin
Copy link
Contributor Author

aplavin commented Jul 24, 2024

I'm referring to the "world splitting" optimization, <...>

Still not sure how these considerations are relevant to marking setindex as public.
I don't propose adding any more methods in Base here, although, I would like that as a completely separate independent PR (#43155, #54508, #43338, #33495). Marking it as public should just reflect the practical status quo: this function is widely used & overloaded.

Btw, a lot of high-performance code uses StaticArrays, and with them loaded one has at least 6 setindex methods.

@aplavin
Copy link
Contributor Author

aplavin commented Jul 30, 2024

bump...

@aplavin
Copy link
Contributor Author

aplavin commented Aug 12, 2024

bump :)

@aplavin
Copy link
Contributor Author

aplavin commented Aug 21, 2024

bump... anything else I should do here?

@aplavin
Copy link
Contributor Author

aplavin commented Sep 2, 2024

and another bump

@aplavin
Copy link
Contributor Author

aplavin commented Sep 19, 2024

bump, anyone?..

@aplavin
Copy link
Contributor Author

aplavin commented Oct 3, 2024

should be a tiny change, any issues with it?..

@aplavin aplavin requested a review from ararslan October 4, 2024 11:41
@aplavin aplavin requested a review from nsajko October 4, 2024 11:41
@ararslan ararslan added the triage This should be discussed on a triage call label Oct 4, 2024
@oscardssmith
Copy link
Member

oscardssmith commented Nov 7, 2024

triage approves.
@aplavin add a news entry and we can merge.

@oscardssmith oscardssmith added needs news A NEWS entry is required for this change and removed triage This should be discussed on a triage call needs docs Documentation for this change is required labels Nov 7, 2024
@LilithHafner LilithHafner added needs news A NEWS entry is required for this change and removed needs docs Documentation for this change is required labels Nov 7, 2024
@CameronBieganek
Copy link
Contributor

CameronBieganek commented Nov 12, 2024

As the semantics of the function is pretty clear

Actually the semantics are not at all clear. The behavior of Base.setindex is very different for tuples and named tuples. For tuples, an out-of-bounds index throws a BoundsError, whereas for named tuples an out-of-bounds index extends the named tuple.

julia> Base.setindex((1, ), 10, 2)
ERROR: BoundsError: attempt to access Tuple{Int64} at index [2]
Stacktrace:
 [1] setindex(x::Tuple{Int64}, v::Int64, i::Int64)
   @ Base ./tuple.jl:57
 [2] top-level scope
   @ REPL[34]:1

julia> Base.setindex((a=1, ), 10, :b)
(a = 1, b = 10)

Ideally a generic function should have one docstring (per arity). In other words, there should only be one meaning for a generic function. Consider the Base.setindex! docstring, which makes no mention of the input types:

"""
    setindex!(collection, value, key...)

Store the given value at the given key or index within a collection. The syntax `a[i,j,...] =
x` is converted by the compiler to `(setindex!(a, x, i, j, ...); x)`.

...
"""

(I know there is a second docstring for setindex!, but let's ignore that for the purposes of this discussion.) Note how the docstring provides a single generic definition that can work for many different collection types.

I don't think Base.setindex in its current form should be made public. I see two possible paths forward:

  1. Change the behavior of Base.setindex(nt::NamedTuple, v, i) so that it errors on an out-of-bounds index. Then the docstring for Base.setindex can be the following:

    """
        setindex(x, v, i)
    
    Create a shallow copy of collection `x` with the value at index `i` set
    to `v`. Throw a `BoundsError` when `i` is out-of-bounds.
    """
    function setindex end
    
  2. Make two public functions (names can be bikeshed), something like the following:

    """
        setindex(x, v, i)
    
    Create a shallow copy of collection `x` with the value at index `i` set
    to `v`. Throw a `BoundsError` when `i` is out-of-bounds.
    """
    function setindex end
    
    """
        setkey(x, v, k)
    
    Create a shallow copy of collection `x` with a new key `k` set
    to `v`. If `k` is already in the keys of `x`, `v` replaces
    the old value.
    """
    function setkey end

If we decide to add setindex and setkey to Base, then we might as well export them.

@martinholters
Copy link
Member

Comparison to setindex! does not support that argument, actually:

julia> setindex!([1], 10, 2)
ERROR: BoundsError: attempt to access 1-element Vector{Int64} at index [2]
Stacktrace:
 [1] setindex!(A::Vector{Int64}, x::Int64, i1::Int64)
   @ Base ./array.jl:1021
 [2] top-level scope
   @ REPL[6]:1

julia> setindex!(Dict(:a => 1), 10, :b)
Dict{Symbol, Int64} with 2 entries:
  :a => 1
  :b => 10

So the consequence would be to write a docstring for setindex that does not talk about index bounds at all for consistency's sake?

@CameronBieganek
Copy link
Contributor

My comparison to the setindex! docstring was only meant to demonstrate what a generic docstring looks like, but I guess it was not the best example. The iterate docstring is a better example:

    iterate(iter [, state]) -> Union{Nothing, Tuple{Any, Any}}

Advance the iterator to obtain the next element. If no elements remain, `nothing` should
be returned. Otherwise, a 2-tuple of the next element and the new iteration state
should be returned.

Note that the docstring provides a single definition for iterate regardless of the input types.

But you bring up an interesting point. In my previous comment I explicitly ignored the second docstring for setindex!. So, setindex! actually has at least three distinct behaviors, depending on the input type:

Throw a bounds error for an array:

julia> x = [1, 2];

julia> x[3] = 10
ERROR: BoundsError: attempt to access 2-element Vector{Int64} at index [3]

Extend a dictionary:

julia> d = Dict(:a => 1);

julia> d[:b] = 2
2

Set multiple indices at once for an array:

julia> a = [1, 2, 3, 4];

julia> a[[2, 3]] = [10, 20];

julia> a
4-element Vector{Int64}:
  1
 10
 20
  4

So, setindex! is a very non-generic function. I guess that's ok, since setindex! is a bit special due to the square-bracket syntax sugar and it's importance for array indexing. Not every function in Julia has to be a generic function. There are some notable exceptions to the goal of having generic functions, like constructors and convert, where the exact behavior depends on the input type. But I think we should generally prefer to make Base Julia functions generic whenever possible. Base.setindex does not have syntax sugar and it is not ubiquitously used like setindex!, so I think it makes sense to make it generic (and we can add setkey if desired).

So the consequence would be to write a docstring for setindex that does not talk about index bounds at all for consistency's sake?

No, that's not the consequence. Please take a look at the example docstrings that I provided in my previous post.

@aplavin
Copy link
Contributor Author

aplavin commented Jan 4, 2025

Sorry, forgot about this for a while!

triage approves.
@aplavin add a news entry and we can merge.

Added.

@CameronBieganek
Copy link
Contributor

CameronBieganek commented Jan 8, 2025

It looks like my concern about the generic definition of setindex still has not been addressed. What happens when an out-of-bounds key is provided? Does it throw an error or does it extend the collection? I think the only plausible solution is to make two functions, setindex and setkey (name can be bikeshed), as I described above.

I think setindex should be defined for tuples but not for named tuples, and setkey should be defined for named tuples but not tuples. In other words, this really needs to be a method error: setkey((1, ), 2, 1_000_000).

@aplavin
Copy link
Contributor Author

aplavin commented Jan 8, 2025

I think the only reasonable way to define setindex semantics is to follow setindex! as closely as possible. Anything else would be very confusing.
AFAICT, this is the case with the current setindex implementation: throws an OOB error for tuples like setindex! for arrays, adds a new element for named tuples like setindex! for dicts.

This function is well-defined and documented, naturally leading to a widespread use in many packages.
I can imagine

I think setindex should be defined for tuples but not for named tuples

would cause quite some breakage if implemented. That's one of the reasons to mark it public – because it effectively is already.

I haven't seen new triage comments here, so assuming they still agree with the PR.

@CameronBieganek
Copy link
Contributor

I haven't seen new triage comments here, so assuming they still agree with the PR.

My concerns were raised after that triage call, so triage may not have considered the issues that I raised.

I think the only reasonable way to define setindex semantics is to follow setindex! as closely as possible. Anything else would be very confusing.

I agree there is some logic to that, but setindex! is very special because it has syntax sugar that is ubiquitously used. Because square bracket indexing is used so ubiquitously, it's not surprising that the various uses do not behave in a consistent, generic way. However, Base.setindex is not special and it is not nearly as ubiquitous. It is overloaded in StaticArrays and Accessors. Where else?

setindex does not even need to be defined in Base. In Accessors.jl you can define your own setindex function with any behavior you like (though I would still prefer a generic definition).

That's one of the reasons to mark it public – because it effectively is already.

We shouldn't be encouraging the usage of Base internals. This PR sets a bad precedent. Once someone starts using a Base internal they can open an issue requesting that the internal is marked public because "it's used widely in the ecosystem."


Consider the following two code chunks. The first one uses setindex and the second one uses setkey, as I defined them above. Note that there is no mention of the types of the values that are used in either example. Yet the examples can be understood to be correct due to the generic definitions of setindex and setkey.

setindex example:

b = try
    setindex(a, v, i)
catch e
    e isa BoundsError ? a : rethrow()
end

setkey example:

The following will never throw a BoundsError:

b = setkey(a, v, k)

Of course, it could throw other errors due to type mismatches of the inputs. For ultimate clarity, the following should probably be added to the docstring for setkey (and similarly for setindex):

"""
    setkey(x, v, k)

...

The following must be true, or else an error will be thrown:

- `typeof(k) <: keytype(x)`
- `typeof(v) <: valtype(x)`
"""

I think setindex should be defined for tuples but not for named tuples

would cause quite some breakage if implemented.

Ultimately, Base.setindex is not public, so Base has no backwards compatibility obligations for Base.setindex. Though I agree that it would make sense for both setkey and setindex (as I have defined them) to work on named tuples. The important thing is that setkey(("a", ), "b", 1_000_000) should be a method error.

Actually, named tuples are a weird beast... Arguably the second Base.setindex call below should work. (In my formulation, the first call would be a setkey call and the second call would be a setindex call, and they would both work.)

julia> nt = (a = 1, b = 2);

julia> Base.setindex(nt, 100, :c)
(a = 1, b = 2, c = 100)

julia> Base.setindex(nt, 100, 2)
ERROR: MethodError: no method matching setindex(::@NamedTuple{a::Int64, b::Int64}, ::Int64, ::Int64)

@aplavin
Copy link
Contributor Author

aplavin commented Jan 9, 2025

Note that setindex is not some obscure function, it's used and sometimes overloaded in many packages.
I don't really have more arguments for making it public than

  • it's a widely used well-documented function with clear and accepted semantics
  • if setindex exists, it should be as consistent with setindex! as possible to avoid confusion; and that's exactly the current behavior.

It's clear that you disagree. Neither of us can merge this PR anyways, so now it's up to Julia developers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets feature Indicates new feature / enhancement requests needs news A NEWS entry is required for this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants