Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterators.first and Iterators.last which don't throw for empty collections #37119

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

non-Jedi
Copy link
Contributor

@non-Jedi non-Jedi commented Aug 19, 2020

EDIT: This PR now implements option 2 discussed at the bottom of this post rather than option 1 which it originally implemented.

On zulip, there was a thread about the fact that there's not a good
way of finding the first item matching a predicate in a non-indexable iterable without
manually writing out the for-loop
(https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/.60find.60.20in.20iterables). One
possibility is using the existing first function composed with
Iterators.filter:

first(Iterators.filter(>(5), 1:10))

The problem with this approach is that it requires wrapping in a try
block if the predicate not being satisfied is a
possibility (Iterators.filter returns an empty collection). There
should be a way of expressing this without using exception-handling as
control flow or writing the for-loop manually.

Ideally, I think first and last should just return
Union{Some{T},Nothing} always, but that would be a breaking
change. So I see two obvious alternatives:

  1. This commit adds an optional predicate argument to first and last
    which is mapped over the collection and the first/last value
    satisfying predicate is returned wrapped in Some.
  2. Add a Iterators.first/Iterators.last which are identical to the
    versions in Base except that when successful they return Some{T}
    and when the collection is empty return Nothing

On zulip, there was a thread about the fact that there's not a good
way of finding the first item matching a predicate in a non-indexable
iterable without manually writing out the for-loop. One possibility is
using the existing first function composed with Iterators.filter:

    first(Iterators.filter(>(5), 1:10))

The problem with this approach is that it requires wrapping in a try
block if the predicate not being satisfied is a
possibility (Iterators.filter returns an empty collection). There
should be a way of expressing this without using exception-handling as
control flow or writing the for-loop manually.

Ideally, I think first and last should just return
Union{Some{T},Nothing} always, but that would be a breaking
change. So I see two obvious alternatives:

1. This commit adds an optional predicate argument to first and last
   which is mapped over the collection and the first/last value
   satisfying predicate is returned.
2. Add a Iterators.first/Iterators.last which are identical to the
   versions in Base except that when successful they return Some{T}
   and when unsuccesful return Nothing
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
Co-authored-by: Kristoffer Carlsson <kristoffer.carlsson@chalmers.se>
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
From code review.

- Add ref links.
- Avoid using |>.
- Use , instead of ; between array elements.

Co-authored-by: Alex Arslan <ararslan@comcast.net>
Copy link
Contributor Author

@non-Jedi non-Jedi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the docstrings instead be combined with the existing docstring for first(coll) as e.g.:

"""
    first([predicate=_ -> true,] coll)

...
"""

Also, I really appreciate the detailed code review. It would be great to get some feedback on the overall concept as well.

base/abstractarray.jl Outdated Show resolved Hide resolved
base/abstractarray.jl Outdated Show resolved Hide resolved
Copy link
Member

@mbauman mbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fully convinced about the functionality, but could be persuaded.


Get the first element of `coll` satisfying `predicate` wrapped in [`Some`](@ref).

If no element of `coll` satisfies `predicate`, return `nothing`.
Copy link
Member

@mbauman mbauman Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting — and deviates from the existing first which would throw an error. This gives me a bit of pause, but at a minimum it means that we cannot combine these docstrings:

julia> first(_->true, [])

julia> first([])
ERROR: BoundsError: attempt to access 0-element Array{Any,1} at index [1]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the docstrings.

"""
function first(predicate, itr)
for x in itr
predicate(x) && return Some(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not thrilled about Some here — I don't think we have a single API in base that returns Some. That said, this is unlike the other find/search APIs in that they return indices or RegexMatches or values from some other limited type set that does not include Nothing.

Copy link
Contributor Author

@non-Jedi non-Jedi Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've convinced myself that at the very least this should be a separate function from first/last. Does Iterators.first (as suggested in OP) sound like a good name to you? In that case, we wouldn't have to create a predicate accepting version as it would compose with Iterators.filter. EDIT: this PR now implements Iterators.first instead.

In reality, I think all APIs that generically return an element of a collection but throw if the element isn't present should instead return Union{Some{T},Nothing}. This includes first, last, getindex, reduce, foldl, etc. This is obviously breaking; I should probably create a "Taking something seriously" issue to get feedback on that idea for julia 2.0.

base/abstractarray.jl Outdated Show resolved Hide resolved
Instead of creating a version of Base.first and Base.last accepting a
predicate, create functions in the Iterators module. This allows easy
composition with Iterators.filter without having to accept a predicate
argument.
@non-Jedi non-Jedi changed the title Add methods of first and last accepting a predicate Add Iterators.first and Iterators.last which don't throw for empty collections Aug 27, 2020
@non-Jedi
Copy link
Contributor Author

Could someone educate me on the history of Some? @mbauman correctly noted that no base functions currently return values wrapped in Some. It seems odd to have an exported type in base that's basically unused (except for a few internal reduction functions). Was it added to base to be a point of coordination for external packages? Is remnant of the days of Nullable? Is it purely so that something(Some(_->nothing)) can return nothing?

@non-Jedi
Copy link
Contributor Author

I'd like to bump this.

The core functionality I'm trying to get into Base here is some function that allows getting the first element of a lazily filtered (possibly empty) collection without wrapping in a try block; a function returning Union{Some{T},Nothing} seems like the obvious way of doing this. I'm relatively ambivalent on how that should be spelled.

If the decision is that a function for getting the first element of a lazily filtered collection isn't necessary in Base, I'm happy to close this. If a different approach would be better, I'd be happy to work on this further.

@rfourquet
Copy link
Member

I like trying to solve the stated problem

there's not a good way of finding the first item matching a predicate in a non-indexable iterable [...]

  • So my first comment: while having a non-throwing first seems like a good general solution, what about designing a function which solves directly the problem? e.g. Iterators.findfirst(f, iter) (which could for example return the "index" (a la enumerate) together with the found object).

  • Regarding using Some, it looks like an appropriate use here even if it's currently seldom used elsewhere. Note this other PR implementing "2-argument get returning Union{Nothing,Some}".

  • I don't like putting this functionality in Iterators.first. As far as I am aware, other public functions in Iterators create an iterator. Naming this one Iterators.first feels just like adding a random namespace in front of first just because first is already taken. What about tryfirst (by analogy to tryparse which doesn't throw (unlike parse) ?

@non-Jedi
Copy link
Contributor Author

non-Jedi commented Sep 21, 2020

  • So my first comment: while having a non-throwing first seems like a good general solution, what about designing a function which solves directly the problem? e.g. Iterators.findfirst(f, iter) (which could for example return the "index" (a la enumerate) together with the found object).

Not all iterables have meaningful indices (e.g. readeach), or is the idea that it would work exactly like enumerate and count the number of elements?

  • I don't like putting this functionality in Iterators.first. As far as I am aware, other public functions in Iterators create an iterator. Naming this one Iterators.first feels just like adding a random namespace in front of first just because first is already taken. What about tryfirst (by analogy to tryparse which doesn't throw (unlike parse) ?

I agree on the unsuitability of the Iterators namespace; mostly I was trying to put it next to Iterators.filter where I feel it's most useful. tryfirst is a pretty good name given the precedent of tryparse. Another possibility would be to introduce a new namespace/module where this could live as first alongside Union{Some{T},Nothing}-returning versions of getindex, reduce, etc.

@non-Jedi
Copy link
Contributor Author

non-Jedi commented Jul 10, 2023

So revisiting this in 2023, I think what I really want is for Maybe.jl (which didn't have Maybe.first at the time I authored this PR) to be a standard library. For reference, the best collection of links to related issues and PRs about functions returning Union{Some{T}, Nothing} is probably #43746, but there are probably others missing from the list there (it's not something easy to find using github's search abilities).

If there's not appetite for a change like this, I'm happy to update this PR to rename the function tryfirst as suggested by @rfourquet above. I think the specific case of first without throwing on empty is more important than the generalization Maybe.jl gives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants