-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Stdlib] Optimize CollectionType.first #825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
public var first: Generator.Element? { | ||
return isEmpty ? nil : self[startIndex] | ||
var gen = generate() | ||
return gen.next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative implementation looks like
let start = startIndex
return start == endIndex ? nil : self[start]
This basically just reimplements isEmpty
locally without throwing away the start index. The generator approach seems slightly more straightforward though (and with my proposal to add var first
to SequenceType
, this is how it would be implemented there anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also very mildly concerned that, despite endIndex
being documented as O(1), it's possible for some collection to implement it more expensively (although they really shouldn't do that). Conversely, it's possible for a collection to implement a generator such that the creation of the generator is expensive, but they shouldn't do that either. Since iteration of sequences (and therefore generators) is so prevalent and therefore a prime target for optimization, I'm assuming that the generator approach (as used in my PR) is more reliably efficient than the index approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the generator approach is inherently more expensive in some cases. For example, consider
(0..<20).lazy.map { veryExpensiveCall($0) }.isEmpty
I think comparing indices is the obvious way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not proposing that isEmpty be changed to use a generator. This is just the first
property which, by definition, has to access the first element anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test this? Just want to make sure before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I ran the regular test suite but not the validation suite. I'd check but I'm not at my computer right now. If you don't merge it tonight then I'll make sure to re-run it tomorrow just to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you’re not sure you ran at least one of them, I’d rather wait.
e02d50b
to
6bac129
Compare
I just re-pushed to get rid of the complexity doc comment change, because I assume you don't actually want that given that you closed #824. I'm running tests now. |
Tests finished. I got 3 failures, but they're unrelated to this commit.
(all 3 of them were assertion failures at This was tested with |
@dabrahams I confirmed, the test failures happen on |
One final request, sorry to say: when we make a change for optimization purposes we should introduce a comment that explains why the code is the way it is, so some future maintainer doesn’t assume they can simplify it away to one line. I can imagine cases where the isEmpty check could be an improvement (e.g. if LazyFilterCollection’s isEmpty first checked isEmpty on its underlying collection), which makes changing it back not unrealistic. Thanks for your patient revising… |
Changing it back in that case still isn't realistic because That said, I will go ahead and add the comment explaining why. |
For lazy collections, `isEmpty` and `startIndex` may be O(N) operations. The old implementation ended up being potentially O(2N) instead of O(1). In particular, accessing `col.lazy.filter(pred).first` would evaluate the predicate on elements twice, once to determine the result of `isEmpty` and once to determine `startIndex`.
6bac129
to
6ae85ce
Compare
PR updated. |
It’s realistic if you think you want to optimize for the case where the underlying sequence of the |
If the underlying sequence is yet another |
Whether or not it could happen, pass (performance) regression tests, and not get reverted is academic. The point is to prevent maintainers from even heading down that road mentally. |
[Stdlib] Optimize CollectionType.first
@dabrahams Ah, I see. I thought you were arguing in favor of the possibility of reverting it at some point, but I gather now that you're saying it's not unrealistic for someone else to think reverting it makes sense (without the comment saying why it's a bad idea). |
@kballard It looks like this change could be trivially tested... |
Yeah, you're right. I was being lazy. I'll write up a test tomorrow.
|
@gribozavr Test submitted as #860 |
This tests the optimization commited in swiftlang#825.
For lazy collections,
isEmpty
andstartIndex
may be O(N) operations.The old implementation ended up being potentially O(2N) instead of O(1).
In particular, accessing
col.lazy.filter(pred).first
would evaluatethe predicate on elements twice, once to determine the result of
isEmpty
and once to determinestartIndex
.While we're at it, tweak the complexity as well, as lazy collections may
be O(N).