[Stdlib] Optimize CollectionType.first #825

lilyball · 2015-12-31T00:22:19Z

For lazy collections, isEmpty and startIndex may be O(N) operations.
The old implementation ended up being potentially O(2N) instead of O(1).
In particular, accessing col.lazy.filter(pred).first would evaluate
the predicate on elements twice, once to determine the result of
isEmpty and once to determine startIndex.

While we're at it, tweak the documented complexity as well, since first may
be O(N) for lazy collections.
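
A minimal sketch of the double evaluation described above, written in current Swift spelling (makeIterator() rather than this PR's generate()). The function name countPredicateCalls and the sample array are illustrative assumptions, not part of the patch.

```swift
// Fetches the first even number two ways and records how many
// predicate calls each way needed.
func countPredicateCalls() -> (twoStep: Int?, twoStepCalls: Int, onePass: Int?, onePassCalls: Int) {
    var calls = 0
    let evens = [1, 3, 5, 6, 8].lazy.filter { (n: Int) -> Bool in
        calls += 1
        return n % 2 == 0
    }

    // Old shape: isEmpty and startIndex each scan for the first match.
    let twoStep: Int? = evens.isEmpty ? nil : evens[evens.startIndex]
    let twoStepCalls = calls

    // Shape used in this PR: a single pass through an iterator.
    calls = 0
    var iterator = evens.makeIterator()
    let onePass: Int? = iterator.next()
    let onePassCalls = calls

    return (twoStep, twoStepCalls, onePass, onePassCalls)
}

let counts = countPredicateCalls()
print("isEmpty + startIndex:", counts.twoStepCalls, "predicate calls")
print("iterator:", counts.onePassCalls, "predicate calls")
```

Both variants return the same element; only the number of predicate calls differs.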

lilyball · 2015-12-31T00:24:17Z

stdlib/public/core/Collection.swift

  public var first: Generator.Element? {
-    return isEmpty ? nil : self[startIndex]
+    var gen = generate()
+    return gen.next()


An alternative implementation looks like

    let start = startIndex
    return start == endIndex ? nil : self[start]

This basically just reimplements isEmpty locally without throwing away the start index. The generator approach seems slightly more straightforward though (and with my proposal to add var first to SequenceType, this is how it would be implemented there anyway).
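
For comparison, the two candidate implementations can be sketched side by side as free functions. The names firstViaIndices and firstViaIterator are hypothetical, and the code uses current Swift spelling (Collection, makeIterator()) instead of the CollectionType and generate() names in this PR.

```swift
// Index-based variant: computes startIndex once and reuses it.
func firstViaIndices<C: Collection>(_ c: C) -> C.Element? {
    let start = c.startIndex
    return start == c.endIndex ? nil : c[start]
}

// Iterator-based variant: the shape used in this PR.
func firstViaIterator<C: Collection>(_ c: C) -> C.Element? {
    var iterator = c.makeIterator()
    return iterator.next()
}

let words = ["a", "b", "c"]
print(firstViaIndices(words) as Any, firstViaIterator(words) as Any)
print(firstViaIndices([Int]()) as Any, firstViaIterator([Int]()) as Any)
```

Either way, the search for the first element happens once instead of once per property access.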

I'm also very mildly concerned that, despite endIndex being documented as O(1), it's possible for some collection to implement it more expensively (although they really shouldn't do that). Conversely, it's possible for a collection to implement a generator such that the creation of the generator is expensive, but they shouldn't do that either. Since iteration of sequences (and therefore generators) is so prevalent and therefore a prime target for optimization, I'm assuming that the generator approach (as used in my PR) is more reliably efficient than the index approach.

The problem is that the generator approach is inherently more expensive in some cases. For example, consider

(0..<20).lazy.map { veryExpensiveCall($0) }.isEmpty

I think comparing indices is the obvious way to go.

I'm not proposing that isEmpty be changed to use a generator. This is just the first property which, by definition, has to access the first element anyway.
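
A small sketch of the concern raised about isEmpty, assuming a counting closure as a stand-in for veryExpensiveCall (its body is made up for illustration). Checking emptiness through indices never runs the transform, while pulling one element from an iterator runs it once.

```swift
// Compares how often the map transform runs for two emptiness checks.
func compareEmptinessChecks() -> (indexCalls: Int, iteratorCalls: Int) {
    var calls = 0
    // Stand-in for the thread's veryExpensiveCall; the body is an assumption.
    let mapped = (0..<20).lazy.map { (n: Int) -> Int in
        calls += 1
        return n * n
    }

    // Index-based check: no element is ever transformed.
    _ = mapped.isEmpty
    let indexCalls = calls

    // Iterator-based check: fetching one element runs the transform once.
    calls = 0
    var iterator = mapped.makeIterator()
    _ = iterator.next() == nil
    let iteratorCalls = calls

    return (indexCalls, iteratorCalls)
}

let emptiness = compareEmptinessChecks()
print(emptiness.indexCalls, emptiness.iteratorCalls)
```

For first the transform has to run on the first element anyway, which is why the iterator costs nothing extra there.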

Did you test this? Just want to make sure before merging.

I believe I ran the regular test suite but not the validation suite. I'd check but I'm not at my computer right now. If you don't merge it tonight then I'll make sure to re-run it tomorrow just to be sure.

If you’re not sure you ran at least one of them, I’d rather wait.

lilyball · 2015-12-31T18:09:42Z

I just re-pushed to get rid of the complexity doc comment change, because I assume you don't actually want that given that you closed #824. I'm running tests now.

lilyball · 2015-12-31T18:26:51Z

Tests finished. I got 3 failures, but they're unrelated to this commit.

Failing Tests (3):
    Swift-Unit :: Basic/SwiftBasicTests/Compression.FlatEncoding
    Swift-Unit :: Basic/SwiftBasicTests/Compression.FullCompression
    Swift-Unit :: Basic/SwiftBasicTests/Compression.VariableLength

(all 3 of them were assertion failures at LHS.BitWidth == RHS.BitWidth && "Bit widths must be the same")

This was tested with --release --debug-swift --debug-swift-stdlib --build-subdir=ds --test -- --skip-ios --skip-tvos --skip-watchos. I'm now trying again with just build-script -t.

lilyball · 2015-12-31T19:00:58Z

@dabrahams I confirmed, the test failures happen on master. So this commit does not introduce any new failures.

dabrahams · 2015-12-31T21:10:08Z

One final request, sorry to say: when we make a change for optimization purposes we should introduce a comment that explains why the code is the way it is, so some future maintainer doesn’t assume they can simplify it away to one line.

I can imagine cases where the isEmpty check could be an improvement (e.g. if LazyFilterCollection’s isEmpty first checked isEmpty on its underlying collection), which makes changing it back not unrealistic.

Thanks for your patient revising…

lilyball · 2015-12-31T21:26:12Z

Changing it back in that case still isn't realistic because LazyFilterCollection.isEmpty would still have to access startIndex in the event that the underlying collection isn't empty, which means isEmpty ? nil : self[startIndex] would still be accessing startIndex twice.

That said, I will go ahead and add the comment explaining why.
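
To make the argument concrete, here is a toy model of a filter whose isEmpty consults the underlying collection first. SketchFilter and ScanTally are hypothetical types written for this illustration and are not the stdlib's LazyFilterCollection.

```swift
// Shared tally so the struct's computed properties can record scans.
final class ScanTally { var startIndexScans = 0 }

// Hypothetical stand-in for a lazy filter over [Int].
struct SketchFilter {
    let base: [Int]
    let predicate: (Int) -> Bool
    let tally = ScanTally()

    var endIndex: Int { return base.count }

    // Scans for the first match on every access, as a lazy filter would.
    var startIndex: Int {
        tally.startIndexScans += 1
        var i = 0
        while i < base.count && !predicate(base[i]) { i += 1 }
        return i
    }

    // The hypothetical shortcut: check the underlying collection first.
    var isEmpty: Bool { return base.isEmpty || startIndex == endIndex }

    // The old one-line shape of `first`.
    var firstViaIsEmpty: Int? { return isEmpty ? nil : base[startIndex] }
}

let emptyBase = SketchFilter(base: [], predicate: { $0 % 2 == 0 })
_ = emptyBase.firstViaIsEmpty
let nonEmptyBase = SketchFilter(base: [1, 3, 6], predicate: { $0 % 2 == 0 })
let found = nonEmptyBase.firstViaIsEmpty
print(emptyBase.tally.startIndexScans, nonEmptyBase.tally.startIndexScans, found as Any)
```

In this model the shortcut avoids any scan when the base is empty, but a non-empty base still triggers two scans of startIndex.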


lilyball · 2015-12-31T21:37:03Z

PR updated.

dabrahams · 2015-12-31T22:06:34Z

It’s realistic if you think you want to optimize for the case where the underlying sequence of the LazyFilterCollection is empty.

lilyball · 2015-12-31T23:41:46Z

If the underlying sequence is yet another LazyFilterCollection or FlattenCollection (or any future lazy collections that don't have an O(1) isEmpty) then it would still be unnecessarily expensive to test isEmpty.

dabrahams · 2016-01-01T00:38:54Z

Whether or not it could happen, pass (performance) regression tests, and not get reverted is academic. The point is to prevent maintainers from even heading down that road mentally.


lilyball · 2016-01-01T00:41:42Z

@dabrahams Ah, I see. I thought you were arguing in favor of the possibility of reverting it at some point, but I gather now that you're saying it's not unrealistic for someone else to think reverting it makes sense (without the comment saying why it's a bad idea).

gribozavr · 2016-01-02T09:09:31Z

@kballard It looks like this change could be trivially tested...

lilyball · 2016-01-02T09:12:03Z

Yeah, you're right. I was being lazy. I'll write up a test tomorrow.

lilyball · 2016-01-03T02:34:56Z

@gribozavr Test submitted as #860

This tests the optimization committed in swiftlang#825.

lilyball reviewed Dec 31, 2015

lilyball force-pushed the collectiontype-lazy-first branch from e02d50b to 6bac129 on December 31, 2015 18:08

lilyball force-pushed the collectiontype-lazy-first branch from 6bac129 to 6ae85ce on December 31, 2015 21:31

dabrahams pushed a commit that referenced this pull request Jan 1, 2016

Merge pull request #825 from kballard/collectiontype-lazy-first

e210532

[Stdlib] Optimize CollectionType.first

dabrahams merged commit e210532 into swiftlang:master Jan 1, 2016

lilyball deleted the collectiontype-lazy-first branch January 1, 2016 00:41

lilyball mentioned this pull request Jan 3, 2016

[Stdlib] Add a test for the performance of CollectionType.first #860

Merged

lilyball added a commit to lilyball/swift that referenced this pull request Jan 3, 2016

[Stdlib] Add a test for the performance of CollectionType.first

8cdaf4a

This tests the optimization committed in swiftlang#825.
