Skip to content

[Stdlib] Optimize CollectionType.first #825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 1, 2016

Conversation

lilyball
Copy link
Contributor

For lazy collections, isEmpty and startIndex may be O(N) operations.
The old implementation ended up being potentially O(2N) instead of O(1).
In particular, accessing col.lazy.filter(pred).first would evaluate
the predicate on elements twice, once to determine the result of
isEmpty and once to determine startIndex.

While we're at it, tweak the complexity as well, as lazy collections may
be O(N).

public var first: Generator.Element? {
return isEmpty ? nil : self[startIndex]
var gen = generate()
return gen.next()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative implementation looks like

let start = startIndex
return start == endIndex ? nil : self[start]

This basically just reimplements isEmpty locally without throwing away the start index. The generator approach seems slightly more straightforward though (and with my proposal to add var first to SequenceType, this is how it would be implemented there anyway).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also very mildly concerned that, despite endIndex being documented as O(1), it's possible for some collection to implement it more expensively (although they really shouldn't do that). Conversely, it's possible for a collection to implement a generator such that the creation of the generator is expensive, but they shouldn't do that either. Since iteration of sequences (and therefore generators) is so prevalent and therefore a prime target for optimization, I'm assuming that the generator approach (as used in my PR) is more reliably efficient than the index approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the generator approach is inherently more expensive in some cases. For example, consider

(0..<20).lazy.map { veryExpensiveCall($0) }.isEmpty

I think comparing indices is the obvious way to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not proposing that isEmpty be changed to use a generator. This is just the first property which, by definition, has to access the first element anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this? Just want to make sure before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I ran the regular test suite but not the validation suite. I'd check but I'm not at my computer right now. If you don't merge it tonight then I'll make sure to re-run it tomorrow just to be sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you’re not sure you ran at least one of them, I’d rather wait.

@lilyball lilyball force-pushed the collectiontype-lazy-first branch from e02d50b to 6bac129 Compare December 31, 2015 18:08
@lilyball
Copy link
Contributor Author

I just re-pushed to get rid of the complexity doc comment change, because I assume you don't actually want that given that you closed #824. I'm running tests now.

@lilyball
Copy link
Contributor Author

Tests finished. I got 3 failures, but they're unrelated to this commit.

Failing Tests (3):
    Swift-Unit :: Basic/SwiftBasicTests/Compression.FlatEncoding
    Swift-Unit :: Basic/SwiftBasicTests/Compression.FullCompression
    Swift-Unit :: Basic/SwiftBasicTests/Compression.VariableLength

(all 3 of them were assertion failures at LHS.BitWidth == RHS.BitWidth && "Bit widths must be the same")

This was tested with --release --debug-swift --debug-swift-stdlib --build-subdir=ds --test -- --skip-ios --skip-tvos --skip-watchos. I'm now trying again with just build-script -t.

@lilyball
Copy link
Contributor Author

@dabrahams I confirmed, the test failures happen on master. So this commit does not introduce any new failures.

@dabrahams
Copy link
Contributor

One final request, sorry to say: when we make a change for optimization purposes we should introduce a comment that explains why the code is the way it is, so some future maintainer doesn’t assume they can simplify it away to one line.

I can imagine cases where the isEmpty check could be an improvement (e.g. if LazyFilterCollection’s isEmpty first checked isEmpty on its underlying collection), which makes changing it back not unrealistic.

Thanks for your patient revising…

@lilyball
Copy link
Contributor Author

Changing it back in that case still isn't realistic because LazyFilterCollection.isEmpty would still have to access startIndex in the event that the underlying collection isn't empty, which means isEmpty ? nil : self[startIndex] would still be accessing startIndex twice.

That said, I will go ahead and add the comment explaining why.

For lazy collections, `isEmpty` and `startIndex` may be O(N) operations.
The old implementation ended up being potentially O(2N) instead of O(1).
In particular, accessing `col.lazy.filter(pred).first` would evaluate
the predicate on elements twice, once to determine the result of
`isEmpty` and once to determine `startIndex`.
@lilyball lilyball force-pushed the collectiontype-lazy-first branch from 6bac129 to 6ae85ce Compare December 31, 2015 21:31
@lilyball
Copy link
Contributor Author

PR updated.

@dabrahams
Copy link
Contributor

It’s realistic if you think you want to optimize for the case where the underlying sequence of the LazyFilterCollection is empty.

@lilyball
Copy link
Contributor Author

If the underlying sequence is yet another LazyFilterCollection or FlattenCollection (or any future lazy collections that don't have an O(1) isEmpty) then it would still be unnecessarily expensive to test isEmpty.

@dabrahams
Copy link
Contributor

Whether or not it could happen, pass (performance) regression tests, and not get reverted is academic. The point is to prevent maintainers from even heading down that road mentally.

dabrahams pushed a commit that referenced this pull request Jan 1, 2016
[Stdlib] Optimize CollectionType.first
@dabrahams dabrahams merged commit e210532 into swiftlang:master Jan 1, 2016
@lilyball
Copy link
Contributor Author

lilyball commented Jan 1, 2016

@dabrahams Ah, I see. I thought you were arguing in favor of the possibility of reverting it at some point, but I gather now that you're saying it's not unrealistic for someone else to think reverting it makes sense (without the comment saying why it's a bad idea).

@lilyball lilyball deleted the collectiontype-lazy-first branch January 1, 2016 00:41
@gribozavr
Copy link
Contributor

@kballard It looks like this change could be trivially tested...

@lilyball
Copy link
Contributor Author

lilyball commented Jan 2, 2016 via email

@lilyball
Copy link
Contributor Author

lilyball commented Jan 3, 2016

@gribozavr Test submitted as #860

lilyball added a commit to lilyball/swift that referenced this pull request Jan 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants