Skip to content

Removes the outer loop for CountBy #106111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 13 additions & 10 deletions src/libraries/System.Linq/src/System/Linq/CountBy.cs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections;
using System.Collections.Generic;
using System.Runtime.InteropServices;

Expand All @@ -24,22 +25,24 @@ public static IEnumerable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(this I
return [];
}

return CountByIterator(source, keySelector, keyComparer);
return new CountByEnumerable<TSource, TKey>(source, keySelector, keyComparer);
}

private static IEnumerable<KeyValuePair<TKey, int>> CountByIterator<TSource, TKey>(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? keyComparer) where TKey : notnull
private sealed class CountByEnumerable<TSource, TKey>(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? keyComparer) : IEnumerable<KeyValuePair<TKey, int>> where TKey : notnull
{
using IEnumerator<TSource> enumerator = source.GetEnumerator();

if (!enumerator.MoveNext())
public IEnumerator<KeyValuePair<TKey, int>> GetEnumerator()
{
yield break;
}
using IEnumerator<TSource> enumerator = source.GetEnumerator();

foreach (KeyValuePair<TKey, int> countBy in BuildCountDictionary(enumerator, keySelector, keyComparer))
{
yield return countBy;
if (!enumerator.MoveNext())
{
return Empty<KeyValuePair<TKey, int>>().GetEnumerator();
}

return BuildCountDictionary(enumerator, keySelector, keyComparer).GetEnumerator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue I'm finding with this approach is that this is returning an enumerator whose implementation is public. Couldn't this result users taking a dependency on that implementation details which might prevent us from evolving the method in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dictionary enumerator does not have any useful method or property, and a dictionary enumerator is hidden by the IEnumerator interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, wouldn't this cast work?

(Dictionary<T, int>.Enumerator)source.CountBy(x => x).GetEnumerator();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, wouldn't this cast work?

Yes, it will work. But Dictionary.Enumerator does not have any specific method or property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the implementation to not use dictionary would result in this cast failing for anybody who might depend on this. This is why we never return enumerable/enumerator implementations that are public unless they are specifically present in the type signature of the method (as is the case with ToList() or the inner array in Chunk()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen a code analyzer that would check this rule, I doubt it is so, why isn't there a code analyzer for this rule?

}

IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

private static Dictionary<TKey, int> BuildCountDictionary<TSource, TKey>(IEnumerator<TSource> enumerator, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? keyComparer) where TKey : notnull
Expand Down
12 changes: 6 additions & 6 deletions src/libraries/System.Linq/tests/CountByTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -32,29 +32,29 @@ public void CountBy_SourceThrowsOnGetEnumerator()
{
IEnumerable<int> source = new ThrowsOnGetEnumerator();

var enumerator = source.CountBy(x => x).GetEnumerator();
var countBy = source.CountBy(x => x);

Assert.Throws<InvalidOperationException>(() => enumerator.MoveNext());
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator().MoveNext());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you meant to assert this?

Suggested change
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator().MoveNext());
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator());

I don't think this is an acceptable behavioural change. Generally speaking GetEnumerator() calls don't do any meaningful work other than instantiate the enumerator; everything else should be reserved for the first MoveNext() call, including any nested GetEnumerator() calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think countBy.GetEnumerator().MoveNext() is independent of how the implementation is done; If the implementation changes, the test will not have to change.

Copy link
Member

@eiriktsarpalis eiriktsarpalis Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MoveNext() part is irrelevant in this case, it is countBy.GetEnumerator() that is throwing when it previously wasn't. GetEnumerator shouldn't be doing any work or throwing exceptions, however with this change it is clear that GetEnumerator is doing all the work (since it is this method that is enumerating the entire source enumerable to build the dictionary).

Copy link
Contributor Author

@AlexRadch AlexRadch Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, in the IQueryable interface it is the GetEnumerator method that does the work of initializing the query, creating an SQL query, sending the SQL query to the server, passing the query parameters, creating a cursor to obtain the result using the MoveNext method, etc. And it can also throw exceptions.

Therefore, I think that the IEnumerable.GetEnumerator method can work even before the first call to MoveNext, including throwing exceptions.

I think that the work can be done both as in the GetEnumerator method and as in the first call to MoveNext. These are already implementation details, so I wrote the tests so that they do not depend on the subtleties of the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an apples-to-apples comparison. IQueryable uses a very different execution model, especially when building SQL queries. But even if we look at how the IEnumerable implementation of built-in query provider works the same invariant holds:

IQueryable<int> queryable = new ThrowingGetEnumerator().AsQueryable()
    .Select(x => x + 1)
    .Where(x => x > 0);

IEnumerator<int> enumerator = queryable.GetEnumerator(); // No exception
enumerator.MoveNext(); // Exception

public class ThrowingGetEnumerator : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator() => throw new NotSupportedException();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

These are already implementation details, so I wrote the tests so that they do not depend on the subtleties of the implementation.

I strongly disagree that that it's an implementation detail. Changing the behavior can break user code that depends on this method not enumerating the source or throwing exceptions; it might not be important if you just foreach over the result, but it is in code that explicitly handles enumerators in bespoke ways.

Copy link
Contributor Author

@AlexRadch AlexRadch Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that this indicates an issue with the GroupBy implementation. GetEnumerator() should not be incurring side-effects in principle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess GetEnumerator can start to execute work before the first call MoveNext and it depends on implementation.

}

[Fact]
public void CountBy_SourceThrowsOnMoveNext()
{
IEnumerable<int> source = new ThrowsOnMoveNext();

var enumerator = source.CountBy(x => x).GetEnumerator();
var countBy = source.CountBy(x => x);

Assert.Throws<InvalidOperationException>(() => enumerator.MoveNext());
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator().MoveNext());
}

[Fact]
public void CountBy_SourceThrowsOnCurrent()
{
IEnumerable<int> source = new ThrowsOnCurrentEnumerator();

var enumerator = source.CountBy(x => x).GetEnumerator();
var countBy = source.CountBy(x => x);

Assert.Throws<InvalidOperationException>(() => enumerator.MoveNext());
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator().MoveNext());
}

[Fact]
Expand Down
Loading