-
Notifications
You must be signed in to change notification settings - Fork 5
Unit Tests #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit Tests #9
Conversation
We probably would first need to agree on a testing framework to use and this is when some creative differences will come into play. So, I personally use XUnit for my tests, since I find it the most intuitive framework. You seem to be using MSTest if I am not mistaken - a testing framework I've never used. As a quick note, I would also rename the folder |
I don't really have much of a preference, MSTest is just what I've defaulted to use. I haven't personally used XUnit, but can't imagine it being much different, so I'll just migrate the project. Edit: With XUnit I'm getting I'll also change the name no problem. More importantly, while writing tests for string @string = new(['2', 'x', 'r', 'b', '7', '6', 'p', 'w', 'd', 'r', '6', '4', 'g', 't', 'i', '6', '5', 'c', 'm', 'r', 't', '4', '7', 'v', '3', 'y', '6', '5', 'n', 'b', 'n', 't', 'p', '4', 'r', 'w', '6', '1', 'x', 'a', 't', 'i', 'i', 'o', 'o', 'q', '4', '0', 'q', 'i', '6', 'd', 'x', 'i', '9', '6', 'd', 'w', 'e', '2', 'x', 's', 'a', '9', 't', 'g', 'p', 'o', 'k', 'o', 'm', 'e', 'p', '5', 'k', '7', 's', '3', '0', 'c', 's', 'z', 'b', '2', 'j', '5', 'b', 's', 'u', 'h', 'c', '8', 'i', 'w', 'y', 'c', '8', 'u', 'f', 'f']);
char delimiter = 'o';
ReadOnlySpan<char> charSpan = @string;
var expected = string.Join(", ", @string.Split(delimiter, 3, StringSplitOptions.None));
var actual = string.Join(", ", ToSystemEnumerable(charSpan.Split(delimiter, StringSplitOptions.None, 3)).Select(a => string.Join("", a)));
/*
Results:
string: 2xrb76pwdr64gti65cmrt47v3y65nbntp4rw61xatiiooq40qi6dxi96dwe2xsa9tgpokomep5k7s30cszb2j5bsuhc8iwyc8uff
expexted: 2xrb76pwdr64gti65cmrt47v3y65nbntp4rw61xatii, , q40qi6dxi96dwe2xsa9tgpokomep5k7s30cszb2j5bsuhc8iwyc8uff
actual: 2xrb76pwdr64gti65cmrt47v3y65nbntp4rw61xatii, , q40qi6dxi96dwe2xsa9tgp
*/ |
This is interesting, I was not aware |
For now I'll write them assuming compatibility with Edit: Found another inconcistency. If the last character in a string is the delimiter, Edit2: I've migrated the project to XUnit. Please tell me if the format is acceptable, or the changes you'd like to see. Edit4: Also, is there a particular reason your implementations of Right now if you want to change code that uses // before
var splits = myString.Split(delimiter, maxCount, options);
// after
var splits = myString.AsSpan().Split(delimiter, maxCount, options); That won't compile. You'd need to switch those arguments: // before
var splits = myString.Split(delimiter, maxCount, options);
// after
var splits = myString.AsSpan().Split(delimiter, options, maxCount); Which just seems needless... |
ab203a1
to
2a163d3
Compare
80fe61b
to
c4a2c53
Compare
The reason why empty sequences are dropped when the delimiter is at the end of the source sequence, is because of the assumption that the remaining span is empty only when it's been exhausted: public bool MoveNext()
{
ReadOnlySpan<T> span = Span;
if(span.IsEmpty) // wrong
{
return false;
}
int index = span.IndexOf(Delimiter);
if(index == -1 || index >= span.Length)
{
Span = ReadOnlySpan<T>.Empty; // 1 way it can become empty
Current = span;
return true;
}
Current = span[..index];
Span = span[(index + 1)..]; // second way it can become empty
return true;
} We probably have to store an extra flag to indicate the end: private bool done;
public bool MoveNext()
{
ReadOnlySpan<T> span = Span;
if(done)
{
return false;
}
int index = span.IndexOf(Delimiter);
if(index == -1 || index >= span.Length)
{
done = true; // only here it can become true
Current = span;
return true;
}
Current = span[..index];
Span = span[(index + 1)..]; // done will always be false here
return true;
} The name can be agreed on. Edit: I'll open a separate PR for this |
It's a bit unfortunate that those tests for |
So, A Source Generator really sounds like something to investigate in. That will be the future though. Also I unfortunately won't be available tomorrow, e.g. in the next 24 hours, but after that I'll be going through the test suite and all the other concerns. |
Another curious issue I found, is that when splitting a span consisting of delimiters only, and specifiyng the |
The tests aren't done, I've only implemented ones for the I noticed that you swapped From what I can see, as I mentioned in my comment under the commit, the default behaviour is |
Right, It is indeed. The behaviour shouldn't be that way though. I'll make the new one the default. |
About the string split issue you mentioned, the string implementation is correct. Here's an example: var src = "aaa"; // when split by 'a' we get ['', 'a', '', 'a', '', 'a', ''] 4 empty around 3 'a'-s
src.Split('a'); // ['', '', '', ''] the result of removing 'a'-s from the above array
src.Split('a', 4); // ['', '', '', '']
src.Split('a', 3); // ['', '', 'a'] where 'a' is '' + 'a' + '' (3rd empty and after)
src.Split('a', 2); // ['', 'aa''] where 'aa' is '' + 'a' + '' + 'a' + '' (second empty and after) or another example to better visualize: var src = "bababab";
src.Split('a'); // ['b', 'b', 'b', 'b']
src.Split('a', 4); // ['b', 'b', 'b', 'b']
src.Split('a', 3); // ['b', 'b', 'bab']
src.Split('a', 2); // ['b', 'babab''] |
Thinking about it some more, this assumption is also wrong when ReadOnlySpan<char> span = Span;
int index = span.IndexOf(Delimiter);
if(index == -1)
{
enumerationDone = true;
Current = span;
return true;
} Since this shortcircuits, it never checks if trimming is necessary, for example: " a b a ".AsSpan().Split('b', StringSplitOptions.TrimEntries); results in I'll rewrite the methods and include comments if you are ok with that. Edit: Here's what I ended up with for public bool MoveNext()
{
if(enumerationDone)
{
return false;
}
while(true) // if RemoveEmptyEntries options flag is set, repeat until a non-empty span is found, or the end is reached
{
int delimiterIndex = Span.IndexOf(Delimiter);
if(delimiterIndex == -1)
{
enumerationDone = true;
Current = Span;
if(TrimEntries)
{
Current = Current.Trim();
}
return !(RemoveEmptyEntries && Current.IsEmpty);
}
Current = Span[..delimiterIndex];
Span = Span[(delimiterIndex + DelimiterLength)..];
if(TrimEntries)
{
Current = Current.Trim();
}
if(RemoveEmptyEntries && Current.IsEmpty)
{
continue;
}
return true;
}
} Things that I changed:
And here's one with public bool MoveNext()
{
if(EnumerationDone)
{
return false;
}
while(true) // if RemoveEmptyEntries options flag is set, repeat until a non-empty span is found, or the end is reached
{
int delimiterIndex = Span.IndexOf(Delimiter);
if(delimiterIndex == -1 || CurrentCount == 1)
{
EnumerationDone = true;
if(delimiterIndex != -1 && RemoveEmptyEntries && CountExceedingBehaviour == CountExceedingBehaviour.AppendLastElements) // skip all empty (after trimming if necessary) entries from the left
{
do
{
ReadOnlySpan<char> beforeDelimiter = Span[..delimiterIndex];
if(TrimEntries ? beforeDelimiter.IsWhiteSpace() : beforeDelimiter.IsEmpty)
{
Span = Span[(delimiterIndex + DelimiterLength)..];
delimiterIndex = Span.IndexOf(Delimiter);
continue;
}
break;
}
while(delimiterIndex != -1);
Current = Span;
}
else
{
Current = delimiterIndex == -1 || CountExceedingBehaviour == CountExceedingBehaviour.AppendLastElements ? Span : Span[..delimiterIndex];
}
if(TrimEntries)
{
Current = Current.Trim();
}
return !(RemoveEmptyEntries && Current.IsEmpty);
}
Current = Span[..delimiterIndex];
Span = Span[(delimiterIndex + DelimiterLength)..];
if(TrimEntries)
{
Current = Current.Trim();
}
if(RemoveEmptyEntries && Current.IsEmpty)
{
continue;
}
CurrentCount--;
return true;
}
}
} The changes include the ones above, plus I moved the check for an invalid PS. |
Turns out you can have a Draft PR (didn't know :D) that can't be accidentally merged until ready. |
:D |
I run into another weird difference: var str = "abcdababcd";
Console.WriteLine(str.Split("ab", 1, StringSplitOptions.RemoveEmptyEntries)); // abcdababcd
Console.WriteLine(str.AsSpan().Split("ab", 1, StringSplitOptions.RemoveEmptyEntries)); // cdababcd
Console.WriteLine(str.Split("ab", 2, StringSplitOptions.RemoveEmptyEntries)); // cd, cd
Console.WriteLine(str.AsSpan().Split("ab", 2, StringSplitOptions.RemoveEmptyEntries)); // cd, cd What's weird is that when
But when initial
Edit: Looked at the if (count <= 1 || Length == 0)
{
// Per the method's documentation, we'll short-circuit the search for separators.
// But we still need to post-process the results based on the caller-provided flags.
return CreateSplitArrayOfThisAsSoleValue(options, count);
}
private string[] CreateSplitArrayOfThisAsSoleValue(StringSplitOptions options, int count) {
// ...
if ((options & StringSplitOptions.RemoveEmptyEntries) == 0 || candidate.Length != 0)
{
return new string[] { candidate };
}
} As you can see, they only chech if the whole string (after trimming if necessary) is empty. Where's when // ...
if (arrIndex == count - 1)
{
// The next iteration of the loop will provide the final entry into the
// results array. If needed, skip over all empty entries before that
// point.
if ((options & StringSplitOptions.RemoveEmptyEntries) != 0)
{
while (++i < numReplaces)
{
thisEntry = this.AsSpan(currIndex, sepList[i] - currIndex);
if ((options & StringSplitOptions.TrimEntries) != 0)
{
thisEntry = thisEntry.Trim();
}
if (!thisEntry.IsEmpty)
{
break; // there's useful data here
}
currIndex = sepList[i] + (lengthList.IsEmpty ? defaultLength : lengthList[i]);
}
}
break;
}
thisEntry = this.AsSpan(currIndex); // takes everything after the found non empty split In the above, when number of splits is This is inconsistent. -_- For now, I'll change tests and the I submited a question to the .NET team, let's see what they say. |
I agree, this behaviour seems oddly inconsistent. |
Whether we should try to mimick |
Wait with the benchmarking for now. I have already started writing benchmarks on a local branch, which I will push, and the |
I think I'm done with the fuzzing tests for |
Once I'm done with the unit tests for split extensions, do you want to merge this and #11 right then, or do you prefer to wait until the rest are also done? To be honest, I'm thinking about putting the rest on hold and trying to make the source generators first. I've already done a generator that copied the source of one class into another once, so I'll probably use that as the base. |
Yeah, merging them as soon as they are done is my plan. |
and fixed some others
I am almost done, I think. However, I just realized that I missed unit tests for Sequence Delimiter variants. I'll add them tomorow. |
fixed ConsecutiveDelimitersAtTheEndWithCountEqualDelimiterCountWithRemoveEmptyEntriesOptionResultInNoSpanWithDelimiter added EmptyDelimiterSpanResultsSameAsCountEqualOne
I'm done with Fuzzing and Unit Tests for Split extensions. A few things to note:
Edit: LOL, just noticed .NET 8 added ReadOnlySpan extensions for Split and SplitAny. However, our advantage is that we support both |
kept xunit.runner.visualstudio on version 2.4.5 because later versions drop support for .NET 5
merged into review-tests!! |
why change the branch, and why squash merge? I tried to make every commit message meaningful, oh well XD Other than that, looks good, except that, the SplitAny(this Span<char> source, ReadOnlySpan<char> delimiters, StringSplitOptions options, int count, ...) Running a find and replace operation of "StringSplitOptions options, int count` -> "int count, StringSplitOptions options" with "Current Project" scope fixes the problem, everything builds and runs (well, sort of...). Note: As I've stated in #11 currently there are several bugs in the split implementations (that the PR fixes), including those that result in an infinite loop! More specifically, do not run SplitSequence and SplitStringSequence tests, as the tests will never end, and RAM usage will also keep going up indefinitely eventually spilling over into your swap.
Actually, since we always turn the Span Enumerables into IEnumerables with our extension methods, we could check that the enumeration count is below some threshold, and throw an exception if it goes above that, for example: public static IEnumerable<IEnumerable<T>> ToSystemEnumerable<T>(this SpanSplitEnumerator<T> spanEnumerator, int maxCount = int.MaxValue) where T : IEquatable<T>
{
List<T[]> list = [];
foreach(ReadOnlySpan<T> element in spanEnumerator)
{
list.Add(element.ToArray());
if (list.Count >= maxCount)
{
throw new IndexOutOfRangeException($"Enumeration exceeded {maxCount}.");
}
}
return list;
} I tested it, and it fixed the problem, I'll open a new PR for that. |
Well the merge conflicts were quite huge and I didn't want to break anything while refactoring so i created an extra branch off of dev as an additional security layer. I have already swapped the parameters and have also adapted the code style on my local copy. I am just running a visual studio code cleanup as is configured on my device. |
Furthermore, I am also currently questioning whether we actually need the tests that verify the type of Span returned, as this should not be a unit test, but is part of static analysis. |
oops, sorry 😅
If you believe they are not needed, you are free to remove them. However, keep in mind that the cast of |
I have commented them out for now!! |
Also is there a reason options is not used in the following foreach loop?? [Fact]
public void WhiteSpaceCharactersAssumedWhenDelimitersSpanIsEmpty()
{
foreach(StringSplitOptions options in stringSplitOptions)
{
AssertEqual(
[['a'], ['b'], ['c'], ['d']],
"a b c d".AsSpan().SplitAny([], StringSplitOptions.None).ToSystemEnumerable(maxCount: 100)
);
}
} |
Nope, that's an oversight 😅 |
TODO: