Description
Edited by @stephentoub on 6/26/2024:
```diff
 public static class MemoryExtensions
 {
     // Alternative name: EnumerateSplits, but not sure what SplitAny would be called
+    public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, T separator);
+    public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> separator);
+    public static SpanSplitEnumerator<T> SplitAny<T>(this ReadOnlySpan<T> source, params ReadOnlySpan<T> separators);
     // Optional:
+    public static SpanSplitEnumerator<char> SplitAny(this ReadOnlySpan<char> source, params ReadOnlySpan<string> separators);
+    public ref struct SpanSplitEnumerator<T>
+    {
+        public SpanSplitEnumerator<T> GetEnumerator();
+        public bool MoveNext();
+        public Range Current { get; }
+    }
 }
```
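To make the `Range`-based shape concrete, here is a minimal sketch of SplitAny-style semantics that runs on runtimes without the proposed API. The names `SplitAnySketch`, `SplitAnyRanges`, and `RangeSplitAnyEnumerator<T>` are hypothetical, chosen so they cannot collide with `MemoryExtensions`. The sketch assumes empty segments are reported (as `string.Split` does with `StringSplitOptions.None`) and does not special-case an empty separator set.

```csharp
using System;

// Hypothetical stand-in for the proposed SplitAny<T>; not the shipped implementation.
public static class SplitAnySketch
{
    public static RangeSplitAnyEnumerator<T> SplitAnyRanges<T>(
        this ReadOnlySpan<T> source, ReadOnlySpan<T> separators)
        where T : IEquatable<T>
        => new RangeSplitAnyEnumerator<T>(source, separators);
}

public ref struct RangeSplitAnyEnumerator<T> where T : IEquatable<T>
{
    private readonly ReadOnlySpan<T> _source;
    private readonly ReadOnlySpan<T> _separators;
    private int _start;      // index where the next segment begins
    private bool _finished;  // set once the last segment has been produced
    private Range _current;

    public RangeSplitAnyEnumerator(ReadOnlySpan<T> source, ReadOnlySpan<T> separators)
    {
        _source = source;
        _separators = separators;
        _start = 0;
        _finished = false;
        _current = default;
    }

    // foreach only needs a GetEnumerator() method, so the enumerator can return itself.
    public RangeSplitAnyEnumerator<T> GetEnumerator() => this;

    // A Range rather than a span, so callers can slice a string, array, or Memory<T> with it.
    public Range Current => _current;

    public bool MoveNext()
    {
        if (_finished) return false;

        // Look for the next occurrence of any separator value.
        int offset = _source.Slice(_start).IndexOfAny(_separators);
        if (offset < 0)
        {
            // No separator left: the rest of the input (possibly empty) is the last segment.
            _current = _start.._source.Length;
            _finished = true;
        }
        else
        {
            _current = _start..(_start + offset);
            _start += offset + 1; // step over the single separator element
        }
        return true;
    }
}
```

Because `Current` is a `Range`, the caller can slice the original `string` (for example `input[r]`) or a `ReadOnlyMemory<T>` with it, not only the span that was split.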
Older proposals:
```csharp
partial class MemoryExtensions
{
    public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator,
        StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

    public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
    {
        public SpanSplitEnumerator<T> GetEnumerator() { return this; }
        public bool MoveNext();
        public Range Current { get; }
    }
}
```
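A sketch of this older shape, assuming `options` is applied to each segment as it is produced. `SplitWithOptions` and `OptionsSplitEnumerator<T>` are hypothetical names. Only `RemoveEmptyEntries` is honored; `TrimEntries` is ignored here because whitespace has no meaning for an arbitrary `T`.

```csharp
using System;

// Hypothetical sketch of Split<T>(span, separator, options); not a shipped API.
public static class OptionsSplitSketch
{
    public static OptionsSplitEnumerator<T> SplitWithOptions<T>(
        this ReadOnlySpan<T> span, T separator,
        StringSplitOptions options = StringSplitOptions.None)
        where T : IEquatable<T>
        => new OptionsSplitEnumerator<T>(span, separator, options);
}

public ref struct OptionsSplitEnumerator<T> where T : IEquatable<T>
{
    private readonly ReadOnlySpan<T> _span;
    private readonly T _separator;
    private readonly bool _removeEmpty; // only RemoveEmptyEntries is honored
    private int _start;
    private bool _finished;
    private Range _current;

    public OptionsSplitEnumerator(ReadOnlySpan<T> span, T separator, StringSplitOptions options)
    {
        _span = span;
        _separator = separator;
        _removeEmpty = (options & StringSplitOptions.RemoveEmptyEntries) != 0;
        _start = 0;
        _finished = false;
        _current = default;
    }

    public OptionsSplitEnumerator<T> GetEnumerator() => this;

    public Range Current => _current;

    public bool MoveNext()
    {
        while (!_finished)
        {
            int segmentStart = _start;
            int offset = _span.Slice(segmentStart).IndexOf(_separator);
            int segmentEnd;
            if (offset < 0)
            {
                segmentEnd = _span.Length; // last segment runs to the end
                _finished = true;
            }
            else
            {
                segmentEnd = segmentStart + offset;
                _start = segmentEnd + 1;   // skip the separator
            }

            // With RemoveEmptyEntries, keep scanning past zero-length segments.
            if (_removeEmpty && segmentEnd == segmentStart) continue;

            _current = segmentStart..segmentEnd;
            return true;
        }
        return false;
    }
}
```

Since the enumerator is generic, the same code splits non-character data, for example a `ReadOnlySpan<int>` on the value `0`.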
Previously approved API Proposal
```csharp
public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator() { return this; }
    public bool MoveNext();
    public ReadOnlySpan<T> Current { get; }
}
```
Split off from https://github.com/dotnet/corefx/issues/21395#issuecomment-359342832
From @Joe4evr on January 21, 2018 23:16
Can I throw in another suggestion? I'd really like to see some ability to split a `ReadOnlySpan<char>`. Obviously, you can't return a collection of `Span`s directly, but isn't that what `Memory<T>` is for?
```csharp
// equivalent to the overloads of 'String.Split()'
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, StringSplitOptions options);
```
The reason for choosing `IReadOnlyList<T>` is to make the resulting collection indexable, just like `string[]`. It would be nice if the implementation were `ImmutableArray<T>`, but I'm not sure whether that's a concern for distribution and such.
From @ahsonkhan on January 22, 2018 01:36
@Joe4evr, out of curiosity, do you have a scenario atm where these APIs would be useful? If so, can you please show the code sample?
I would replace the `char[]` overloads with `ReadOnlySpan<char>`. However, I am not sure about adding split APIs in general, given that they have to allocate. Is there a way to avoid allocating? Also, given these are string-like APIs for span, it is strange to have an overload that takes a `string[]`. Maybe all of these would fit better on `ReadOnlyMemory` instead, especially given the return type.
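To illustrate where the allocations in the collection-returning shape come from, here is a hypothetical implementation of the `char[]` overload (`SplitToMemory` is a made-up name). A `ReadOnlyMemory<char>` cannot refer to the memory behind an arbitrary span (which may live on the stack), so this sketch first copies the input into a heap string, then allocates a list that holds one slice per segment.

```csharp
using System;
using System.Collections.Generic;

public static class AllocatingSplitSketch
{
    // Hypothetical sketch of Split(ReadOnlySpan<char>, char[]) returning memory slices.
    public static IReadOnlyList<ReadOnlyMemory<char>> SplitToMemory(ReadOnlySpan<char> span, char[] separator)
    {
        // Allocation #1: copy the whole span to the heap so Memory<T> slices can point at it.
        ReadOnlyMemory<char> copy = span.ToString().AsMemory();
        ReadOnlySpan<char> separators = separator;

        // Allocation #2 (plus any growth): the list holding one slice per segment.
        var result = new List<ReadOnlyMemory<char>>();
        int start = 0;
        while (true)
        {
            int offset = copy.Span.Slice(start).IndexOfAny(separators);
            if (offset < 0)
            {
                result.Add(copy.Slice(start)); // last segment
                return result;
            }
            result.Add(copy.Slice(start, offset));
            start += offset + 1; // skip the separator
        }
    }
}
```

The individual segments do not allocate, but the up-front copy is proportional to the whole input, which is the cost a span-based iterator avoids.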
From @Joe4evr on January 22, 2018 08:35
My scenario is taking a relatively big string of user input and then parsing that to populate a more complex object. So rather than take the whole string at once, I'd like to parse it a piece at a time. It'd be pretty nice if this could be facilitated by the `Span`/`Memory<T>` APIs so that this code won't have to allocate an extra 30-40 tiny strings whenever it runs.
Admittedly, I only started on this particular case earlier today, mostly to experiment and find out how much I could get out of the Span APIs at this time.
Maybe it was a bit naive of me to expect a collection like I did, but I'd at least like to see some API to deal with this scenario a little easier, because I'll probably not be the only one looking to split a span up into smaller chunks like this.
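For the scenario described above, here is a minimal sketch of parsing a larger string piece by piece by slicing, instead of allocating a substring per piece. The input format (`key=value` pairs separated by `;`) and the `Profile`/`ProfileParser` types are assumptions for illustration. Keys are compared and numbers parsed directly from slices via `int.Parse(ReadOnlySpan<char>, NumberStyles, IFormatProvider)`; only the `name` value becomes a new string.

```csharp
using System;
using System.Globalization;

// Hypothetical target object for the "populate a more complex object" scenario.
public sealed class Profile
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public int Score { get; set; }
}

public static class ProfileParser
{
    // Parses input such as "name=Ada;age=36;score=97" (format assumed for this sketch).
    public static Profile Parse(ReadOnlySpan<char> input)
    {
        var profile = new Profile();
        while (!input.IsEmpty)
        {
            // Take the next "key=value" piece up to ';' (or the end of the input).
            int end = input.IndexOf(';');
            ReadOnlySpan<char> pair = end < 0 ? input : input.Slice(0, end);
            input = end < 0 ? ReadOnlySpan<char>.Empty : input.Slice(end + 1);

            int eq = pair.IndexOf('=');
            if (eq < 0) continue; // skip empty or malformed pieces
            ReadOnlySpan<char> key = pair.Slice(0, eq).Trim();
            ReadOnlySpan<char> value = pair.Slice(eq + 1).Trim();

            // Key comparisons and integer parsing work on slices: no substring allocations.
            if (key.SequenceEqual("name".AsSpan()))
                profile.Name = value.ToString();
            else if (key.SequenceEqual("age".AsSpan()))
                profile.Age = int.Parse(value, NumberStyles.Integer, CultureInfo.InvariantCulture);
            else if (key.SequenceEqual("score".AsSpan()))
                profile.Score = int.Parse(value, NumberStyles.Integer, CultureInfo.InvariantCulture);
        }
        return profile;
    }
}
```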
From @stephentoub on January 22, 2018 08:41
Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation).
I would instead expect a design more like an iterator implemented as a ref struct, e.g.
```csharp
public ref struct CharSpanSplitter
{
    public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options);
    public bool TryMoveNext(out ReadOnlySpan<char> result);
}
```
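A minimal sketch of the `CharSpanSplitter` suggested above, assuming `options` behaves as it does for `string.Split`: each entry is trimmed when `TrimEntries` is set (available on .NET 5 and later) and then dropped if it is empty under `RemoveEmptyEntries`. It only illustrates the proposed shape; it is not a shipped type.

```csharp
using System;

// Hypothetical implementation of the iterator shape proposed in the comment above.
public ref struct CharSpanSplitter
{
    private ReadOnlySpan<char> _remaining;   // input not yet consumed
    private readonly char _separator;
    private readonly StringSplitOptions _options;
    private bool _finished;                  // true after the final segment was consumed

    public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options)
    {
        _remaining = value;
        _separator = separator;
        _options = options;
        _finished = false;
    }

    public bool TryMoveNext(out ReadOnlySpan<char> result)
    {
        while (!_finished)
        {
            int index = _remaining.IndexOf(_separator);
            ReadOnlySpan<char> segment;
            if (index < 0)
            {
                segment = _remaining;              // final segment
                _remaining = ReadOnlySpan<char>.Empty;
                _finished = true;
            }
            else
            {
                segment = _remaining.Slice(0, index);
                _remaining = _remaining.Slice(index + 1); // skip the separator
            }

            // Trim first, then decide emptiness, matching string.Split's combined behavior.
            if ((_options & StringSplitOptions.TrimEntries) != 0)
                segment = segment.Trim();
            if ((_options & StringSplitOptions.RemoveEmptyEntries) != 0 && segment.IsEmpty)
                continue;

            result = segment;
            return true;
        }
        result = default;
        return false;
    }
}
```

Unlike a `foreach`-style enumerator, the caller drives it with a loop such as `while (splitter.TryMoveNext(out var part)) { ... }`, and each `part` is a slice of the original input.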