Add 'split' support for ReadOnlySpan<char> similar to string #934

Closed
@ahsonkhan

Description

Edited by @stephentoub on 6/26/2024:

public static class MemoryExtensions
{
    // Alternative name: EnumerateSplits, but not sure what SplitAny would be called
+   public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, T separator);
+   public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> separator);
+   public static SpanSplitEnumerator<T> SplitAny<T>(this ReadOnlySpan<T> source, params ReadOnlySpan<T> separators);

    // Optional:
+   public static SpanSplitEnumerator<char> SplitAny(this ReadOnlySpan<char> source, params ReadOnlySpan<string> separators);

+   public ref struct SpanSplitEnumerator<T>
+   {
+       public SpanSplitEnumerator<T> GetEnumerator();
+       public bool MoveNext();
+       public Range Current { get; }
+   }
}
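To make the proposed shape concrete, here is a minimal sketch of a `Range`-yielding enumerator covering the sequence-separator `Split` and the `SplitAny` cases. The names `RangeSplitEnumerator<T>`, `SplitSketch`, `SplitOn`, and `SplitOnAny` are hypothetical and deliberately differ from the proposed `MemoryExtensions` members; a single separator can be passed as a one-element span. It assumes .NET Core 3.0 or later for `System.Range`, and like `string.Split` with `StringSplitOptions.None` it yields empty segments between adjacent separators.

```csharp
using System;

// Hypothetical sketch of the proposed enumerator shape; not the shipped MemoryExtensions API.
public ref struct RangeSplitEnumerator<T> where T : IEquatable<T>
{
    private readonly ReadOnlySpan<T> _source;
    private readonly ReadOnlySpan<T> _separator; // a sequence (SplitOn) or a set of values (SplitOnAny)
    private readonly bool _isAny;                // true: match any single value in _separator
    private int _start;                          // start of the segment not yet returned
    private bool _finished;                      // true once the last segment was produced
    private Range _current;

    internal RangeSplitEnumerator(ReadOnlySpan<T> source, ReadOnlySpan<T> separator, bool isAny)
    {
        _source = source;
        _separator = separator;
        _isAny = isAny;
        _start = 0;
        _finished = false;
        _current = default;
    }

    public Range Current => _current;

    // foreach binds to a public GetEnumerator by pattern, so returning a copy needs no interface or allocation.
    public RangeSplitEnumerator<T> GetEnumerator() => this;

    public bool MoveNext()
    {
        if (_finished) return false;

        ReadOnlySpan<T> rest = _source.Slice(_start);
        int index;
        int separatorLength;
        if (_isAny)
        {
            index = rest.IndexOfAny(_separator); // -1 when the set is empty or nothing matches
            separatorLength = 1;
        }
        else
        {
            // In this sketch an empty separator sequence never matches.
            index = _separator.IsEmpty ? -1 : rest.IndexOf(_separator);
            separatorLength = _separator.Length;
        }

        if (index < 0)
        {
            _current = new Range(_start, _source.Length); // the remainder is the final segment
            _finished = true;
        }
        else
        {
            _current = new Range(_start, _start + index);
            _start += index + separatorLength;            // skip past the matched separator
        }
        return true;
    }
}

public static class SplitSketch
{
    public static RangeSplitEnumerator<T> SplitOn<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> separator)
        where T : IEquatable<T> => new RangeSplitEnumerator<T>(source, separator, isAny: false);

    public static RangeSplitEnumerator<T> SplitOnAny<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> separators)
        where T : IEquatable<T> => new RangeSplitEnumerator<T>(source, separators, isAny: true);
}
```

Because `Current` is a `Range`, the caller decides how to materialize each piece: `text[r]` gives a span over the input, and a caller holding the original `string` can use `str[r]` to allocate a substring only when it actually needs one.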

Older proposals:

partial class MemoryExtensions
{
    public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator,
        StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

    public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
    {
        public SpanSplitEnumerator<T> GetEnumerator() { return this; }
        public bool MoveNext();
        public Range Current { get; }
    }
}
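This older shape also accepted `StringSplitOptions`. As a hedged illustration of how `RemoveEmptyEntries` might be honored by a generic, `Range`-yielding enumerator, the sketch below simply skips zero-length segments. `TrimEntries` is ignored because trimming whitespace has no generic meaning for arbitrary `T`. `OptionsSplitEnumerator<T>` and `SplitWithOptions` are hypothetical names.

```csharp
using System;

// Hypothetical sketch: a Range-yielding splitter on a single value that honors
// StringSplitOptions.RemoveEmptyEntries (TrimEntries is ignored for generic T).
public ref struct OptionsSplitEnumerator<T> where T : IEquatable<T>
{
    private readonly ReadOnlySpan<T> _source;
    private readonly T _separator;
    private readonly bool _removeEmpty;
    private int _start;      // start index of the next segment to examine
    private bool _finished;  // true once the tail segment has been examined
    private Range _current;

    internal OptionsSplitEnumerator(ReadOnlySpan<T> source, T separator, StringSplitOptions options)
    {
        _source = source;
        _separator = separator;
        _removeEmpty = (options & StringSplitOptions.RemoveEmptyEntries) != 0;
        _start = 0;
        _finished = false;
        _current = default;
    }

    public Range Current => _current;

    public OptionsSplitEnumerator<T> GetEnumerator() => this;

    public bool MoveNext()
    {
        while (!_finished)
        {
            int segmentStart = _start;
            int index = _source.Slice(segmentStart).IndexOf(_separator);
            int segmentEnd = index < 0 ? _source.Length : segmentStart + index;

            if (index < 0) _finished = true;    // no separator left: this is the tail
            else _start = segmentEnd + 1;       // step over the one-element separator

            // Zero-length segments are skipped only when RemoveEmptyEntries was requested.
            if (!_removeEmpty || segmentEnd > segmentStart)
            {
                _current = new Range(segmentStart, segmentEnd);
                return true;
            }
        }
        return false;
    }
}

public static class OptionsSplitSketch
{
    public static OptionsSplitEnumerator<T> SplitWithOptions<T>(this ReadOnlySpan<T> source, T separator,
        StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T>
        => new OptionsSplitEnumerator<T>(source, separator, options);
}
```

With `"a,,b,"`, the default options yield `"a"`, `""`, `"b"`, `""`, while `RemoveEmptyEntries` yields only `"a"` and `"b"`.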

Previously approved API Proposal

public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator() { return this; }
    public bool MoveNext();
    public ReadOnlySpan<T> Current { get; }
}
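In the previously approved shape, `Current` is the segment itself rather than a `Range`. A minimal sketch of that variant follows, with `StringSplitOptions` omitted; `SpanSegmentEnumerator<T>` and `SplitSegments` are hypothetical names. Its `Current` can be handed straight to span-based APIs, while one practical difference from the `Range` shape is that the caller no longer learns each segment's offset in the source.

```csharp
using System;

// Hypothetical sketch of the span-yielding variant (StringSplitOptions omitted).
public ref struct SpanSegmentEnumerator<T> where T : IEquatable<T>
{
    private ReadOnlySpan<T> _remaining; // input not yet consumed
    private readonly T _separator;
    private bool _finished;             // true once the last segment was produced
    private ReadOnlySpan<T> _current;

    internal SpanSegmentEnumerator(ReadOnlySpan<T> source, T separator)
    {
        _remaining = source;
        _separator = separator;
        _finished = false;
        _current = default;
    }

    public ReadOnlySpan<T> Current => _current;

    public SpanSegmentEnumerator<T> GetEnumerator() => this;

    public bool MoveNext()
    {
        if (_finished) return false;

        int index = _remaining.IndexOf(_separator);
        if (index < 0)
        {
            _current = _remaining;                    // the rest of the input is the last segment
            _finished = true;
        }
        else
        {
            _current = _remaining.Slice(0, index);
            _remaining = _remaining.Slice(index + 1); // drop the segment and its separator
        }
        return true;
    }
}

public static class SegmentSplitSketch
{
    public static SpanSegmentEnumerator<T> SplitSegments<T>(this ReadOnlySpan<T> source, T separator)
        where T : IEquatable<T> => new SpanSegmentEnumerator<T>(source, separator);
}
```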

Split off from https://github.com/dotnet/corefx/issues/21395#issuecomment-359342832

From @Joe4evr on January 21, 2018 23:16

Can I throw in another suggestion? I'd really like to see some ability to split a ReadOnlySpan<char>. Obviously, you can't return a collection of Spans directly, but isn't that what Memory<T> is for?

// equivalent to the overloads of 'String.Split()'
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, StringSplitOptions options);

The reason for choosing IReadOnlyList<T> is to make the resulting collection indexable, just like string[]. It would be nice if the implementation were ImmutableArray<T>, but I'm not sure whether that's a concern for distribution and such.
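A rough sketch of what one of these overloads would have to do internally makes the allocation concern concrete. Because a `ReadOnlySpan<char>` cannot be stored inside a heap object, the characters are first copied into a new array, and the returned list (with its growing backing array) adds further allocations. `EagerSplitSketch.SplitToMemory` is a hypothetical name, and the sketch handles only a single `char` separator with no count or options.

```csharp
using System;
using System.Collections.Generic;

public static class EagerSplitSketch
{
    // Hypothetical helper approximating the proposed overload for a single char separator.
    public static IReadOnlyList<ReadOnlyMemory<char>> SplitToMemory(this ReadOnlySpan<char> span, char separator)
    {
        // A span cannot live inside a heap object, so the characters must be copied first.
        ReadOnlyMemory<char> buffer = span.ToArray();

        var result = new List<ReadOnlyMemory<char>>();
        int start = 0;
        while (true)
        {
            int index = buffer.Span.Slice(start).IndexOf(separator);
            if (index < 0)
            {
                result.Add(buffer.Slice(start)); // final segment
                return result;
            }
            result.Add(buffer.Slice(start, index));
            start += index + 1;                  // step over the separator
        }
    }
}
```

Each element is a `ReadOnlyMemory<char>` over the copied buffer, so no per-segment strings are created, but the up-front copy and the list are unavoidable with this return type.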


From @ahsonkhan on January 22, 2018 01:36

@Joe4evr, out of curiosity, do you have a scenario atm where these APIs would be useful? If so, can you please show the code sample?

I would replace the char[] overloads with ReadOnlySpan<char>. However, I am not sure about adding the split APIs, in general, given they have to allocate. Is there a way to avoid allocating? Also, given these are string-like APIs for span, it is strange to have an overload that takes a string[]. Maybe all these would fit better on ReadOnlyMemory instead, especially given the return type.


From @Joe4evr on January 22, 2018 08:35

My scenario is taking a relatively big string of user input and parsing it to populate a more complex object. Rather than take the whole string at once, I'd like to parse it one piece at a time. It'd be pretty nice if the Span/Memory<T> APIs could facilitate this, so that the code doesn't have to allocate 30-40 extra tiny strings every time it runs.

Admittedly, I only started on this particular case earlier today, mostly to experiment and find out how much I could get out of the Span APIs at this time.

Maybe it was a bit naive of me to expect a collection like I did, but I'd at least like to see some API to deal with this scenario a little easier, because I'll probably not be the only one looking to split a span up into smaller chunks like this.
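A scenario like this can already be approximated by slicing spans by hand. The sketch below assumes a made-up input format of `key=value` pairs separated by `;` and a hypothetical `UserSettings` target; keys are compared as spans, the numeric field is parsed with `int.Parse(ReadOnlySpan<char>)` (available since .NET Core 2.1), and only the field that is stored as a `string` allocates one.

```csharp
using System;

// Hypothetical target object and input format, for illustration only.
public sealed class UserSettings
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

public static class SettingsParser
{
    public static UserSettings Parse(ReadOnlySpan<char> input)
    {
        var settings = new UserSettings();
        while (!input.IsEmpty)
        {
            // Take the next "key=value" pair up to ';' (or the end of the input).
            int semicolon = input.IndexOf(';');
            ReadOnlySpan<char> pair = semicolon < 0 ? input : input.Slice(0, semicolon);
            input = semicolon < 0 ? ReadOnlySpan<char>.Empty : input.Slice(semicolon + 1);

            int equals = pair.IndexOf('=');
            if (equals < 0) continue; // this sketch skips empty or malformed pairs

            ReadOnlySpan<char> key = pair.Slice(0, equals).Trim();
            ReadOnlySpan<char> value = pair.Slice(equals + 1).Trim();

            // Compare keys without allocating strings for them.
            if (key.Equals("name".AsSpan(), StringComparison.OrdinalIgnoreCase))
                settings.Name = value.ToString();   // the one string we actually keep
            else if (key.Equals("age".AsSpan(), StringComparison.OrdinalIgnoreCase))
                settings.Age = int.Parse(value);    // parses straight from the span
        }
        return settings;
    }
}
```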


From @stephentoub on January 22, 2018 08:41

Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation).

I would instead expect a design more like an iterator implemented as a ref struct, e.g.

public ref struct CharSpanSplitter
{
    public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options);
    public bool TryMoveNext(out ReadOnlySpan<char> result);
}
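A minimal sketch of how the suggested `CharSpanSplitter` could behave is shown below. The type name and the two member signatures come from the comment above; the body is an assumption that mirrors `string.Split`: `TrimEntries` (which exists only on .NET 5 and later) trims whitespace first, and `RemoveEmptyEntries` then drops zero-length segments.

```csharp
using System;

public ref struct CharSpanSplitter
{
    private ReadOnlySpan<char> _remaining; // input not yet consumed
    private readonly char _separator;
    private readonly StringSplitOptions _options;
    private bool _finished;                // true after the final segment was read

    public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options)
    {
        _remaining = value;
        _separator = separator;
        _options = options;
        _finished = false;
    }

    public bool TryMoveNext(out ReadOnlySpan<char> result)
    {
        while (!_finished)
        {
            ReadOnlySpan<char> segment;
            int index = _remaining.IndexOf(_separator);
            if (index < 0)
            {
                segment = _remaining;                     // last segment
                _finished = true;
            }
            else
            {
                segment = _remaining.Slice(0, index);
                _remaining = _remaining.Slice(index + 1); // consume segment and separator
            }

            if ((_options & StringSplitOptions.TrimEntries) != 0)
                segment = segment.Trim();

            if ((_options & StringSplitOptions.RemoveEmptyEntries) != 0 && segment.IsEmpty)
                continue; // skip the empty entry and keep scanning

            result = segment;
            return true;
        }

        result = default;
        return false;
    }
}
```

A caller drives it with `while (splitter.TryMoveNext(out var part)) { ... }`, which keeps each segment on the stack and never allocates.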

cc @KrzysztofCwalina, @stephentoub, @Joe4evr
