Description
Background and motivation
#934 has been a long-standing issue about Split support for spans. It's evolved into an enumerator that wraps IndexOf and Slice into a slightly tidier package. While that might still be useful, it doesn't make many of the existing use cases for Split very simple, in particular ones where the consumer knows how many split values are expected, wants to extract the Nth value, etc.
Either instead of or in addition to that (if we also want an enumerator syntax), we can offer a SplitAsRanges method that operates over ReadOnlySpan<T> and stores the resulting ranges into a provided destination. Because it goes through ReadOnlySpan<T>, it then also works with Span<T>, {ReadOnly}Memory<T>, and String, it doesn't allocate, it automates the retrieval of N values, etc.
API Proposal
namespace System;

public static class MemoryExtensions
{
+    public static int SplitAsRanges(this ReadOnlySpan<char> source, Span<Range> destination, char separator, StringSplitOptions options = StringSplitOptions.None);
+    public static int SplitAsRanges(this ReadOnlySpan<char> source, Span<Range> destination, ReadOnlySpan<char> separator, StringSplitOptions options = StringSplitOptions.None);
+    public static int SplitAnyAsRanges(this ReadOnlySpan<char> source, Span<Range> destination, ReadOnlySpan<char> separators, StringSplitOptions options = StringSplitOptions.None);
+    public static int SplitAnyAsRanges(this ReadOnlySpan<char> source, Span<Range> destination, ReadOnlySpan<string> separators, StringSplitOptions options = StringSplitOptions.None);
}
- Naming: we could just call these Split{Any}, and have the "ranges" aspect of it be implicit in taking a Span<Range> parameter.
- Argument ordering: destination, separator vs separator, destination? I went with destination, separator so that the configuration-related data (separator and options) are next to each other, but that then does differ from string.Split, where the separator is first.
- The methods all return the number of System.Range values written into destination. A caller that wants to retrieve N segments regardless of whether there are more can stackalloc a span / allocate an array of N Range instances. A caller that wants to retrieve N segments and guarantee there are no more than that can stackalloc a span / allocate an array of N+1 Range instances, and validate that the returned count was N. System.Range is unmanaged and can be stackalloc'd.
- The stored Range instances can be used to slice the original span/memory/string etc. to extract only those values that are needed, in either an allocating or non-allocating manner.
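The N vs. N+1 sizing rule can be tried out today with a small stand-in. The sketch below is not the proposed implementation: SplitSketch and its SplitAsRanges helper are hypothetical, cover only the single-char separator with no StringSplitOptions, and assume that once destination is full the rest of the input is simply ignored (the proposal does not pin that behavior down).

```csharp
using System;

static class SplitSketch
{
    // Hypothetical stand-in for the proposed SplitAsRanges(char) overload.
    // Assumption: once destination is full, the remaining input is ignored.
    public static int SplitAsRanges(ReadOnlySpan<char> source, Span<Range> destination, char separator)
    {
        int count = 0;
        int start = 0;
        while (count < destination.Length)
        {
            int index = source.Slice(start).IndexOf(separator);
            if (index < 0)
            {
                // No more separators: the final segment runs to the end of the input.
                destination[count++] = start..source.Length;
                break;
            }

            destination[count++] = start..(start + index);
            start += index + 1; // continue just past the separator
        }

        return count;
    }

    static void Main()
    {
        string input = "a,b,c";

        // N + 1 slots so that "exactly N == 3 segments" can be validated.
        Span<Range> parts = stackalloc Range[4];
        int found = SplitAsRanges(input, parts, ',');

        // Only the segment that is actually needed gets materialized as a string.
        Console.WriteLine(found == 3 ? input[parts[1]] : "unexpected segment count");
    }
}
```

With N+1 slots, an input containing a fourth segment makes the helper return 4, so the `== 3` check rejects it. With exactly N slots, the same input would return 3 and the extra data would go unnoticed, which is the "regardless of whether there are more" case.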
API Usage
Examples...
Instead of:
string[] dependentParts = dependent.Split(',');
if (dependentParts.Length != 3)
{
    Log?.LogMessage($"Skipping dependent: {dependent}");
    continue;
}

try
{
    SdkFeatureBand dependentFeatureBand = new SdkFeatureBand(dependentParts[1]);
this code could be:
Span<Range> dependentParts = stackalloc Range[4];
...
if (dependent.AsSpan().SplitAsRanges(dependentParts, ',') != 3)
{
    Log?.LogMessage($"Skipping dependent: {dependent}");
    continue;
}

try
{
    SdkFeatureBand dependentFeatureBand = new SdkFeatureBand(dependent[dependentParts[1]]);
Instead of:
var strLatLong = Uri.Substring(4).Split(',');
if (strLatLong.Length != 2)
{
    throw new ArgumentException($"Record is not a valid {nameof(GeoRecord)}, can't find a proper latitude and longitude in the payload");
}

try
{
    _latitude = Convert.ToDouble(strLatLong[0], CultureInfo.InvariantCulture);
    _longitude = Convert.ToDouble(strLatLong[1], CultureInfo.InvariantCulture);
}
catch (Exception ex) when (ex is FormatException || ex is OverflowException)
{
    throw new ArgumentException($"Record is not a valid {nameof(GeoRecord)}, can't find a proper latitude and longitude in the payload");
}
this could be:
Span<Range> strLatLong = stackalloc Range[3];
ReadOnlySpan<char> span = Uri.AsSpan(4);
if (span.SplitAsRanges(strLatLong, ',') != 2)
{
    throw new ArgumentException($"Record is not a valid {nameof(GeoRecord)}, can't find a proper latitude and longitude in the payload");
}

try
{
    _latitude = double.Parse(span[strLatLong[0]], provider: CultureInfo.InvariantCulture);
    _longitude = double.Parse(span[strLatLong[1]], provider: CultureInfo.InvariantCulture);
}
catch (Exception ex) when (ex is FormatException || ex is OverflowException)
{
    throw new ArgumentException($"Record is not a valid {nameof(GeoRecord)}, can't find a proper latitude and longitude in the payload");
}
Instead of:
while ((zoneTabFileLine = sr.ReadLine()) != null)
{
    if (!string.IsNullOrEmpty(zoneTabFileLine) && zoneTabFileLine[0] != '#')
    {
        // the format of the line is "country-code \t coordinates \t TimeZone Id \t comments"
        int firstTabIndex = zoneTabFileLine.IndexOf('\t');
        if (firstTabIndex >= 0)
        {
            int secondTabIndex = zoneTabFileLine.IndexOf('\t', firstTabIndex + 1);
            if (secondTabIndex >= 0)
            {
                string timeZoneId;
                int startIndex = secondTabIndex + 1;
                int thirdTabIndex = zoneTabFileLine.IndexOf('\t', startIndex);
                if (thirdTabIndex >= 0)
                {
                    int length = thirdTabIndex - startIndex;
                    timeZoneId = zoneTabFileLine.Substring(startIndex, length);
                }
                else
                {
                    timeZoneId = zoneTabFileLine.Substring(startIndex);
                }

                if (!string.IsNullOrEmpty(timeZoneId))
                {
                    timeZoneIds.Add(timeZoneId);
                }
            }
        }
    }
}
this could be:
Span<Range> ranges = stackalloc Range[4];
while ((zoneTabFileLine = sr.ReadLine()) != null)
{
    // skip comment lines; empty lines fall out below because they yield fewer than 3 segments
    if (!zoneTabFileLine.StartsWith('#'))
    {
        // the format of the line is "country-code \t coordinates \t TimeZone Id \t comments"
        int found = zoneTabFileLine.AsSpan().SplitAsRanges(ranges, '\t');
        if (found >= 3)
        {
            // the TimeZone Id is the third field, i.e. index 2
            string timeZoneId = zoneTabFileLine[ranges[2]];
            if (timeZoneId.Length != 0)
            {
                timeZoneIds.Add(timeZoneId);
            }
        }
    }
}
Alternative Designs
Risks
No response