Description
Background and motivation
Earlier in .NET 8, we added an IndexOfAnyValues<T>
type that represents an immutable set of values optimized for efficient {Last}IndexOfAny{Except}
searching: #68328 (comment).
You obtain an instance of IndexOfAnyValues
by passing the full set of values to Create
, which picks the most efficient algorithm for that set of values and capabilities of the current platform.
We've already exposed Create
methods for searching for any byte
or char
in a given set.
This proposal would allow you to create IndexOfAnyValues<string>
instances for searching for multiple substrings in a given text, instead of just individual characters.
Like with bytes and chars, Create
for IndexOfAnyValues<string>
can analyze the values in advance and pick the most optimal algorithm (e.g., Aho-Corasick, Rabin-Karp, Teddy, ...).
As we're working with string
s, we're proposing adding a StringComparison
parameter to Create
.
This method would only work for Ordinal/OrdinalIgnoreCase
. It's unlikely that we can meaningfully accelerate culture-aware multi-substring searching, and the proposed APIs assume that matches are of equal lengths.
Semantically, IndexOfAny
would behave the same as doing an IndexOf
of every value, and taking the minimum index.
While a *Last*IndexOfAny
variant is possible, we are currently only proposing the left-to-right IndexOfAny
.
Regex
is likely to be the main consumer of this API.
cc: @stephentoub
cc: @tarekgh, @GrabYourPitchforks as we're working with string
s
API Proposal
namespace System.Buffers;
public static class IndexOfAnyValues
{
// Existing
public static IndexOfAnyValues<byte> Create(ReadOnlySpan<byte> values);
public static IndexOfAnyValues<char> Create(ReadOnlySpan<char> values);
// Proposed
public static IndexOfAnyValues<string> Create(ReadOnlySpan<string> values, StringComparison comparisonType);
}
namespace System;
public static class MemoryExtensions
{
// Existing
public static int IndexOfAny<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
public static int IndexOfAnyExcept<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
public static int LastIndexOfAny<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
public static int LastIndexOfAnyExcept<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
// Proposed
public static int IndexOfAny(this ReadOnlySpan<char> span, IndexOfAnyValues<string> values);
}
API Usage
private static readonly IndexOfAnyValues<string> s_names = IndexOfAnyValues.Create(
new[] { "Sherlock", "Holmes", "Watson" }, StringComparison.Ordinal);
public static int CountNames(ReadOnlySpan<char> text)
{
int count = 0;
while (!text.IsEmpty)
{
int matchOffset = text.IndexOfAny(s_names);
if ((uint)matchOffset >= (uint)text.Length) break;
int matchLength = text[matchOffset] == 'S' ? 8 : 6;
text = text.Slice(matchOffset + matchLength);
count++;
}
return count;
}
Alternative Designs
Emphasize that only Ordinal{IgnoreCase}
works:
public static IndexOfAnyValues<string> CreateOrdinal(ReadOnlySpan<string> values, StringComparison comparisonType);
// or
public static IndexOfAnyValues<string> CreateOrdinal(ReadOnlySpan<string> values, bool ignoreCase);
An API that returns both a match offset and length (instead of just the offset).
So far we've rejected this variant as it adds a policy to IndexOfAny
semantics -- are we returning the leftmost-first vs leftmost-longest match.
public static (int MatchOffset, int MatchLength) IndexOfAny(this ReadOnlySpan<char> span, IndexOfAnyValues<string> values);