Use SearchValues<string> for prefix searching in RegexCompiler / source generator #96402

Merged

@stephentoub stephentoub commented Jan 2, 2024

We currently use IndexOf(literal), but every call to that incurs a little overhead to determine how best to do the search. Now that we have SearchValues<string>, even though its bread and butter is searching for multiple substrings, we can also use it to search for a single substring. In that case it's effectively the same as IndexOf(literal), except that the result of that examination is cached, so it is done only once rather than on every call.

This also introduces some of the infrastructure necessary to subsequently enable multi-substring search.
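
For illustration, the single-substring case described above might look roughly like the following at a call site. This is a minimal sketch, not the code the RegexCompiler or source generator actually emits: the class name `SingleLiteralSearch`, the field names, and the method names are all hypothetical, and the literal "hello" is borrowed from the benchmark. It relies only on `SearchValues.Create(ReadOnlySpan<string>, StringComparison)` and the `IndexOfAny(ReadOnlySpan<char>, SearchValues<string>)` extension, both available in .NET 8.

```csharp
using System;
using System.Buffers;

// Hypothetical helper, for illustration only.
public static class SingleLiteralSearch
{
    // Created once and cached: the work of deciding how best to search
    // for the literal happens here, not on every search call.
    private static readonly SearchValues<string> s_hello =
        SearchValues.Create(new[] { "hello" }, StringComparison.Ordinal);

    private static readonly SearchValues<string> s_helloIgnoreCase =
        SearchValues.Create(new[] { "hello" }, StringComparison.OrdinalIgnoreCase);

    // Previous approach: the literal is examined again on each call.
    public static int FindWithIndexOf(ReadOnlySpan<char> input) =>
        input.IndexOf("hello", StringComparison.Ordinal);

    // Approach sketched here: reuse the cached SearchValues<string> instance.
    public static int FindWithSearchValues(ReadOnlySpan<char> input) =>
        input.IndexOfAny(s_hello);

    // Case-insensitive variant, using an ordinal ignore-case comparison.
    public static int FindIgnoreCase(ReadOnlySpan<char> input) =>
        input.IndexOfAny(s_helloIgnoreCase);
}
```

Both `FindWithIndexOf` and `FindWithSearchValues` should return the same index for the same input; the difference is only in where the per-literal setup cost is paid.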

Contributes to #85693

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Linq; // for Enumerable.Repeat (needed when implicit usings are disabled)
using System.Text.RegularExpressions;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(false)]
public partial class Tests
{
    private Regex _regex;
    private string _haystack;

    [Params(true, false)]
    public bool IgnoreCase { get; set; }

    [Params("hello", "hithere")]
    public string Haystack { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _regex = new Regex(@"hello\d", RegexOptions.Compiled | (IgnoreCase ? RegexOptions.IgnoreCase : RegexOptions.None));
        _haystack = string.Concat(Enumerable.Repeat(Haystack, 1000));
    }

    [Benchmark]
    public int Count() => _regex.Count(_haystack);
}

| Method | Toolchain | IgnoreCase | Haystack | Mean | Ratio |
|--------|-------------------|------------|----------|-------------:|------:|
| Count | \main\corerun.exe | False | hello | 13,957.2 ns | 1.00 |
| Count | \pr\corerun.exe | False | hello | 12,158.0 ns | 0.87 |
| Count | \main\corerun.exe | False | hithere | 556.9 ns | 1.00 |
| Count | \pr\corerun.exe | False | hithere | 370.6 ns | 0.67 |
| Count | \main\corerun.exe | True | hello | 15,978.6 ns | 1.00 |
| Count | \pr\corerun.exe | True | hello | 12,183.9 ns | 0.76 |
| Count | \main\corerun.exe | True | hithere | 485.9 ns | 1.00 |
| Count | \pr\corerun.exe | True | hithere | 499.3 ns | 1.03 |

ghost commented Jan 2, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: stephentoub
Labels:

area-System.Text.RegularExpressions

Milestone: -

@MihaZupan MihaZupan left a comment

Sweet

This may regress some cases with longer literals until we resolve #96142

@stephentoub stephentoub merged commit ae051b7 into dotnet:main Jan 2, 2024
111 checks passed
@stephentoub stephentoub deleted the usesearchvaluesinregexforsinglestring branch January 2, 2024 18:55