Skip to content

Conversation

jkotalik
Copy link
Contributor

@jkotalik jkotalik commented Apr 7, 2021

Follow up to AdaptiveCapacityDictionary work.

|         Method |        Mean |     Error |      StdDev |      Median |         Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |------------:|----------:|------------:|------------:|-------------:|------:|------:|------:|----------:|
|       ParseNew | 6,990.72 ns | 693.64 ns | 2,012.36 ns | 6,400.00 ns |    143,046.7 |     - |     - |     - |     480 B |
| ParseNewSingle | 3,048.77 ns | 206.21 ns |   543.25 ns | 2,950.00 ns |    328,001.6 |     - |     - |     - |     272 B |
|          Parse | 5,448.42 ns | 532.10 ns | 1,526.70 ns | 4,700.00 ns |    183,539.4 |     - |     - |     - |     792 B |
|    Constructor |    98.91 ns |  44.44 ns |   125.35 ns |    50.00 ns | 10,109,890.1 |     - |     - |     - |      48 B |

Currently copied stuff in QueryHelpers as all other types are public. I can try avoiding duplication with some shared static internal types.

@jkotalik
Copy link
Contributor Author

jkotalik commented Apr 7, 2021

Hm, let me look into why ParseNew slower atm. I just made some recent changes which could have affected it.

@jkotalik
Copy link
Contributor Author

jkotalik commented Apr 7, 2021

|         Method |       Mean |     Error |     StdDev |     Median |        Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |-----------:|----------:|-----------:|-----------:|------------:|------:|------:|------:|----------:|
|       ParseNew | 5,430.3 ns | 350.66 ns |   971.7 ns | 5,200.0 ns |   184,150.6 |     - |     - |     - |     560 B |
| ParseNewSingle | 4,118.8 ns | 596.63 ns | 1,721.4 ns | 3,400.0 ns |   242,792.1 |     - |     - |     - |     352 B |
|          Parse | 6,026.0 ns | 592.43 ns | 1,709.3 ns | 5,550.0 ns |   165,946.4 |     - |     - |     - |     792 B |
|    Constructor |   114.9 ns |  47.93 ns |   136.7 ns |   100.0 ns | 8,703,703.7 |     - |     - |     - |      48 B |

I changed the capacity from 10 to 5 and it caused that initial regression, reverted back and the perf looks better. Something odd going on.

string name = queryString.Substring(scanIndex, equalIndex - scanIndex);
string value = queryString.Substring(equalIndex + 1, delimiterIndex - equalIndex - 1);
accumulator.Append(
Uri.UnescapeDataString(name.Replace('+', ' ')),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these allocations for the intermediate string be avoided?
In the worst case it's 6 allocations.
One for the substring, then one from string.Replace and another one for UnescapeDataString. And this for name and value, so 6 in total.

Ideally Uri.UnescapeDataString should have an overload for {RO}Span too, so some intermeidate allocations can be avoided.

So should the queryString be copied to a Span-buffer on which is operated?
The substring will instead be a slice of that span, avoiding one allocation.
string.Replace can be done with a helper on that span-slice (unforunately there's no Span.Replace-API / extension).
Instead of 3 allocations there's only 1 left, and with Uri taking a span then this is can go away too.

Helpers code

To replace + with , and assuming length won't be short enough, so no vectorization needed:

internal static class MySpanExtensions
{
    public static void ReplaceInPlace(this Span<char> span, char oldChar, char newChar)
    {
        foreach (ref char c in span)
        {
            if (c == oldChar)
            {
                c = newChar;
            }
        }
    }
}

Or with vectorization (x86 only): Use methode ReplacePlusWithSpaceInPlace for best codegen.

internal static class MySpanExtensions
{
    public static void ReplacePlusWithSpaceInPlace(this Span<char> span)
        => ReplaceInPlace(span, '+', ' ');

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe void ReplaceInPlace(this Span<char> span, char oldChar, char newChar)
    {
        nint i = 0;
        nint n = (nint)(uint)span.Length;

        fixed (char* ptr = span)
        {
            ushort* pVec = (ushort*)ptr;

            if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
            {
                Vector128<ushort> vecOldChar = Vector128.Create((ushort)oldChar);
                Vector128<ushort> vecNewChar = Vector128.Create((ushort)newChar);

                do
                {
                    Vector128<ushort> vec = Sse2.LoadVector128(pVec + i);
                    Vector128<ushort> mask = Sse2.CompareEqual(vec, vecOldChar);
                    Vector128<ushort> res = Sse41.BlendVariable(vec, vecNewChar, mask);
                    Sse2.Store(pVec + i, res);

                    i += Vector128<ushort>.Count;
                } while (i <= n - Vector128<ushort>.Count);
            }

            for (; i < n; ++i)
            {
                if (ptr[i] == oldChar)
                {
                    ptr[i] = newChar;
                }
            }
        }
    }
}

Copy link
Member

@gfoidl gfoidl Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should the queryString be copied to a Span-buffer on which is operated?

I gave this appraoch a try.
Default is 8012a72 from this PR.
Span-based is gfoidl@e1d3c95.

|     Method |  Categories |     Mean |   Error |  StdDev | Ratio |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |------------ |---------:|--------:|--------:|------:|-------:|------:|------:|----------:|
|    Default | QueryString | 762.4 ns | 8.71 ns | 8.15 ns |  1.00 | 0.1631 |     - |     - |     512 B |
| Span-based | QueryString | 746.6 ns | 8.00 ns | 6.24 ns |  0.98 | 0.1631 |     - |     - |     512 B |
|            |             |          |         |         |       |        |       |       |           |
|    Default | SingleValue | 158.4 ns | 1.26 ns | 1.12 ns |  1.00 | 0.0968 |     - |     - |     304 B |
| Span-based | SingleValue | 150.0 ns | 1.73 ns | 1.53 ns |  0.95 | 0.0968 |     - |     - |     304 B |
|            |             |          |         |         |       |        |       |       |           |
|    Default |    WithPlus | 492.4 ns | 4.74 ns | 4.43 ns |  1.00 | 0.1984 |     - |     - |     624 B |
| Span-based |    WithPlus | 415.2 ns | 3.35 ns | 2.97 ns |  0.84 | 0.1526 |     - |     - |     480 B |

So it's a bit faster, but most important when there are any + in the query it allocates less.

benchmark code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using Microsoft.AspNetCore.Http.Features;

namespace Microsoft.AspNetCore.Http
{
    [MemoryDiagnoser]
    [GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
    [CategoriesColumn]
    public class QueryCollectionBenchmarks
    {
        private string _queryString = "?key1=value1&key2=value2&key3=value3&key4=&key5=";
        private string _singleValue = "?key1=value1";
        private string _queryStringWithPlus = "?key1=value1+value2&key2=a+b+c+d&message=hello+asp+world";

        [Benchmark(Baseline = true, Description = "Default")]
        [BenchmarkCategory("QueryString")]
        public object? ParseNew_Default()
        {
            return QueryFeature_Default.ParseNullableQueryInternal(_queryString);
        }

        [Benchmark(Baseline = true, Description = "Default")]
        [BenchmarkCategory("SingleValue")]
        public object? ParseNewSingle_Default()
        {
            return QueryFeature_Default.ParseNullableQueryInternal(_singleValue);
        }

        [Benchmark(Baseline = true, Description = "Default")]
        [BenchmarkCategory("WithPlus")]
        public object? ParseNewPlus_Default()
        {
            return QueryFeature_Default.ParseNullableQueryInternal(_queryStringWithPlus);
        }

        [Benchmark(Description = "Span-based")]
        [BenchmarkCategory("QueryString")]
        public object? ParseNew_1()
        {
            return QueryFeature_1.ParseNullableQueryInternal(_queryString);
        }

        [Benchmark(Description = "Span-based")]
        [BenchmarkCategory("SingleValue")]
        public object? ParseNewSingle_1()
        {
            return QueryFeature_1.ParseNullableQueryInternal(_singleValue);
        }

        [Benchmark(Description = "Span-based")]
        [BenchmarkCategory("WithPlus")]
        public object? ParseNewPlus_1()
        {
            return QueryFeature_1.ParseNullableQueryInternal(_queryStringWithPlus);
        }
    }
}

In the span-based version gfoidl@e1d3c95 I've use the vectorized approach in MySpanExtensions (a bad name I know). Vectorization kicks in if a segment (name or value) is >= 8 chars.
It's unsafe also. For a pure safe variant see details in comment above.

You can merge that comment and adjust some things to fit to the repo here (naming, placement of the extension method, etc.).

Comment on lines +94 to +98
if (Store == null)
{
return 0;
}
return Store.Count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Store == null)
{
return 0;
}
return Store.Count;
return Store?.Count ?? 0;

Then it can be written as expression bodied property.

Comment on lines +109 to +113
if (Store == null)
{
return EmptyKeys;
}
return Store.Keys;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Store == null)
{
return EmptyKeys;
}
return Store.Keys;
return Store?.Keys ?? EmptyKeys;

Similar.

while (scanIndex < textLength)
{
int delimiterIndex = queryString.IndexOf('&', scanIndex);
if (delimiterIndex == -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (delimiterIndex == -1)
if (delimiterIndex < 0)

This is really a micro-optimization, as the this produces test reg, reg; jge label which the CPU will fuse, so it's actually only one instruction to be executed, as opposed to cmp reg, const; jne label which is two instructions.

But you don't need to take it here, more or less FYI.

Comment on lines +20 to +21
_queryString = "?key1=value1&key2=value2&key3=value3&key4=&key5=";
_singleValue = "?key1=value1";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be assigned to the fields directly. Just don't make the fields const or static readonly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a case where some + are contained, to reflect the allocations for the string.Replace?

[Benchmark]
public void ParseNew()
{
var dict = QueryFeature.ParseNullableQueryInternal(_queryString);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return the value (to prevent any possible optimization that discards the result)?

@jkotalik
Copy link
Contributor Author

Do have the time to finish this, going to close and if someone wants to take over, feel free.

@jkotalik jkotalik closed this May 18, 2021
@gfoidl
Copy link
Member

gfoidl commented May 18, 2021

I left some comments, so I'll shepherd this to end (in the next days).

@ghost
Copy link

ghost commented May 18, 2021

Hi @gfoidl. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

@dougbu dougbu deleted the jkotalik/dictMorePlaces branch August 21, 2021 22:33
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants