-
Notifications
You must be signed in to change notification settings - Fork 10.4k
Reduce allocations in QueryCollection #31594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hm, let me look into why ParseNew slower atm. I just made some recent changes which could have affected it. |
I changed the capacity from 10 to 5 and it caused that initial regression, reverted back and the perf looks better. Something odd going on. |
string name = queryString.Substring(scanIndex, equalIndex - scanIndex); | ||
string value = queryString.Substring(equalIndex + 1, delimiterIndex - equalIndex - 1); | ||
accumulator.Append( | ||
Uri.UnescapeDataString(name.Replace('+', ' ')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these allocations for the intermediate string be avoided?
In the worst case it's 6 allocations.
One for the substring, then one from string.Replace and another one for UnescapeDataString. And this for name and value, so 6 in total.
Ideally Uri.UnescapeDataString should have an overload for {RO}Span too, so some intermeidate allocations can be avoided.
So should the queryString be copied to a Span-buffer on which is operated?
The substring will instead be a slice of that span, avoiding one allocation.
string.Replace can be done with a helper on that span-slice (unforunately there's no Span.Replace-API / extension).
Instead of 3 allocations there's only 1 left, and with Uri taking a span then this is can go away too.
Helpers code
To replace +
with
, and assuming length won't be short enough, so no vectorization needed:
internal static class MySpanExtensions
{
public static void ReplaceInPlace(this Span<char> span, char oldChar, char newChar)
{
foreach (ref char c in span)
{
if (c == oldChar)
{
c = newChar;
}
}
}
}
Or with vectorization (x86 only): Use methode ReplacePlusWithSpaceInPlace
for best codegen.
internal static class MySpanExtensions
{
public static void ReplacePlusWithSpaceInPlace(this Span<char> span)
=> ReplaceInPlace(span, '+', ' ');
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe void ReplaceInPlace(this Span<char> span, char oldChar, char newChar)
{
nint i = 0;
nint n = (nint)(uint)span.Length;
fixed (char* ptr = span)
{
ushort* pVec = (ushort*)ptr;
if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
{
Vector128<ushort> vecOldChar = Vector128.Create((ushort)oldChar);
Vector128<ushort> vecNewChar = Vector128.Create((ushort)newChar);
do
{
Vector128<ushort> vec = Sse2.LoadVector128(pVec + i);
Vector128<ushort> mask = Sse2.CompareEqual(vec, vecOldChar);
Vector128<ushort> res = Sse41.BlendVariable(vec, vecNewChar, mask);
Sse2.Store(pVec + i, res);
i += Vector128<ushort>.Count;
} while (i <= n - Vector128<ushort>.Count);
}
for (; i < n; ++i)
{
if (ptr[i] == oldChar)
{
ptr[i] = newChar;
}
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should the queryString be copied to a Span-buffer on which is operated?
I gave this appraoch a try.
Default
is 8012a72 from this PR.
Span-based
is gfoidl@e1d3c95.
| Method | Categories | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |------------ |---------:|--------:|--------:|------:|-------:|------:|------:|----------:|
| Default | QueryString | 762.4 ns | 8.71 ns | 8.15 ns | 1.00 | 0.1631 | - | - | 512 B |
| Span-based | QueryString | 746.6 ns | 8.00 ns | 6.24 ns | 0.98 | 0.1631 | - | - | 512 B |
| | | | | | | | | | |
| Default | SingleValue | 158.4 ns | 1.26 ns | 1.12 ns | 1.00 | 0.0968 | - | - | 304 B |
| Span-based | SingleValue | 150.0 ns | 1.73 ns | 1.53 ns | 0.95 | 0.0968 | - | - | 304 B |
| | | | | | | | | | |
| Default | WithPlus | 492.4 ns | 4.74 ns | 4.43 ns | 1.00 | 0.1984 | - | - | 624 B |
| Span-based | WithPlus | 415.2 ns | 3.35 ns | 2.97 ns | 0.84 | 0.1526 | - | - | 480 B |
So it's a bit faster, but most important when there are any +
in the query it allocates less.
benchmark code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using Microsoft.AspNetCore.Http.Features;
namespace Microsoft.AspNetCore.Http
{
[MemoryDiagnoser]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
[CategoriesColumn]
public class QueryCollectionBenchmarks
{
private string _queryString = "?key1=value1&key2=value2&key3=value3&key4=&key5=";
private string _singleValue = "?key1=value1";
private string _queryStringWithPlus = "?key1=value1+value2&key2=a+b+c+d&message=hello+asp+world";
[Benchmark(Baseline = true, Description = "Default")]
[BenchmarkCategory("QueryString")]
public object? ParseNew_Default()
{
return QueryFeature_Default.ParseNullableQueryInternal(_queryString);
}
[Benchmark(Baseline = true, Description = "Default")]
[BenchmarkCategory("SingleValue")]
public object? ParseNewSingle_Default()
{
return QueryFeature_Default.ParseNullableQueryInternal(_singleValue);
}
[Benchmark(Baseline = true, Description = "Default")]
[BenchmarkCategory("WithPlus")]
public object? ParseNewPlus_Default()
{
return QueryFeature_Default.ParseNullableQueryInternal(_queryStringWithPlus);
}
[Benchmark(Description = "Span-based")]
[BenchmarkCategory("QueryString")]
public object? ParseNew_1()
{
return QueryFeature_1.ParseNullableQueryInternal(_queryString);
}
[Benchmark(Description = "Span-based")]
[BenchmarkCategory("SingleValue")]
public object? ParseNewSingle_1()
{
return QueryFeature_1.ParseNullableQueryInternal(_singleValue);
}
[Benchmark(Description = "Span-based")]
[BenchmarkCategory("WithPlus")]
public object? ParseNewPlus_1()
{
return QueryFeature_1.ParseNullableQueryInternal(_queryStringWithPlus);
}
}
}
In the span-based version gfoidl@e1d3c95 I've use the vectorized approach in MySpanExtensions (a bad name I know). Vectorization kicks in if a segment (name or value) is >= 8 chars.
It's unsafe also. For a pure safe variant see details in comment above.
You can merge that comment and adjust some things to fit to the repo here (naming, placement of the extension method, etc.).
if (Store == null) | ||
{ | ||
return 0; | ||
} | ||
return Store.Count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Store == null) | |
{ | |
return 0; | |
} | |
return Store.Count; | |
return Store?.Count ?? 0; |
Then it can be written as expression bodied property.
if (Store == null) | ||
{ | ||
return EmptyKeys; | ||
} | ||
return Store.Keys; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Store == null) | |
{ | |
return EmptyKeys; | |
} | |
return Store.Keys; | |
return Store?.Keys ?? EmptyKeys; |
Similar.
while (scanIndex < textLength) | ||
{ | ||
int delimiterIndex = queryString.IndexOf('&', scanIndex); | ||
if (delimiterIndex == -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (delimiterIndex == -1) | |
if (delimiterIndex < 0) |
This is really a micro-optimization, as the this produces test reg, reg; jge label
which the CPU will fuse, so it's actually only one instruction to be executed, as opposed to cmp reg, const; jne label
which is two instructions.
But you don't need to take it here, more or less FYI.
_queryString = "?key1=value1&key2=value2&key3=value3&key4=&key5="; | ||
_singleValue = "?key1=value1"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be assigned to the fields directly. Just don't make the fields const or static readonly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a case where some +
are contained, to reflect the allocations for the string.Replace?
[Benchmark] | ||
public void ParseNew() | ||
{ | ||
var dict = QueryFeature.ParseNullableQueryInternal(_queryString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return the value (to prevent any possible optimization that discards the result)?
Do have the time to finish this, going to close and if someone wants to take over, feel free. |
I left some comments, so I'll shepherd this to end (in the next days). |
Hi @gfoidl. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context. |
Follow up to AdaptiveCapacityDictionary work.
Currently copied stuff in QueryHelpers as all other types are public. I can try avoiding duplication with some shared static internal types.