Check out source generation performance #7
Not directly related to src gen performance, but performance of the generated source:

```csharp
public global::System.Threading.Tasks.ValueTask<TResponse> Send<TResponse>(
    global::Mediator.IRequest<TResponse> request,
    global::System.Threading.CancellationToken cancellationToken = default
)
{
    switch (request)
    {
        case global::MyQuery r:
        {
            var task = Send(r, cancellationToken);
            return global::System.Runtime.CompilerServices.Unsafe.As<global::System.Threading.Tasks.ValueTask<global::MyResponse>, global::System.Threading.Tasks.ValueTask<TResponse>>(ref task);
        }
        // ...one case per request type
    }
}
```

This scales rather badly for a lot of requests (I tried it with a solution containing 2500 requests). There is a concept for statically typed dictionaries, similar to https://github.com/asynkron/protoactor-dotnet/blob/dev/src/Proto.Actor/Utils/TypedDictionary.cs

A basic benchmark looks like this, where SwitchRequestN is the switch/case concept and StaticSwitchRequest is the static type dictionary concept:

```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-10700 CPU 2.90GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.300
  [Host]   : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
  ShortRun : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1  WarmupCount=3
```

As you can see, the later/deeper the request is in the switch, the slower the call actually gets, while the static version runs at constant speed with a single lookup into an array. Bottom line: consider getting rid of the large switches; we know all valid cases and with that knowledge can create perfect hashes.
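The statically typed dictionary trick relies on a static field inside a generic class: the runtime instantiates one field per closed generic type, which gives every type a cached integer id for free. A minimal sketch of the concept, simplified from the linked protoactor-dotnet TypedDictionary (all names here are illustrative, not the library's actual code):

```csharp
using System;
using System.Threading;

public interface IRequest { }
public sealed class QueryA : IRequest { }
public sealed class QueryB : IRequest { }

public static class TypeIndex<TBase>
{
    private static int s_counter = -1;

    private static class Slot<TKey> where TKey : TBase
    {
        // The CLR initializes one Id per closed Slot<TKey>, so this runs
        // exactly once per key type and is a plain static field read afterwards.
        internal static readonly int Id = Interlocked.Increment(ref s_counter);
    }

    // Constant-time: no hashing, no switch, just a static field read.
    public static int Get<TKey>() where TKey : TBase => Slot<TKey>.Id;
}
```

The resulting dense id can then index a flat handler array, which is why the benchmark shows constant cost regardless of how many request types exist. The catch, raised in the replies, is that the caller must know the concrete type argument.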
Cool approach! I didn't know switch statements scaled so poorly, but I guess it makes sense since all we have is

So since we don't have the concrete T for the request, will we be able to use that approach? Since in this piece:

```csharp
public static int Switch<T>(T t) where T : class
{
    var r = TypeIndex<IRequest>.Get<T>();
    return r?.Value ?? 0;
}

public static TValue Get<TKey>() => s_Values[TypeKey<TKey>.Id];
```
We need to know the concrete T for the static type dict trick to work, that is true. I compared different implementations; the fastest is something like this, similar to how you already use Unsafe.As:

```csharp
[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
public int M3(IRequest r)
{
    if (r.GetType() == typeof(D))
    {
        ref var d = ref System.Runtime.CompilerServices.Unsafe.As<IRequest, D>(ref r);
        return d.Value;
    }
    if (r.GetType() == typeof(E))
    {
        ref var e = ref System.Runtime.CompilerServices.Unsafe.As<IRequest, E>(ref r);
        return e.Value;
    }
    return 0;
}
```

It is important that the GetType() call is repeated each time and not stored in a variable, because otherwise a lot of optimizations don't kick in. AggressiveInlining is also required; I don't really know why.

```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-preview.3.22167.1
  [Host]   : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
  ShortRun : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1  WarmupCount=3
```

See the large difference in code size; combining that approach with splitting the methods by TResponse should keep the individual methods small and increase performance a lot for large systems. Large methods always have the problem that many JIT optimizations stop after e.g. 512 locals (which in our case would be roughly 500 ifs).

In general it would be possible to write a minimal perfect hash algorithm which maps all request types to a [0-n] array in all cases, but unfortunately there is no static type information available which could be used for something like that, so it would need to be done at runtime on e.g. the type GUID, and that would be a rather slow init process; minimal perfect hash finding is kinda like brute force.
Yup, sounds like a really good solution 👍 I'm pretty busy for the next couple of days but I'll try to look into this as soon as possible.
I took your benchmark branch (with the 1000 extra handlers) and tried to implement several different concepts to optimize the switch/case for the requests, but never achieved anything worthwhile, maybe around 10% after some nasty unsafe code, nothing even remotely like the changes from the sharplab example above. Which leads me back to the TypeDictionary; you've used that for notifications in your branch. We still have the problem that we don't really have a proper compile time identifier for each request. So why not add one via source generation?

```csharp
public sealed partial class Request0 : IRequestIdentifier
{
    int IRequestIdentifier.RequestId => 0;
}
```

This is not the ideal solution, but probably the most performant one.
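To sketch how such a generated RequestId could replace the switch: the mediator indexes into a handler array, so dispatch cost no longer depends on how many request types exist. This is a hedged illustration (the Mediator shape and the handler delegates here are my own placeholders, not the library's actual API):

```csharp
using System;

// Hypothetical marker interface the source generator would implement
// on every request type, assigning a dense 0..n-1 id.
public interface IRequestIdentifier
{
    int RequestId { get; }
}

public sealed class Request0 : IRequestIdentifier
{
    int IRequestIdentifier.RequestId => 0;
}

public sealed class Request1 : IRequestIdentifier
{
    int IRequestIdentifier.RequestId => 1;
}

public sealed class Mediator
{
    // One handler slot per generated RequestId; generated code fills this.
    private readonly Func<IRequestIdentifier, object>[] _handlers;

    public Mediator(Func<IRequestIdentifier, object>[] handlers) => _handlers = handlers;

    // Constant-time dispatch: a single array access instead of a switch
    // whose cost grows with the number of request types.
    public object Send(IRequestIdentifier request) => _handlers[request.RequestId](request);
}
```

The interface call to read RequestId is one indirection, which is the trade-off against the GetType() chain shown earlier.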
I asked Jared Parsons for advice on Twitter (https://twitter.com/torn_hoof/status/1543631166686863366); he basically confirmed that there are no constant identifiers we could use out of the box. He suggested checking the type full name first and then doing the isinstance check, but that doesn't make a big difference. I have some minimal perfect hashing code around somewhere, time to try that one out.
I tried out minimal perfect hashes.

**First way**
You take a known hash function with a seed and a displacement/intermediate map to map all inputs via the displacement map to the output; this might mean running the hash twice. See http://stevehanov.ca/blog/index.php?id=119 for a detailed explanation. The hash function needs to have a decent distribution, otherwise calculation of the displacement/intermediate table (via modifying the seed for a specific input value) takes too long.

**Second way**
Instead of using a fixed hash function, the hash function is built based on the input data.

**Conclusion**
The first way is too slow, but the gen code is trivial (~100 lines of code). The idea works, but writing the required code to get the second way implemented is way out of scope.
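To make the "first way" concrete, here is a hedged sketch of the displacement-table scheme from the linked article. This is my own illustrative code, using string keys and a seeded FNV-1a hash for readability; the real use case would key on types, and a production version would need a bail-out for the seed search:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Mph
{
    // FNV-1a with a seed: the "known hash function" the scheme relies on.
    static uint Hash(uint seed, string key)
    {
        uint h = seed == 0 ? 2166136261u : seed;
        foreach (char c in key) h = (h ^ c) * 16777619u;
        return h;
    }

    // Builds the displacement table G plus the final key slots.
    public static (int[] G, string[] Slots) Build(string[] keys)
    {
        int n = keys.Length;
        var buckets = new List<string>[n];
        for (int i = 0; i < n; i++) buckets[i] = new List<string>();
        foreach (var k in keys) buckets[(int)(Hash(0, k) % (uint)n)].Add(k);

        var g = new int[n];
        var slots = new string[n];
        var used = new bool[n];

        // Place the largest buckets first: they are the hardest to fit.
        foreach (var bucket in buckets.OrderByDescending(b => b.Count))
        {
            if (bucket.Count == 0) continue;
            if (bucket.Count == 1)
            {
                // Single-key buckets go straight into any free slot,
                // encoded as a negative displacement.
                int free = Array.IndexOf(used, false);
                g[(int)(Hash(0, bucket[0]) % (uint)n)] = -free - 1;
                slots[free] = bucket[0];
                used[free] = true;
                continue;
            }
            // Multi-key buckets: search for a seed that sends every key
            // in the bucket to a distinct free slot (fast for small sets).
            for (uint d = 1; ; d++)
            {
                var taken = new List<int>();
                bool ok = true;
                foreach (var k in bucket)
                {
                    int slot = (int)(Hash(d, k) % (uint)n);
                    if (used[slot] || taken.Contains(slot)) { ok = false; break; }
                    taken.Add(slot);
                }
                if (!ok) continue;
                g[(int)(Hash(0, bucket[0]) % (uint)n)] = (int)d;
                for (int i = 0; i < bucket.Count; i++)
                {
                    slots[taken[i]] = bucket[i];
                    used[taken[i]] = true;
                }
                break;
            }
        }
        return (g, slots);
    }

    // Lookup may run the hash twice, as described above.
    public static int IndexOf(int[] g, string key)
    {
        int n = g.Length;
        int d = g[(int)(Hash(0, key) % (uint)n)];
        return d < 0 ? -d - 1 : (int)(Hash((uint)d, key) % (uint)n);
    }
}
```

The lookup side is tiny and branch-light; the cost lives entirely in Build, which matches the "slow init process" concern above.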
Hey, sorry for the slow answer... I've come to the conclusion that if we want the

I need to read up on the papers/articles you linked above, and I want to do some more testing/experimentation before deciding anything.
Yeah that's my suggestion with

So the preferred way is still some indexable id, and I don't see any way around srcgen for that at the moment.
.NET 8 will have "Frozen" collections that optimize access based on input data (I'm not sure if it's an implementation of perfect hashing). Would that simplify the generated code?
Very cool, seems like there are a lot of different optimizations, especially for string keys. I don't think it can compete with source generated indices, though there is a complication with the source generated indices: the running process (where Mediator is generated) can refer to multiple projects defining their own set of indices, which can overlap. So when building the running project these indices will have to be computed in some way based on the containing assembly/project so that collisions are avoided. So basically I'm not sure; I haven't done any groundwork on this yet unfortunately, but we should definitely explore both and benchmark. Thanks for bringing it up!
I recently took a long look at Source Generators and static abstract interfaces for a different project, and here is my summary. Anyway, the only thing we can do is limit the possible type checks for the different

The other solution I'm actually considering (for said different project) is having different interfaces:

```csharp
ValueTask Send<TCommand>(TCommand command,
    CancellationToken cancellationToken = default) where TCommand : class, ICommand;

ValueTask<TResponse> Send<TRequest, TResponse>(TRequest request,
    CancellationToken cancellationToken = default) where TRequest : class, IRequest<TResponse>;
```

With the above code, the typed dictionary approach would work.

Note: for the source generated indices to work with multiple projects, don't generate them, but use the TypedDictionary approach to generate numbers, something like this:

```csharp
public static class TypeIndex
{
    private static int _typeIndex = -1;

    private static class TypeKey<TRequest> where TRequest : class
    {
        // ReSharper disable once StaticMemberInGenericType
        internal static readonly int Id = Interlocked.Increment(ref _typeIndex);
    }

    public static int GetOrAdd<TRequest>() where TRequest : class => TypeKey<TRequest>.Id;
}

public interface ITypedIndex
{
    static abstract int TypeIndex { get; }
}

// Example
public sealed class Request1 : IRequest<string>, ITypedIndex
{
    public string Value { get; }

    public Request1(string value)
    {
        Value = value;
    }

    public static int TypeIndex => Interfaces.TypeIndex.GetOrAdd<Request1>();
}
```

The static abstract interface only exists here as a marker interface, so the static property needs to be used. In the most basic interface implementation it is not used, but it could be; then one indirection via the typed dictionaries could be removed. The upside of this approach in general is that an incremental source generator will probably work better, as not everything is in one source generated mediator implementation.
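To show how those pieces could fit together, here is a hedged sketch of dispatch through the static index. It assumes C# 11 static abstract interface members; the IHandler interface, Ping request, and Mediator shape are my own placeholder glue, not the library's API:

```csharp
using System;
using System.Threading;

public static class TypeIndex
{
    private static int s_next = -1;
    private static class Key<T> where T : class
    {
        internal static readonly int Id = Interlocked.Increment(ref s_next);
    }
    public static int GetOrAdd<T>() where T : class => Key<T>.Id;
}

public interface ITypedIndex
{
    static abstract int Index { get; }
}

public interface IRequest<TResponse> { }

public interface IHandler<TRequest, TResponse> where TRequest : IRequest<TResponse>
{
    TResponse Handle(TRequest request);
}

public sealed class Ping : IRequest<string>, ITypedIndex
{
    public static int Index => TypeIndex.GetOrAdd<Ping>();
}

public sealed class PingHandler : IHandler<Ping, string>
{
    public string Handle(Ping request) => "pong";
}

public sealed class Mediator
{
    private readonly object[] _handlers;
    public Mediator(object[] handlers) => _handlers = handlers;

    // Because TRequest flows through as a type parameter, TRequest.Index is
    // resolved per closed generic type: dispatch is one array access and one
    // cast, with no switch and no GetType() chain.
    public TResponse Send<TRequest, TResponse>(TRequest request)
        where TRequest : class, IRequest<TResponse>, ITypedIndex
        => ((IHandler<TRequest, TResponse>)_handlers[TRequest.Index]).Handle(request);
}
```

Generated code would size and populate the handler array at startup; the runtime-assigned ids stay collision-free across projects because they come from one shared counter.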
Hey, thanks for the update!
Is this because
Yeah this is tempting, I even tried it out a bit in the earliest versions of this library, but opted for the better ergonomics...
Yeah this was the first problem I was anticipating. Example code looks promising; looking forward to experimenting in this direction. And as you suggest, I should rewrite the source generation in a way that more of the work can be done in this new source generator, making everything more incremental and less hacky.

Unfortunately I haven't had as much time to spend on open source work this last year or so as I'd like, so things have been moving slowly. But I have managed to set aside a lot more time during summer, so I expect to get more done on this and other issues that have come up requiring breaking changes. So I think the 3.0 release will be pretty big. As soon as I have time to scope out the next release I'll add some information and context to the README and start work on this, which is probably gonna be when we get closer to May/June.
I think, for .NET 8+, @mgravell inadvertently solved this problem too, with his experiments in DapperLib/Dapper#1909. He's trying to rework the Dapper API for AOT without large refactoring of the user's source code. His current approach uses interceptors to replace specific code snippets at compile time; these interceptors are code generated. See dotnet/csharplang#7009 for more information. The idea is basically the following:

As far as I know, the current implementation of interceptors does not need the original location to be annotated, but works ad hoc.
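For context, a rough sketch of what a generated interceptor looks like, based on the csharplang proposal. This is illustrative only: the exact attribute shape changed across previews (it initially required an opt-in compiler feature flag and a user-declared attribute type), and the mediator/handler names here are hypothetical, not code from Dapper or this library:

```csharp
namespace Mediator.Generated
{
    static class Interceptors
    {
        // The file path / line / character point at the original Send(...) call
        // site; the compiler then routes that call here instead.
        // (Illustrative coordinates; real ones are computed by the generator.)
        [System.Runtime.CompilerServices.InterceptsLocation(@"C:\app\Program.cs", line: 12, character: 9)]
        public static global::System.Threading.Tasks.ValueTask<MyResponse> InterceptedSend(
            this IMediator mediator, MyQuery request,
            global::System.Threading.CancellationToken cancellationToken = default)
        {
            // Direct, statically dispatched call into the known handler;
            // no switch, no type checks, AOT-friendly.
            return MyQueryHandler.Instance.Handle(request, cancellationToken);
        }
    }
}
```

Because the generator sees every call site, each Send call can be rewritten to a direct handler invocation, which sidesteps the whole type-to-index problem discussed above.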
Yeah it looks like it simplifies a lot. I think as long as we include the attribute in source gen, we can use older frameworks than .NET 8 as well; people just have to use the .NET 8 SDK? We support .NET Standard now, but it is tempting to just say .NET 6+, as that is what is "in support" currently. If we go this route, I have the following thoughts:

In this design we could still have the current packages (Mediator.Abstractions and Mediator.SourceGenerator); they would just be added to all projects. So in fact maybe just have one NuGet package which contains both the abstractions and includes the analyzer/source generator?
Well... still need to think about
I think you're correct with your assumptions about simply using the .NET 8 SDK for older versions. I think the approach of generating per project should work and is a lot more incremental than doing one big type.
I can see you're combining your Provider with a CompilationProvider. I have encountered the same issue in my source generator and worked around it by providing a custom EqualityComparer which ignores the Compilation part. I have created a discussion in the Roslyn repo; maybe someone will suggest something useful.
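A sketch of that workaround, under the assumption that the generator combines a cacheable model provider with the CompilationProvider (the `Model` record and comparer name are placeholders). Without the comparer, every keystroke produces a fresh Compilation, which makes the combined tuple look "changed" and defeats downstream incremental caching:

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;

// Placeholder for whatever equatable model the generator extracts from syntax.
sealed record Model(string Name);

// Equality ignores the Compilation half, so the pipeline step is considered
// unchanged whenever the extracted model is unchanged.
sealed class IgnoreCompilationComparer : IEqualityComparer<(Model Model, Compilation Compilation)>
{
    public bool Equals((Model Model, Compilation Compilation) x,
                       (Model Model, Compilation Compilation) y)
        => x.Model.Equals(y.Model);

    public int GetHashCode((Model Model, Compilation Compilation) obj)
        => obj.Model.GetHashCode();
}

// Usage inside Initialize(IncrementalGeneratorInitializationContext context):
// var combined = models
//     .Combine(context.CompilationProvider)
//     .WithComparer(new IgnoreCompilationComparer());
```

The caveat is that anything actually read from the Compilation in later steps can then go stale, so the comparer must only ignore the parts the output truly doesn't depend on.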
Interesting! If all that runs on every keystroke, you'd think you'd notice it when developing. It takes about 15 ms for one pass on one of my machines. I have been neglecting this part of the code somewhat, though @TimothyMakkison brought some good improvements in #113.
As far as I know, source generators are async and do not affect typing, so you'd only notice an increased CPU load. But I did check it: it does execute on every input. #113 made things a bit better in that your model is cacheable now, i.e. the output of the Parse method.
See how fast source generation is, whether it should be improved, etc.