-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RemoteActorRefProvider address paring, caching and resolving improvements #5273
RemoteActorRefProvider address paring, caching and resolving improvements #5273
Conversation
@@ -435,7 +435,7 @@ public Deploy LookUpRemotes(IEnumerable<string> p) | |||
|
|||
public bool HasAddress(Address address) | |||
{ | |||
return address == _local.RootPath.Address || address == RootPath.Address || Transport.Addresses.Any(a => a == address); | |||
return address == _local.RootPath.Address || Transport.Addresses.Contains(address); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you meant in your other comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would interest me to make a banchmark on it.
if the volatile fields and the removal of the AddressCache had an impact
…tanova/akka.net into perf-remote-actorref-provider
@Zetanova I'll benchmark this locally - guess I need to add a guide on how to do that to our |
Yes, would be nice, that benchmark could be executed with a one line command. Maybe hosting benchmark server to track and post automatically progress over time. If there is a perf issue, it would be then visible. |
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
[Host] : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
Job-JZZTLK : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
InvocationCount=1 UnrollFactor=1
Benchmarks with issues: |
To run that, FYI |
Looks better or only sometimes a racy startup ? |
The numbers on The Akka.Cluster.Sharding benchmark is more aggressive - doesn't benefit from happy path optimizations like caching and I/O batching. |
thx, 1% is good for it. I will try to find the current bottleneck |
Yes it is - and it's a consistent improvement. |
@Aaronontheweb I found a very dirty class and cleaned it up ... the One other thing that I discovered is that the whole ActorPath Elements are more or less never getting reused. Following example of inbound paths
would result in following instances: Total: ~19 instances they are not big objects, but the multiplier is a bit worried |
Haven't reviewed the code yet but this looks promising!
That's a 20k msg/s improvement over what we published in v1.4.25 https://github.com/akkadotnet/akka.net/releases/tag/1.4.25 Need to approve the API changes here so the full test suite can run though, in case there's correctness issues. But so far so good! |
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
[Host] : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
Job-ZSODNY : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
InvocationCount=1 UnrollFactor=1
|
It is not finished. I only suspect that the instance count of ActorPaths have a bad effect Following things need to be discussed and/or decided: The The deduplication itself has its own problems,
To find/deduplicate the instance for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising. A couple of minor suggestions here.
src/core/Akka/Actor/ActorPath.cs
Outdated
if (_depth == 0) | ||
return ImmutableArray<string>.Empty; | ||
|
||
var b = ImmutableArray.CreateBuilder<string>(_depth); | ||
b.Count = _depth; | ||
var p = this; | ||
for (var i = 0; i < _depth; i++) | ||
{ | ||
b[_depth - i - 1] = p.Name; | ||
p = p._parent; | ||
} | ||
return b.ToImmutable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, what about...
if (_depth == 0)
return ImmutableArray<string>.Empty; //Still use this because it saves alloc.
var b = new List<string>(_depth);
var p = this;
for (var i = 0; i < _depth; i++)
{
b.Add(p.Name);
p = p._parent;
}
b.Reverse();
return b;
This will save us the array alloc on .ToImmutable()
and allocation of builder. (Not counting the copy since Reverse is a non-alloc copy).
If you'd prefer to stick with ImmutableArray, consider .MoveToImmutable()
instead of .ToImmutable()
, MoveTo will pull the array out without allocating a new one and copying the contents (You'll still pay the small cost of allocating the Builder of course,) and is the one to use if you are not re-using the builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what I intended to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use List<> to return a IReadOnlyList<>
Before there was an up cast to List<> in the code and this should never be,
ImmutableArray is very performant and protects against "upcasts" and changes.
src/core/Akka/Actor/ActorPath.cs
Outdated
b.Count = _depth; | ||
var p = this; | ||
for (var i = 0; i < _depth; i++) | ||
{ | ||
b[_depth - i - 1] = i > 0 ? p._name : AppendUidFragment(p._name); | ||
p = p._parent; | ||
} | ||
return b.ToImmutable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comments about .ToImmutable() vs List or MoveToImmutable().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what I intended to do.
src/core/Akka/Actor/ActorPath.cs
Outdated
|
||
p.Name.CopyTo(0, buffer, offset + 1, p.Name.Length); | ||
// Concatenate segments (in reverse order) into buffer with '/' prefixes | ||
Span<char> buffer = stackalloc char[totalLength]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still worry here whether a path can be long enough that we should -not- be allocating on the stack.
This is a bigger concern on MacOS where Stack sizes can be small.
Consider:
Span<char> buffer = totalLength <256 ? stackalloc char[totalLength] : new char[totalLength]; // pick a good number here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sry, completely forgot, 1024 is a good number
I will fix it and make a unit test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On windows 10MB stackallock is no issue.
The stackallock could be removed, when we have String.Create()
|
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
ActorPath_Parse | 281.162 ns | 2.0975 ns | 1.8594 ns | 0.1106 | 464 B |
ActorPath_Concat | 53.146 ns | 0.3640 ns | 0.3227 ns | 0.0268 | 112 B |
ActorPath_Equals | 5.400 ns | 0.0577 ns | 0.0511 ns | - | - |
ActorPath_ToString | 57.856 ns | 0.4927 ns | 0.4608 ns | 0.0267 | 112 B |
ActorPath_ToSerializationFormat | 62.798 ns | 0.4608 ns | 0.4310 ns | 0.0267 | 112 B |
ActorPath_ToSerializationFormatWithAddress | 64.607 ns | 0.5678 ns | 0.5311 ns | 0.0267 | 112 B |
dev
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
[Host] : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
DefaultJob : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
ActorPath_Parse | 475.89 ns | 5.596 ns | 5.234 ns | 0.1068 | 448 B |
ActorPath_Concat | 51.82 ns | 0.546 ns | 0.456 ns | 0.0268 | 112 B |
ActorPath_Equals | 19.00 ns | 0.047 ns | 0.042 ns | - | - |
ActorPath_ToString | 195.94 ns | 2.064 ns | 1.930 ns | 0.0515 | 216 B |
ActorPath_ToSerializationFormat | 221.73 ns | 0.824 ns | 0.771 ns | 0.0515 | 216 B |
ActorPath_ToSerializationFormatWithAddress | 232.70 ns | 2.254 ns | 2.108 ns | 0.0515 | 216 B |
Allocations up a tiny amount on ActorPath.Parse
, but cut in half everywhere else. ActorPath.Parse
speed improved by (eyeballing it) 40-45%. Big difference!
LruBoundedCacheBenchmarks
This PR
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
[Host] : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
Job-CVYXVJ : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
InvocationCount=10000000
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
ActorPathCacheHitBenchmark | 64.48 ns | 0.075 ns | 0.058 ns | - |
ActorPathCacheMissBenchmark | 69.18 ns | 0.135 ns | 0.119 ns | - |
dev
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
[Host] : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
Job-DJGDWL : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
InvocationCount=10000000
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
ActorPathCacheHitBenchmark | 58.17 ns | 0.166 ns | 0.155 ns | - |
ActorPathCacheMissBenchmark | 72.51 ns | 0.524 ns | 0.409 ns | - |
LRU Cache metrics are a bit of a wash by the looks of it.
Akka.Cluster.Sharding benchmarks seem about the same BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1165 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=5.0.302
[Host] : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
Job-IQESDM : .NET Core 3.1.17 (CoreCLR 4.700.21.31506, CoreFX 4.700.21.31502), X64 RyuJIT
InvocationCount=1 UnrollFactor=1
Benchmarks with issues: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good but have some questions and some minor changes so far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will make the view changes and look/test for Root parsing
akka://sysName/
should be equal to akka://sysName
and
TryParse(rootPath, "/", out var _)
@@ -178,14 +178,14 @@ public static Task<T> Ask<T>(this ICanTell self, Func<IActorRef, object> message | |||
/// <returns>Provider used for Ask pattern implementation</returns> | |||
internal static IActorRefProvider ResolveProvider(ICanTell self) | |||
{ | |||
if (self is ActorSelection) | |||
return ResolveProvider(self.AsInstanceOf<ActorSelection>().Anchor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use native c# tools and seal classes.
To seal a class will improve ìs
, as
and switch pattern matching
@@ -117,7 +150,7 @@ public CacheStatistics(int entries, int maxProbeDistance, double averageProbeDis | |||
/// <typeparam name="TValue">The type of value used in the cache.</typeparam> | |||
internal abstract class LruBoundedCache<TKey, TValue> where TValue : class | |||
{ | |||
protected LruBoundedCache(int capacity, int evictAgeThreshold) | |||
protected LruBoundedCache(int capacity, int evictAgeThreshold, IEqualityComparer<TKey> keyComparer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key was not tested for null and the caller can give something like StringComparer.OriginalIgnoreCase
I am trying to remove the "Compute" from the cache without rehashing and to pass a func<> as arg
the best pattern for caching is
if(!cache.TryGet(key, out var value))
{
value = new object();
cache.Set(key, value);
}
we could try to make (in an other PR)
if(!cache.TryGet(key, out var value, out var entry))
{
value = new object();
cache.Set(entry, value); //or entry.Set(Value);
}
The compute should be at the user/caller not inside the cache itself.
Sometimes the caller just has more information and can make better decisions
|
||
protected AkkaPduCodec(ActorSystem system) | ||
{ | ||
System = system; | ||
AddressCache = AddressThreadLocalCache.For(system); | ||
ActorPathCache = ActorPathThreadLocalCache.For(system); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, AddressCache is now dark
The AkkaPduCodec (better the EndPointReader) is the best place to directly instanced the ActorPathCache and use it.
When then ActorPathCache is freed from ThreadLocal and bound to an Endpoint then it would produce the best cache-hits.
The only thing what needs an API change, would be the provider.ResolveActorRefWithLocalAddress(envelopeContainer.Recipient.Path, localAddress)
call.
this needs to be extended or replaced with an Context parameter.
Something like ResolveContext
with LocalAddress and the ActorPathCache
The ActorPathBenchmarks has memory increase because the ActorPath class increased by the field int _depth |
That's fine - the results are worth the trade-off. |
Why Is this a startup issue again? |
It's probably a racy spec - I fixed a ton of these last month but there's always more... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Nice work @Zetanova - going to merge this into v1.4.27 |
Small changes to reduce cache entries, memory barriers and unnecessary ActorPath paring