Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix async lock #4485

Merged
merged 2 commits into from
Sep 20, 2021
Merged

Fix async lock #4485

merged 2 commits into from
Sep 20, 2021

Conversation

mconnew
Copy link
Member

@mconnew mconnew commented Jan 8, 2021

Fixed a SocketConnection memory leak where we were pinning the SocketConnection and buffer in memory after the socket is closed due to not cancelling timers. NetFx doesn't actually support synchronous completion of Socket.SendAsync or Socket.ReceiveAsync so this bug wasn't discovered on NetFx. There was also a bug with not disposing the System.Threading.Timer instances at close. Switched back to IOThreadTimer that NetFx uses which is cheap to cancel.
Fixed IOThreadTimer issue where the first WaitableTimer wasn't being reset so could have caused what would effectively be an asynchronous while(true) loop.
Fixed AsyncLock to correctly use a static AsyncLocal instead of an instance. This was causing a memory leak. Adjusted logic to account for multiple AsyncLock's on the same ExecutionContext. This was the cause of #4435.
Fixes #4435
Other issues were found while investigating #4435

@mconnew
Copy link
Member Author

mconnew commented Jan 8, 2021

@stephentoub, could you look at my AsyncLock implementation and check I haven't missed anything else obvious?

Base automatically changed from master to main February 6, 2021 00:42
@mconnew
Copy link
Member Author

mconnew commented Aug 18, 2021

This PR also fixes a couple of minor issues discovered while debugging the AsyncLock where I found the ExecutionContext was being held on to by the Timer based IOTimer (which IOThreadTimer doesn't do).

@mconnew
Copy link
Member Author

mconnew commented Aug 19, 2021

@stephentoub, can you take a look at my new version?

@mconnew mconnew requested review from HongGit and stephentoub August 19, 2021 17:40
_lockTakenCallStackString = _lockTakenCallStack.ToString();
#endif
return _semaphoreRelease;
await currentSemaphore.WaitAsync(cancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it by design that the semaphore which was rented from the pool doesn't get returned if this wait is canceled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't. But I just realized, am I going to have a problem here? After the await, if I put the currentSemaphore back as _currentSemaphore.Value, is that going to work? The await would have caused the EC to be captured in this method and putting the previous SemaphoreSlim back is going to cause the EC to be copied and the new copy modified. When the caller completes awaiting TakeLockAsync, it's original EC is going to be restored which has the new SemaphoreSlim set in _currentSemaphore.Value as that was done in a Task returning sync implemented method before the call to the async method. It looks like I can't support cancellation with this implementation, have I understood that correctly? Not supporting cancellation would be fine as you can't cancel a regular lock statement.

// Ensure not in use
await _topLevelSemaphore.WaitAsync();
_topLevelSemaphore.Release();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code suggests there's a concern the semaphore was potentially in use. Given that concern, what prevents it from being used by whatever that other code was after this Release call? Is this just a best effort thing, and the expectation is any use of _topLevelSemaphore after DisposeAsync has been invoked is erroneous?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more than a best effort. Calls to TakeLock and TakeLockAsync are protected by an if(_disposed) so once DisposeAsync has been called (which sets _disposed), neither of those methods can be called. This means if DisposeAsync is able to acquire the semaphore, it's guaranteed no other thread is able to. This guarantees it's safe to return it to the pool without risk of a use after release type of problem. The idea is also that if some code tries to cleanup an AsyncLock (e.g. channel is being closed) and some method still has the lock acquired, the DisposeAsync will wait for that code to release the lock before completing. So it's to ensure correctness, and to allow disposing while something is still in flight safely.

@@ -54,6 +55,8 @@
<PackageReference Include="System.Security.Principal.Windows" Version="$(SystemSecurityPrincipalWindowsPackageVersion)" />
<PackageReference Include="System.Security.Cryptography.Xml" Version="$(SystemSecurityCryptographyXmlPackageVersion)" />
<PackageReference Include="System.Numerics.Vectors" Version="$(SystemNumericsVectorsPackageVersion)" />
<PackageReference Include="Microsoft.Bcl.AsyncInterfaces" Version="5.0.0" />
<PackageReference Include="Microsoft.Extensions.ObjectPool" Version="5.0.9" />
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is possible to depend on lower versions of these packages? Due to the way the Azure Functions host works for .NET core 3.1 (in process model) it overrides any user assemblies with the 3.x version, effectively creating an upper bound of 3.x for the Microsoft.Extensions packages. This in turn creates runtime errors with FileNotFoundException when any dependencies reference an extensions package over 3.x. Packages like EF Core has the same issue - you can't use anything over 3.x on the functions in-process host.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fill everyone in on our discussion on gitter in case someone else finds this via a search engine, the solution was to add the 3.x versions of these packages to the top level project. This then causes a build warning NU1605 because a package downgrade was forced. Apparently the .NET SDK treats that warning as a build error so you need to add <NoWarn>NU1605</NoWarn> to your project.


namespace System.Runtime
{
internal class AsyncLock
internal class AsyncLock : IAsyncDisposable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this deserves an extensive set of tests, and I don't see any in this PR. What you have here looks strictly more correct than what you previously had, though I'd feel better about it if there were tests validating all the scenarios required by its internal usage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario required by it's internal usage is reliable sessions, and we have pretty extensive tests around that functionality going so far as to intercept the underlying HTTP requests to mess around with the timing and/or delivery of messages to trigger all the behaviors that ReliableSessions provides. The original implementation worked from a lock standpoint, but leaked memory.

@mconnew mconnew merged commit f2d0cbb into dotnet:main Sep 20, 2021
@mconnew mconnew deleted the FixAsyncLock branch September 20, 2021 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Continuously degrading performance in repeated one-way calls (S.SM 4.8, netcore3.1)
4 participants