Deadlock when using SinglePhaseCommit with distributed transactions #1800

Closed
@roji

Description

In .NET 7.0, support for OleTx/MSDTC distributed transactions has been ported over from .NET Framework (dotnet/runtime#715). This means distributed transactions can now be used with .NET 7.0, on Windows only.

We've received a bug report (dotnet/runtime#76010) with two separate console programs that propagate a transaction between them. The initiating program (which owns the "outer" transaction) enlists only a single SqlConnection in the transaction, and hangs when the transaction is committed. This happens quite consistently, though not 100% reliably. To reproduce it, run the .NET 7 version of the application from dotnet/runtime#76010 in two separate windows; the first one should hang.

When the program hangs, two threads are stuck with the following stack traces:

Stack 1

SinglePhaseEnlistment.Committed() at C:\Users\shrojans\AppData\Roaming\JetBrains\Rider2022.2\resharper-host\SourcesCache\d42a13b4dda85c7ff6fe99f383b4a1b279abd39735cb45449b7401d99c387\SinglePhaseEnlistment.cs:line 62
SqlDelegatedTransaction.SinglePhaseCommit() at C:\Users\shrojans\AppData\Roaming\JetBrains\Rider2022.2\resharper-host\SourcesCache\7c62ff4f1cfc3e61c3e65f286d919f9f0bbb2ef6a822b99cb37623d6d3a891\SqlDelegatedTransaction.cs:line 389
TransactionStateDelegatedCommitting.EnterState()
CommittableTransaction.Commit()
TransactionScope.InternalDispose()
TransactionScope.Dispose()
Program.StartTransaction()
async Program.Main()
AsyncMethodBuilderCore.Start<System.__Canon>()
Program.Main()
Program.<Main>()

Stack 2

DbConnectionInternal.DetachTransaction() at C:\Users\shrojans\AppData\Roaming\JetBrains\Rider2022.2\resharper-host\SourcesCache\71847c97c677567f13e9df7e4f1f44a99f349762bb77c34f819752ee84b71\DbConnectionInternal.cs:line 413
DbConnectionInternal.CleanupConnectionOnTransactionCompletion() at C:\Users\shrojans\AppData\Roaming\JetBrains\Rider2022.2\resharper-host\SourcesCache\71847c97c677567f13e9df7e4f1f44a99f349762bb77c34f819752ee84b71\DbConnectionInternal.cs:line 436
DbConnectionInternal.TransactionCompletedEvent()
TransactionStatePromotedCommitted.EnterState()
InternalTransaction.DistributedTransactionOutcome()
RealOletxTransaction.FireOutcome()
OutcomeEnlistment.InvokeOutcomeFunction()
OletxTransactionManager.ShimNotificationCallback()
PortableThreadPool.CompleteWait()
ThreadPoolWorkQueue.Dispatch()
PortableThreadPool.WorkerThread.WorkerThreadStart()
[Native to Managed Transition]

Here is the general flow leading up to this state:

  1. When the TransactionScope is disposed, we get to SqlDelegatedTransaction.SinglePhaseCommit(), which is the SqlClient part that interacts with System.Transactions.
    a. SqlDelegatedTransaction.SinglePhaseCommit() locks connection (the SqlInternalConnectionTds).
    b. It then sends Commit to SQL Server. Since the transaction is delegated, this triggers the commit with MSDTC, which causes commit notifications to be sent to the enlisted parties. These include the running program itself, so the flow in step 2 below starts running concurrently on another thread.
    c. Finally, it calls enlistment.Committed(), which is SinglePhaseEnlistment.Committed(), while still holding the lock on SqlInternalConnectionTds. That method then takes _internalEnlistment.SyncRoot, which is the InternalTransaction.

  2. The above commit causes us to get a notification from the native layer (MSDTC) that the transaction committed (triggered in 1b).
    a. We get to InternalTransaction.DistributedTransactionOutcome(), which locks the InternalTransaction.
    b. We then get to DbConnectionInternal.DetachTransaction() (DbConnectionInternal is registered as a listener on the Transaction.TransactionCompleted event), which attempts to lock this (the SqlInternalConnectionTds).
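The flow above boils down to each thread holding one lock while requesting another. As a minimal sketch (Python rather than C#, purely illustrative), the two paths can be written as ordered lock lists and checked for a pair that is acquired in opposite orders, i.e. a cycle in the lock-order graph. The lock names come from this issue; `THREAD_1`, `THREAD_2`, `lock_order_edges` and `find_inversions` are hypothetical helpers, not SqlClient or System.Transactions APIs.

```python
from itertools import combinations

# Hypothetical model: each path lists the locks it takes, in order, where every
# lock is still held when the next one is requested (nested acquisition).
THREAD_1 = ["SqlInternalConnectionTds", "InternalTransaction"]  # SinglePhaseCommit() -> Committed()
THREAD_2 = ["InternalTransaction", "SqlInternalConnectionTds"]  # DistributedTransactionOutcome() -> DetachTransaction()


def lock_order_edges(path):
    """Return (held, requested) pairs: each lock taken while an earlier one is held."""
    return {(path[i], path[j]) for i, j in combinations(range(len(path)), 2)}


def find_inversions(*paths):
    """Return the lock pairs that some paths acquire in opposite orders."""
    edges = set()
    for path in paths:
        edges |= lock_order_edges(path)
    # An edge whose reverse also exists is a two-lock cycle, i.e. a possible deadlock.
    return sorted({tuple(sorted(edge)) for edge in edges if (edge[1], edge[0]) in edges})


if __name__ == "__main__":
    print(find_inversions(THREAD_1, THREAD_2))
```

For the two paths above this reports the single inverted pair (InternalTransaction, SqlInternalConnectionTds). If thread 1 no longer held the connection lock while taking the transaction lock, its path would become two separate one-lock paths, and `find_inversions(["SqlInternalConnectionTds"], ["InternalTransaction"], THREAD_2)` reports nothing.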

So thread 1 locks the SqlInternalConnectionTds and then the InternalTransaction, while thread 2, running concurrently, locks the InternalTransaction and then the SqlInternalConnectionTds. This lock-order inversion produces a deadlock.
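To illustrate why this interleaving hangs, here is a minimal Python sketch (not C#) of the two paths. Plain `threading.Lock` objects stand in for the monitors on SqlInternalConnectionTds and InternalTransaction, and two barriers force the bad interleaving deterministically; in the real repro it depends on timing, which fits the hang being frequent but not 100% reliable. The function name and the `timeout` parameter are hypothetical: a C# `lock` waits indefinitely, and the timeout only exists so the demo terminates.

```python
import threading


def simulate_inverted_lock_order(timeout=0.5):
    """Hypothetical model of the two paths; returns whether each got its second lock."""
    connection = threading.Lock()   # stands in for lock (SqlInternalConnectionTds)
    transaction = threading.Lock()  # stands in for lock (InternalTransaction)
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    outcome = {}

    def path(name, first, second):
        with first:
            both_hold_first.wait()  # both threads now hold their first lock
            # A C# lock would wait forever here; the timeout lets the demo end.
            got_second = second.acquire(timeout=timeout)
            if got_second:
                second.release()
            outcome[name] = got_second
            both_tried_second.wait()  # keep holding `first` until both have tried

    threads = [
        # Thread 1: SinglePhaseCommit() holds the connection, then Committed() wants the transaction.
        threading.Thread(target=path, args=("thread 1", connection, transaction)),
        # Thread 2: DistributedTransactionOutcome() holds the transaction, then DetachTransaction() wants the connection.
        threading.Thread(target=path, args=("thread 2", transaction, connection)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(simulate_inverted_lock_order())
```

With the inverted order, neither thread ever obtains its second lock, so both entries come back `False`; without the timeout, both threads would block forever, as in the stacks above.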

I'm not an expert in this code, but moving the call to enlistment.Committed() in SqlDelegatedTransaction.SinglePhaseCommit() outside of the lock on the SqlInternalConnectionTds could be a way to resolve this deadlock.
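One possible shape of that suggestion, as a hypothetical Python model rather than actual SqlClient code: thread 1 sends the commit while holding the connection lock, releases it, and only then performs the Committed() step that takes the transaction lock. A barrier makes both threads hold their first lock at the same moment, which is the interleaving that deadlocks when the order is inverted. The function name and `timeout` are illustrative assumptions, and the sketch only shows that the lock cycle disappears; it does not check whether anything in SqlDelegatedTransaction relies on the connection lock being held during enlistment.Committed().

```python
import threading


def simulate_committed_outside_lock(timeout=2.0):
    """Hypothetical model of the proposed fix; returns whether each path finished."""
    connection = threading.Lock()   # stands in for lock (SqlInternalConnectionTds)
    transaction = threading.Lock()  # stands in for lock (InternalTransaction)
    both_hold_first = threading.Barrier(2)
    outcome = {}

    def single_phase_commit():
        with connection:
            # Commit is sent to the server while the connection lock is held.
            both_hold_first.wait()
        # Committed() now runs after the connection lock has been released,
        # so this thread never holds both locks at once.
        got = transaction.acquire(timeout=timeout)
        if got:
            transaction.release()
        outcome["thread 1"] = got

    def outcome_notification():
        with transaction:
            both_hold_first.wait()
            # DetachTransaction() still locks the connection while the
            # transaction lock is held; thread 1 no longer waits in between.
            got = connection.acquire(timeout=timeout)
            if got:
                connection.release()
            outcome["thread 2"] = got

    threads = [
        threading.Thread(target=single_phase_commit),
        threading.Thread(target=outcome_notification),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(simulate_committed_outside_lock())
```

Both paths now complete: thread 2 gets the connection as soon as thread 1 leaves its `with` block, and thread 1 gets the transaction once thread 2 is done.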

Note that I couldn't reproduce the deadlock on .NET Framework; there are likely timing differences that make it manifest far more frequently on .NET Core, but AFAICT the bug exists in Framework as well. Also note that this repro uses SinglePhaseCommit, since only one connection is enlisted in the TransactionScope in the application. I recommend also testing with two enlisted connections to force two-phase commit (2PC); the same bug could be present in that flow as well.

Many thanks to @nathangreaves for providing the original repro code.

/cc @ajcvickers

Metadata

Assignees

No one assigned

    Labels

    💥 Regression (issues that are regressions introduced from earlier PRs)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests