SqlUtils retry logic does not handle DB closed connections  #216

@michaelplavnik

The SqlUtils retry logic assumes that the connection stays open while retrying. That is not always the case: when a transient failure leaves the connection closed, the next attempt fails immediately and the retry loop fails with it.

Below is the exception that triggered the retry:

Non-orchestration failure: A transient database failure occurred and will be retried. Current retry count: 0. 
Details: Microsoft.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.) 
---> System.ComponentModel.Win32Exception (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. 

at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__211_0(Task`1 result) 
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() 
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) 
--- End of stack trace from previous location 

And this is the exception that caused the retry to fail:

TaskActivityDispatcher-20f9dc6540734ed199f29b59ce5ed253-0: Failed to fetch a work-item: System.InvalidOperationException: BeginExecuteReader requires an open and available Connection. The connection's current state is closed. 
    at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__211_0(Task`1 result)
    at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) 
--- End of stack trace from previous location 

--- at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) 
--- End of stack trace from previous location 

--- at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries)
    at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries)
    at DurableTask.SqlServer.SqlUtils.ExecuteSprocAndTraceAsync[T](DbCommand command, LogHelper traceHelper, String instanceId, Func`2 executor) 
    at DurableTask.SqlServer.SqlOrchestrationService.LockNextTaskActivityWorkItem(TimeSpan receiveTimeout, CancellationToken shutdownCancellationToken) 
    at DurableTask.Core.WorkItemDispatcher`1.DispatchAsync(WorkItemDispatcherContext context)

ClientConnectionId:26deed2f-447e-4cc4-afc9-989c09fe072e Error Number:10060,State:0,Class:20.
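
The failure mode above can be illustrated with a small, runtime-neutral sketch. This is not the DurableTask.SqlServer code: `FakeConnection`, `TransientError`, `ConnectionClosedError`, and `with_retry` are hypothetical stand-ins, and the sketch assumes that a transient transport error leaves the connection closed. It shows one possible mitigation, which is to check the connection state before each attempt and reopen it if needed.

```python
class TransientError(Exception):
    """Stand-in for a transient SqlException (e.g. error 10060)."""


class ConnectionClosedError(Exception):
    """Stand-in for InvalidOperationException: connection state is closed."""


class FakeConnection:
    """Hypothetical connection that is dropped whenever a transient error occurs."""

    def __init__(self, failures):
        self.is_open = False
        self.failures_left = failures  # number of transient failures to simulate
        self.open_count = 0

    def open(self):
        self.is_open = True
        self.open_count += 1

    def execute(self):
        if not self.is_open:
            raise ConnectionClosedError("requires an open and available connection")
        if self.failures_left > 0:
            self.failures_left -= 1
            self.is_open = False  # assumption: the transport error closes the connection
            raise TransientError("transport-level error")
        return "work-item"


def with_retry(conn, max_retries=5, reopen=True):
    """Retry on transient errors; optionally reopen a closed connection first."""
    for attempt in range(max_retries + 1):
        if reopen and not conn.is_open:
            conn.open()
        try:
            return conn.execute()
        except TransientError:
            if attempt == max_retries:
                raise
            # A real implementation would back off (delay) before the next attempt.
```

With `reopen=False`, the second attempt raises `ConnectionClosedError`, which the loop does not treat as transient, so it escapes the retry loop much like the `InvalidOperationException` in the trace above. With `reopen=True`, the same sequence of failures ends with a successful call.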

Labels: P2 (Priority 2), bug (Something isn't working)