Network Device Interferes with Always-On Connections

Malcolm Stewart edited this page Aug 3, 2021 · 12 revisions

The Players

| IP Address | Computer Role | Listener Role |
| --- | --- | --- |
| 172.26.25.102 | Client in Data Center A | |
| 172.26.26.71 | Always-On Listener IP Address in Data Center A | Primary |
| 172.26.121.194 | Always-On Listener IP Address in Data Center B | Secondary |
| 172.26.6.54 | Another Always-On Listener IP Address in Data Center B | Secondary |

Symptom

Several SSIS jobs copied data to various Always-On clusters by connecting to the Listener name. The primary and secondary servers of each cluster were in separate data centers on separate subnets, and each Listener name had two IP addresses associated with it. The connection strings used the MultiSubnetFailover=true keyword, which makes the driver attempt connections to all of the Listener's IP addresses in parallel rather than one after another, shortening the connection time.
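The parallel connection attempt can be pictured with a short Python sketch. This is only an illustration of the "try every Listener IP at once and keep the first that answers" idea, not the driver's actual implementation, which also performs the protocol handshake and login. The function names and port 1433 (the SQL Server default) are assumptions made for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def tcp_connect(address, timeout=5.0):
    """Open a plain TCP connection to an (ip, port) pair; raises OSError on failure."""
    return socket.create_connection(address, timeout=timeout)


def connect_first(addresses, connect=tcp_connect):
    """Attempt all addresses concurrently and return (address, connection)
    for the first attempt that succeeds.

    Hypothetical helper for illustration only. Attempts that succeed after
    the winner are not cleaned up here; a real client would close them.
    """
    if not addresses:
        raise ValueError("at least one address is required")

    errors = {}
    pool = ThreadPoolExecutor(max_workers=len(addresses))
    try:
        # Start one connection attempt per Listener IP address.
        futures = {pool.submit(connect, addr): addr for addr in addresses}
        for future in as_completed(futures):
            addr = futures[future]
            try:
                return addr, future.result()
            except OSError as exc:
                # Remember the failure and wait for the remaining attempts.
                errors[addr] = exc
    finally:
        # Do not block on attempts that are still in flight.
        pool.shutdown(wait=False)

    raise ConnectionError(f"all addresses failed: {errors}")
```

Called with the two Listener addresses from the table, for example `connect_first([("172.26.26.71", 1433), ("172.26.121.194", 1433)])`, the sketch returns whichever address accepts the connection first. The `connect` parameter exists so the race logic can be exercised without a network.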

After one weekend, the jobs started failing about 50% of the time with the following error message:

Client unable to establish connection because an error was encountered during handshakes before login. Common causes include client attempting to connect to an unsupported version of SQL Server, server too busy to accept new connections or a resource limitation (memory or maximum allowed connections) on the server.

Restarting a failed job would normally allow it to complete.

As a temporary workaround, the Connection Managers in the jobs were configured to connect directly to the computer name of the Primary node rather than to the Listener name. This stabilized the jobs, but they would no longer follow the Primary if a cluster had to be failed over.
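The difference between the original configuration and the workaround comes down to the server name and the MultiSubnetFailover keyword in the connection string. The sketch below builds both variants. The server and database names (AGLISTENER01, SQLNODE-A1, StagingDB) are hypothetical, integrated security is assumed, and the exact keyword spelling accepted depends on the provider used by the Connection Manager.

```python
def build_connection_string(server, database, multi_subnet_failover=False):
    """Assemble a semicolon-delimited connection string.

    Hypothetical helper for illustration; keyword names follow common
    SQL Server provider conventions and may differ per provider.
    """
    parts = {
        "Data Source": server,
        "Initial Catalog": database,
        "Integrated Security": "SSPI",  # assumption: Windows authentication
    }
    if multi_subnet_failover:
        # Only meaningful when connecting to a Listener name.
        parts["MultiSubnetFailover"] = "True"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Original configuration: connect through the Listener name.
listener_cs = build_connection_string(
    "AGLISTENER01", "StagingDB", multi_subnet_failover=True
)

# Temporary workaround: connect directly to the Primary node.
direct_cs = build_connection_string("SQLNODE-A1", "StagingDB")

print(listener_cs)
print(direct_cs)
```

The direct form pins the job to one machine, which is why it stops working as intended once another node becomes Primary.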

Data Collection

Several network traces were taken but the failure was not readily apparent.

A driver BID Trace was collected to see what decisions the driver made during the failure.

BID Trace Analysis
