Skip to content

Connection fails and ignores FailoverPartner port after Linux SQL Server BAG failover, defaults to port 1433 #3400

Open
@AbbasLB

Description

@AbbasLB

Description

When using Microsoft.Data.SqlClient with Linux-based Basic Availability Groups (2 synchronous replicas) and specifying a custom port (e.g., 1066) in the connection string, automatic failover to the secondary server fails.

After a failover, the client attempts to connect to port 1433, ignoring the explicitly configured FailoverPartner port and actual server port.

✅ This issue does not occur with Windows-based SQL Server using named instances, even with custom ports, since the client uses the named instance to connect ot the failover (server\instance like my-failover-partner\sql64std).

Observed Behavior After Failover for this connection string with Linux host

DataSource = "tcp:my-master-server.mycompany.com,1066",
FailoverPartner = "tcp:my-failover-partner.mycompany.com,1066",
  • ❌ If a failover is performed after successfully connecting to the primary instance, the client attempts to connect to my-failover-partner without FQDN or port, defaults to port 1433, and fails.
  • ✅ If the application is restarted, it correctly connects to the failover partner using the specified port.
  • ✅ If failback occurs, connections resume correctly.

Potential Cause

From source inspection, the FailoverPartner is being overridden by the environment change ENV_LOGSHIPNODE, however it seems that it doesn't contain the port of the sql instance:
🔗 https://github.com/search?q=repo%3Adotnet%2FSqlClient+ENV_LOGSHIPNODE&type=code

Exception Message

A network-related or instance-specific error occurred while establishing a connection to SQL Server.
The server was not found or was not accessible.
(provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)

To Reproduce

  1. Deploy a Basic Availability Group on SQL Server on Linux (e.g., Ubuntu 20.04) with a custom port (e.g., 1066).
  2. Use this client code:
private static async Task SimpleFailoverTest()
{
    string connectionString = new SqlConnectionStringBuilder()
    {
        InitialCatalog = "DowntimeTestDB",
        DataSource = "tcp:my-master-server.mycompany.com,1066",
        FailoverPartner = "tcp:my-failover-partner.mycompany.com,1066",
        IntegratedSecurity = true,
        TrustServerCertificate = true,
    }.ConnectionString;

    while (true)
    {
        try
        {
            using var conn = new SqlConnection(connectionString);
            await conn.OpenAsync();
            var cmd = new SqlCommand("SELECT * from [DowntimeTestDB].[dbo].[OperationsLog]", conn);
            await cmd.ExecuteNonQueryAsync();
            Console.WriteLine($"Success");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
        await Task.Delay(1000);
    }
}
  1. Trigger a manual failover to the secondary server.
  2. Observe client fails to connect post-failover.

Expected Behavior

Client should:

  • Dynamically retrieve the actual custom port from the sql server instance (e.g., port 1066)
  • Or, provide a configurable option to prevent overriding the FailoverPartner in the connection string.
  • Or, provide a way to set the failover port explicitly somewhere without being overridden.

Additional Context

  • Wireshark confirmed: Client attempts to connect to port 1433 post-failover, ignoring custom port 1066.
  • ⚠️ Failover only works when using port 1433, which is not feasible for us.
  • I used the following snippet to setup the availability groups, I think that the instance names in this query are used by the client which is why the connection fails (my-failover-partner was used for the connection) and the availability group fails to setup with including the custom port like my-failover-partner,1066 and my-master-server,1066)
CREATE AVAILABILITY GROUP [ag_DowntimeTestDB]
WITH (
    BASIC,
    CLUSTER_TYPE = NONE,
    REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0
)
FOR DATABASE [DowntimeTestDB]
REPLICA ON 
    N'my-master-server' WITH (
        ENDPOINT_URL = N'tcp://my-master-server.mycompany.com:5022',
        FAILOVER_MODE = MANUAL,
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
    ),
    N'my-failover-partner' WITH (
        ENDPOINT_URL = N'tcp://my-failover-partner.mycompany.com:5022',
        FAILOVER_MODE = MANUAL,
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
    )
GO

Logs / Traces from SqlClient

  • Working Failover Trace: Windows SQL Server with named instance failover succeeded on non-default port 1066 (it can be seen that the named instance was used for the connection)
    trace_wsl_targets_windows_failover_success.txt
  • Failing Failover Trace: Linux SQL Server failover, client doesn’t connect to custom port 1066 (we can see that the instance name with no port was used)
    trace_wsl_targets_linux_failover_failure.txt
    Note: Focus on sc.SqlInternalConnectionTds.LoginWithFailover to see the client updating the failover and ignoring the instance custom port.

Current Workarounds (Undesirable)

  1. External port redirection from 1433 → 1066
    ⛔ Not sure how feasable and complex it is

  2. Standardize to default port 1433 across all clients and servers
    ⛔ Not feasible due to massive deployment and client base

  3. Fork and patch SqlClient to use 1066 as the default
    ⛔ Maintenance overhead and unclear reliability

Technical Details

  • Microsoft.Data.SqlClient version: 5.2.2, 6.0.2
  • .NET Target Framework: .NET 8.0
  • SQL Server Version:
    • Linux: SQL Server 2022 CU19 with non-default port
    • Windows: SQL Server 2022 CU11 with non-default port
  • OS Configuration:
    • ❌ Failing: WSL Linux client / Windows client → Linux SQL Server
    • ✅ Working: WSL Linux client / Windows client → Windows SQL Server (named instances)

Please advise on a potential fix or workaround that would allow failover to connect to the custom SQL Server port or provide a way to prevent overriding FailoverPartner provided in the connection string.

Metadata

Metadata

Assignees

Labels

P2Use to label moderate priority issue - impacts atleast more than 1 customer.Triage Done ✔️Issues that are triaged by dev team and are in investigation.Waiting for Customer ⏳Issues/PRs waiting for user response/action.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions