Intermittent Connection Refused Errors in AKS with Azure S2S VPN During Scaling and High Resource Utilization #4724
Description
Cluster information:
Kubernetes version: 1.29.8 (AKS)
Installation method: AKS cluster created manually
Host OS: Azure Cloud
Node Pools: 2
Userpool: AKSAzureLinux-V2gen2-202412.04.0
Agentpool: AKSUbuntu-2204gen2containerd-2024
I have set up an AKS cluster with the configuration above. Our goal is to automate the transfer of files from one server to another, and I have configured an Azure Site-to-Site VPN for this purpose. However, when we attempt to transfer files from the cluster, we intermittently get "Connection refused" errors.
We have a pod in our AKS cluster that communicates with an on-premises API. This API is used to create a session and facilitate file transfer in a specific workflow.
Azure Site-to-Site (S2S) VPN Setup:
Azure Configuration:
- Virtual Network
- Virtual Network Gateway
- Local Network Gateway
- Connection
On-Premises Configuration:
- Created a Profile in S2S
- Routing Profile
- Static Routing Profile
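Because the on-premises side uses a static routing profile, the address ranges configured there need to cover every source address the cluster can use, including addresses assigned to nodes or pods added later. Whether that is the cause here is not confirmed; it is only something worth ruling out. A minimal sketch of such a check, in which every prefix and IP address is a hypothetical placeholder rather than a value from this setup:

```python
import ipaddress


def uncovered_sources(source_ips, allowed_prefixes):
    """Return the source IPs that fall outside every allowed prefix."""
    networks = [ipaddress.ip_network(prefix) for prefix in allowed_prefixes]
    return [
        ip
        for ip in source_ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]


# Hypothetical values: replace with the prefixes configured on-premises
# and the node/pod IPs reported by `kubectl get nodes -o wide` or
# `kubectl get pods -o wide`.
allowed = ["10.240.0.0/24"]
sources = ["10.240.0.4", "10.240.0.5", "10.240.1.7"]
print(uncovered_sources(sources, allowed))
```

Any address printed by the check would be a candidate for traffic that the on-premises side does not route back or does not allow.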
Issue Details:
While the VPN connection is successfully established, we occasionally encounter "Connection Refused" errors under the following scenarios:
- When the AKS cluster scales up (e.g., adding nodes or pods).
- After deploying certain updates or applications to the cluster.
- When memory or CPU utilization increases significantly.
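To correlate the failures with the scenarios above, it may help to run a small TCP probe from a pod and record timestamped results. The sketch below separates "refused" (a TCP reset came back, so something on the path actively rejected the connection) from "timeout" (no answer at all), which usually point to different causes. The function names, host and port are assumptions for illustration, not part of the reported setup.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        # A reset was received: the target or a device in between rejected it.
        return "refused"
    except socket.timeout:
        # No reply within the timeout: packets were probably dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "no route to host" or DNS failure.
        return f"error: {exc}"


def run(host, port, count, interval=5.0):
    """Probe `count` times, printing one timestamped line per attempt."""
    for _ in range(count):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {probe(host, port)}", flush=True)
        time.sleep(interval)
```

For example, `run("10.0.0.10", 443, count=720)` (a hypothetical on-premises address) would probe every five seconds for an hour; the timestamps can then be compared against node scale-up and rollout events.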
The base image we use to build our application image is mcr.microsoft.com/dotnet/aspnet:8.0-alpine.
For reference, I have included the YAML configuration below.
build prod:
  stage: build
  tags:
    - azure-vm-windows
  when: manual
  before_script:
    - docker info
    - docker-compose --version
  script:
    - docker-compose -f docker-compose.production.yml down
    - docker-compose -f docker-compose.production.yml build --no-cache
    - docker-compose -f docker-compose.production.yml up -d

deploy development:
  stage: deploy dev
  image: mcr.microsoft.com/azure-cli
  when: manual
  tags:
    - azure-vm-windows
  environment:
    name: dev/SFT
    deployment_tier: development
  script:
    # Login to Azure
    # Login to Azure Container Registry
    # tag docker images to release version
    # push images to acr
    # Service & Deployments to AKS
    - az aks get-credentials --resource-group name --name name
    # Deploy service to AKS
    - kubectl apply -f development.yaml -n dev
    - kubectl rollout restart deployment name -n dev
    # Deploy postgres
    - kubectl apply -f postgres-storage.yaml -n dev
    - kubectl apply -f postgres-pvc.yaml -n dev
    - kubectl apply -f postgres-configmap.yaml -n dev
    - kubectl apply -f postgres-secrets.yaml -n dev
    - kubectl apply -f postgres-statefulset.yaml -n dev
    - kubectl apply -f postgres-svc.yaml -n dev
Our Application Insights logs show:
LoginToFileStationAsync,Failure,Message :Connection refused
Source :System.Net.Sockets
StackTrace : at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at System.Net.Sockets.Socket.g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
TargetSite :Void ThrowException(System.Net.Sockets.SocketError, System.Threading.CancellationToken)
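The stack trace ends in HttpConnectionPool.ConnectToTcpHostAsync, so the failure happens while opening the TCP connection, before any HTTP request has been sent to the API. Retrying at that point cannot duplicate a request, which makes a bounded retry with backoff a reasonable stopgap while the root cause is investigated. The application itself is .NET; the following is only a language-neutral sketch of the pattern in Python, and `retry_connect` and its parameters are hypothetical names.

```python
import random
import time


def retry_connect(operation, attempts=4, base_delay=0.5, max_delay=8.0,
                  sleep=time.sleep):
    """Call operation(); on ConnectionRefusedError retry with
    exponential backoff plus jitter, re-raising after the last attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionRefusedError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from pods that fail together.
            sleep(delay + random.uniform(0, delay / 2))
```

Only connection-refused errors are retried here; other failures propagate immediately so that they remain visible in the logs.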