Our Azure Pipelines agents suddenly stopped working and are now stuck in a crash loop. Strangely, we run the same agent version on three other AKS clusters that are not affected. The pod logs show an unhandled exception, but I'm not familiar with it and am unsure where to look next. The logs are included below.
Does anyone have any idea why this might happen? Thanks!
Versions
Agent 3.242.1
ubuntu 22.04
Environment type (Please select at least one environment where you face this issue)
Self-Hosted
Microsoft Hosted
VMSS Pool
Container
Azure DevOps Server type
dev.azure.com (formerly visualstudio.com)
Azure DevOps Server Version (if applicable)
No response
Operating system
ubuntu 22.04
Version control system
No response
Relevant log output
~$ k -n tooling logs azure-devops-agent-XXX --previous
Getting auth token with service principal XXX
[
{
"cloudName": "AzureCloud",
"id": "XXX",
"isDefault": true,
"name": "N/A(tenant level account)",
"state": "Enabled",
"tenantId": "XXX",
"user": {
"name": "XXX",
"type": "servicePrincipal"
}
}
]
1. Determining matching Azure Pipelines agent...
2. Downloading and extracting Azure Pipelines agent...
3. Configuring Azure Pipelines agent...
Unhandled exception. System.TypeInitializationException: The type initializer for '<Module>' threw an exception.
---> System.InvalidProgramException: Common Language Runtime detected an invalid program.
--- End of inner exception stack trace ---
at Microsoft.VisualStudio.Services.Agent.Listener.Program.MainAsync(IHostContext context, String[] args)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.VisualStudio.Services.Agent.Listener.Program.MainAsync(IHostContext context, String[] args)
at Microsoft.VisualStudio.Services.Agent.Listener.Program.Main(String[] args) in /mnt/vss/_work/1/s/src/Agent.Listener/Program.cs:line 30
./config.sh: line 93: 72 Aborted (core dumped) ./bin/Agent.Listener configure "$@"
Hi @DenisRumyantsev, yes, we actually just tried this. We restarted the scale set, and no agents were able to start at all. Then we added a new "clean" node: the first pipeline ran successfully, and afterwards we saw the same problem again. We also tried running the same kernel version elsewhere, since we discovered that the affected cluster was on a slightly newer one. That made no difference either: the agents on the upgraded cluster run fine.
After a week of debugging we finally found the culprit: Dynatrace. Apparently its pod injection somehow breaks the environment in which the agent runs. The injection does not seem to take effect immediately, though, which explains why the first pipeline queued on a fresh node can still succeed. The problem started after a Dynatrace upgrade (version 1.294). We worked around it by adding an exception rule for the agent pods. Just leaving this here in case anyone else runs into the same problem. For anyone interested in investigating further: we suspect the cause is related to the CSI driver that Dynatrace uses.
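The comment above does not say how the exception rule was defined, so the following is only a sketch of one possible approach, not the fix that was actually used. It assumes that injection is done by the Dynatrace Operator webhook and that this webhook honours a per-pod opt-out annotation named `oneagent.dynatrace.com/inject`; check that key against the Dynatrace documentation for your operator version. The namespace `tooling` is taken from the log above, while the deployment name `azure-devops-agent` and the helper function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: names below are assumptions, adjust them to your cluster.
NAMESPACE="${NAMESPACE:-tooling}"                 # namespace seen in the log above
DEPLOYMENT="${DEPLOYMENT:-azure-devops-agent}"    # hypothetical deployment name

# Print a merge patch that sets one annotation on the pod template.
build_patch() {
  local key="$1" value="$2"
  printf '{"spec":{"template":{"metadata":{"annotations":{"%s":"%s"}}}}}' "$key" "$value"
}

# List init containers and env var names of a pod, so that a failing pod
# can be compared with a healthy one on another cluster.
show_injection_hints() {
  local pod="$1"
  kubectl -n "$NAMESPACE" get pod "$pod" \
    -o jsonpath='{.spec.initContainers[*].name}{"\n"}{.spec.containers[*].env[*].name}{"\n"}'
}

# Apply the opt-out annotation; changing the pod template triggers a rollout.
# The annotation key is an assumption, not confirmed by this thread.
apply_opt_out() {
  kubectl -n "$NAMESPACE" patch deployment "$DEPLOYMENT" --type merge \
    -p "$(build_patch "oneagent.dynatrace.com/inject" "false")"
}
```

After sourcing the script, running `show_injection_hints azure-devops-agent-XXX` on an affected cluster and on a healthy one gives a quick way to see whether extra init containers or environment variables were added to the agent pod. Running `apply_opt_out` then restarts the agent pods with the annotation set.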