[BUG]: unhandled exception "missing module" in selfhosted agent startup (on AKS) #4938

Closed
1 of 4 tasks
DaanWeller opened this issue Aug 13, 2024 · 3 comments

DaanWeller commented Aug 13, 2024

What happened?

Our Azure Pipelines agents suddenly stopped working and are now stuck in a crash loop. Strangely, we run the same agent version on three other AKS clusters that are not experiencing the problem. The pod logs show that the cause is an unhandled exception, but I'm not familiar with it and am unsure where to look next. The logs are included below.

Does anyone have any idea why this might happen? Thanks!

Versions

Agent 3.242.1
Ubuntu 22.04

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Ubuntu 22.04

Version control system

No response

Relevant log output

~$ k  -n tooling logs azure-devops-agent-XXX --previous
Getting auth token with service principal XXX
[
  {
    "cloudName": "AzureCloud",
    "id": "XXX",
    "isDefault": true,
    "name": "N/A(tenant level account)",
    "state": "Enabled",
    "tenantId": "XXX",
    "user": {
      "name": "XXX",
      "type": "servicePrincipal"
    }
  }
]
1. Determining matching Azure Pipelines agent...
2. Downloading and extracting Azure Pipelines agent...
3. Configuring Azure Pipelines agent...
Unhandled exception. System.TypeInitializationException: The type initializer for '<Module>' threw an exception.
---> System.InvalidProgramException: Common Language Runtime detected an invalid program.
   --- End of inner exception stack trace ---
   at Microsoft.VisualStudio.Services.Agent.Listener.Program.MainAsync(IHostContext context, String[] args)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.VisualStudio.Services.Agent.Listener.Program.MainAsync(IHostContext context, String[] args)
   at Microsoft.VisualStudio.Services.Agent.Listener.Program.Main(String[] args) in /mnt/vss/_work/1/s/src/Agent.Listener/Program.cs:line 30
./config.sh: line 93:    72 Aborted                 (core dumped) ./bin/Agent.Listener configure "$@"

@DenisRumyantsev
Contributor

@DaanWeller this may be a flaky issue. Have you tried restarting the AKS cluster?

@DaanWeller
Author

Hi @DenisRumyantsev, yes, we actually just tried this. We restarted the scale set, and afterwards no agents were able to start at all. Then we added a new "clean" node. The first pipeline ran successfully on it, and after that we saw the same problem again. We also tried matching the kernel versions, because we discovered that the affected cluster was running a slightly newer version. That makes no difference either: the agents on the upgraded cluster run fine.

@DaanWeller
Author

DaanWeller commented Aug 15, 2024

After a week of debugging we finally found the culprit: Dynatrace. Apparently its pod injection somehow breaks the environment in which the agent runs. The injection does not seem to take effect immediately, though, which means the first pipeline that gets queued can still succeed. The problem started after we upgraded Dynatrace (version 1.294). We worked around it by adding an exception rule for the agent pods. Just leaving this here in case anyone else runs into the same problem. For anyone interested in investigating further: we think it might go wrong with the addition of the CSI driver that Dynatrace uses.
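
For anyone who wants to try a similar workaround, here is a minimal sketch of one way to exclude the agent pods from injection. The thread does not say how the exception rule was actually configured, so this is an assumption: it relies on the `oneagent.dynatrace.com/inject` pod annotation that the Dynatrace Operator documents for opting pods out of OneAgent injection. The Deployment name, labels, and image are hypothetical; only the `tooling` namespace and the `azure-devops-agent` pod name prefix come from the log above. Check the Dynatrace documentation for your Operator version before relying on it.

```yaml
# Sketch only: Deployment name, labels, and image are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: azure-devops-agent
  namespace: tooling
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azure-devops-agent
  template:
    metadata:
      labels:
        app: azure-devops-agent
      annotations:
        # Assumption: asks the Dynatrace Operator to skip OneAgent injection for these pods.
        oneagent.dynatrace.com/inject: "false"
    spec:
      containers:
        - name: agent
          # Hypothetical image reference; replace with your own agent image.
          image: example.registry.local/azure-devops-agent:latest
```

The annotation has to be set on the pod template, not on the Deployment metadata, and existing pods need to be recreated before the change applies to them.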
