
[BUG]: Running multiple self-hosted agents causes improper Windows Server shutdown #4970

Open

Description

What happened?

I have a dedicated VM server that runs multiple pipeline agents. When the server is restarted while more than one agent is running, the agents appear to hang on shutdown and prevent a clean operating system restart. It seems as if all of the agents are waiting on some shared resource, and that wait prevents a clean shutdown.

The same setup using the 2.x agents works without issue.

What did you do?

  1. Install multiple build agents on the same Windows server. On this particular server I have 4 agents configured.
  2. Start all of the agents and verify that they are functioning correctly.
  3. Restart the Windows server.

What happened?

  • The Windows server hangs for a short time on shutdown. The hang lasts roughly 20-30 seconds.
  • After the reboot, the event log contains a message indicating that the prior shutdown was unexpected, such as Event ID 6008 - The previous system shutdown at 10:07:52 AM on 8/29/2024 was unexpected.
  • When logging into the server after the reboot, a dialog (Shutdown Event Tracker) asks for details about the unexpected shutdown (Why did the computer shut down unexpectedly).
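
For anyone trying to confirm the same symptom, the Event ID 6008 entries can be pulled from the System log with the built-in `wevtutil` tool. The following is only a minimal sketch, not part of the original report: the function names `build_query_command` and `recent_unexpected_shutdowns` are made up for illustration, and the script assumes it runs on the affected Windows server with enough rights to read the System log.

```python
import subprocess
from typing import List


def build_query_command(event_id: int = 6008, count: int = 5) -> List[str]:
    """Build a wevtutil command that reads the newest matching System events.

    Hypothetical helper for illustration; only the wevtutil options
    themselves (qe, /q, /c, /rd, /f) come from the real tool.
    """
    # XPath filter on the event ID (6008 = previous shutdown was unexpected).
    query = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{query}",   # XPath query
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # reverse direction: newest events first
        "/f:text",       # plain-text output instead of XML
    ]


def recent_unexpected_shutdowns(count: int = 5) -> str:
    """Run the query and return the raw text output (Windows only)."""
    result = subprocess.run(
        build_query_command(count=count),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(recent_unexpected_shutdowns())
```

Running this once after a restart with a single agent and once after a restart with two or more agents should show whether a new 6008 entry appears only in the multi-agent case.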

What did you expect to happen?
All of the agents should shut down cleanly when the server restarts.

Other scenarios

  • The problem does not happen when no agents are running.
  • The problem does not happen when only one agent is running. It does not matter which specific agent is running.
  • The problem will always happen when at least two agents are running. It does not matter which specific agents are running.

Other notes

  • The agents
    • function normally when they are running.
    • are each configured with a unique agent name.
    • use the same agent pool.
    • run as Windows services under the Network Service account.
    • are each installed in their own path.
    • shut down and start properly without issue using normal Windows service control methods.
    • shut down quickly and without delay when the services are stopped individually.
  • There are no entries in the agent log files in the _diag directory that indicate any sort of issue. A sample of what appears in one of the agent logs during shutdown is included below.

Versions

3.238.0 / Windows x64
3.243.1 / Windows x64

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

Azure DevOps Server (Please specify exact version in the textbox below)

Azure DevOps Server Version (if applicable)

Azure DevOps Server 2022.2 (AzureDevopsServer_20240702.1)

Operating system

Windows Server 2019 / Version 1809 (OS Build 17763.6189)

Version control system

Azure DevOps Server git

Relevant log output

[2024-08-29 14:07:15Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 85, session 'ac144eee-740a-4e97-93e5-aa1439cd7cef'.
[2024-08-29 14:07:37Z INFO HostContext] Agent will be shutdown for UserCancelled
[2024-08-29 14:07:37Z WARN VisualStudioServices] GET request to https://<redacted>/tfs/_apis/distributedtask/pools/9/messages has been cancelled.
[2024-08-29 14:07:37Z INFO MessageListener] Get next message has been cancelled.
[2024-08-29 14:07:37Z INFO JobDispatcher] Shutting down JobDispatcher. Make sure all WorkerDispatcher has finished.
[2024-08-29 14:07:37Z INFO AgentProcess] Agent execution been cancelled.
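
To compare shutdown timing across the agents, the `_diag` log lines can be parsed and lined up by timestamp. The sketch below is an illustration only: it assumes every line follows the `[timestamp level source] message` layout seen in the excerpt above, and the names `LogEntry`, `parse_line`, and `shutdown_entries` are hypothetical helpers, not part of the agent.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, taken from the excerpt: [YYYY-MM-DD HH:MM:SSZ LEVEL Source] message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z\s+"
    r"(?P<level>[A-Z]+)\s+(?P<source>[^\]]+)\]\s?(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # The trailing "Z" in the log marks the timestamp as UTC.
    ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return LogEntry(ts, match["level"], match["source"], match["message"])


def shutdown_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Return the entries from the first 'Agent will be shutdown' line onward."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    for index, entry in enumerate(entries):
        if entry.message.startswith("Agent will be shutdown"):
            return entries[index:]
    return []


# Three lines copied from the excerpt above.
SAMPLE = [
    "[2024-08-29 14:07:15Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 85, session 'ac144eee-740a-4e97-93e5-aa1439cd7cef'.",
    "[2024-08-29 14:07:37Z INFO HostContext] Agent will be shutdown for UserCancelled",
    "[2024-08-29 14:07:37Z INFO AgentProcess] Agent execution been cancelled.",
]

if __name__ == "__main__":
    for entry in shutdown_entries(SAMPLE):
        print(entry.timestamp.isoformat(), entry.level, entry.source, entry.message)
```

Running the same parser over the log of each agent on the server would show whether any agent logs anything after its `AgentProcess` line, or whether all of them go quiet at the same moment.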