Skip to content

Conversation

@baronfel
Copy link
Member

This is a draft PR to explore exporting some System.Diagnostics.Metrics.Meter-based counters for the operational metrics of the build. The aim would be to make it easy to collect operational data cross-platform for comparisons when trying things like scheduling algorithm changes, or to quickly in real-world scenarios how the scheduler is assigning configurations to nodes, etc.

To use:

  • ./build.sh|cmd
  • copy the built Microsoft.Build and Microsoft.Build.Framework dlls to a global SDK 10 RC2 location
    • yes, I know - this is actually important because we'll be using dotnet-counters to view the counters and like all framework-dependent tools it gets confused when using bootstrap/local SDK installs
  • install dotnet-counters - dotnet tool install -g dotnet-counters
  • run a build of something using your hacked RC2 install, instrumenting it with the counters: dotnet-counters monitor --refresh-interval 1 --counters Microsoft.Build -- dotnet build rest_of_args

You'll get output like this:
image

This method makes use of the dotnet diagnostics pipe - so when spawning processes we need to make sure that other dotnet processes don't inherit the diagnostics behaviors. For this reason I injected the "disable diagnostics" env var into node creation and TaskHost environments. Without this any process spawning will hang.

@baronfel baronfel requested a review from a team as a code owner October 24, 2025 23:35
Copilot AI review requested due to automatic review settings October 24, 2025 23:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds diagnostic metrics support to MSBuild using System.Diagnostics.Metrics.Meter, enabling real-time monitoring of build operations through dotnet-counters. The implementation tracks scheduler states, node counts, and configuration assignments to help analyze build performance and scheduling behavior.

Key Changes:

  • Added System.Diagnostics.Metrics-based gauges throughout the scheduler, node manager, and configuration cache to expose operational metrics
  • Set DOTNET_EnableDiagnostics=0 environment variable for spawned processes to prevent diagnostic port conflicts that cause hangs
  • Updated bootstrap SDK version to RC2

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Shared/CommunicationsUtilities.cs Added DOTNET_EnableDiagnostics=0 to environment variables for spawned processes
src/Build/BackEnd/Components/Scheduler/SchedulingData.cs Added observable gauges for tracking request counts by state and node configurations
src/Build/BackEnd/Components/Scheduler/Scheduler.cs Added observable gauges for node counts and total request count
src/Build/BackEnd/Components/Communications/NodeManager.cs Added gauge metric for active node counts with improved disposal pattern
src/Build/BackEnd/Components/Communications/NodeLauncher.cs Set DOTNET_EnableDiagnostics environment variable in process startup
src/Build/BackEnd/Components/Caching/ConfigCache.cs Added observable gauge for configurations per project
eng/Versions.props Updated bootstrap SDK version from RC1 to RC2
Comments suppressed due to low confidence (1)

src/Build/BackEnd/Components/Caching/ConfigCache.cs:1

  • The CreateObservableGauge call creates a gauge but doesn't assign it to the _configurationsPerProjectGauge field. This means the gauge instance is created but not referenced, which could lead to it being garbage collected and not functioning as intended. Assign the result to the field: _configurationsPerProjectGauge = _configurationMetrics.CreateObservableGauge(...)
// Licensed to the .NET Foundation under one or more agreements.

@baronfel baronfel changed the title Counters Expose operational counters Oct 24, 2025
@baronfel
Copy link
Member Author

Also, note that all of this data is only for the central node - the process that the user is actually communicating with. The child worker nodes do not send counters back in this way (yet?). That's mostly why the counters I've made so far are focused on scheduling - since that happens on the central node. This makes these counters mostly-useless for multiproc mode, though they could be useful for multiarch mode.

// * start reading events from that session and forwarding them to our own event source
// kick the runtime to start the child node
// now process the events _somehow_
client.ResumeRuntime();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node spawning seems to be stuck still - I wonder if this bit is setup up correctly?

@baronfel baronfel marked this pull request as draft October 27, 2025 14:36

private static string CreateTransportName(int nodeId)
{
string transportName = $"msbuild-worker-{nodeId}-for-{Process.GetCurrentProcess().Id}-{DateTime.Now:yyyyMMdd_HHmmss}.socket";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick: We somewhat recently added EnvironmentUtilities.CurrentProcessId to avoid needing to call Process.GetCurrentProcess().Id.

The advantages are that you avoid the Process object allocation and don't need to call dispose on the object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely something that would need to get cleared up before merge - IIRC that's only on netcore TFMs right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-framework .NET added Environment.ProcessId (the helper I referenced uses that when possible). Using the pattern above has the same issues in both.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants