-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Expose operational counters #12695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Expose operational counters #12695
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds diagnostic metrics support to MSBuild using System.Diagnostics.Metrics.Meter, enabling real-time monitoring of build operations through dotnet-counters. The implementation tracks scheduler states, node counts, and configuration assignments to help analyze build performance and scheduling behavior.
Key Changes:
- Added System.Diagnostics.Metrics-based gauges throughout the scheduler, node manager, and configuration cache to expose operational metrics
- Set
DOTNET_EnableDiagnostics=0environment variable for spawned processes to prevent diagnostic port conflicts that cause hangs - Updated bootstrap SDK version to RC2
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Shared/CommunicationsUtilities.cs | Added DOTNET_EnableDiagnostics=0 to environment variables for spawned processes |
| src/Build/BackEnd/Components/Scheduler/SchedulingData.cs | Added observable gauges for tracking request counts by state and node configurations |
| src/Build/BackEnd/Components/Scheduler/Scheduler.cs | Added observable gauges for node counts and total request count |
| src/Build/BackEnd/Components/Communications/NodeManager.cs | Added gauge metric for active node counts with improved disposal pattern |
| src/Build/BackEnd/Components/Communications/NodeLauncher.cs | Set DOTNET_EnableDiagnostics environment variable in process startup |
| src/Build/BackEnd/Components/Caching/ConfigCache.cs | Added observable gauge for configurations per project |
| eng/Versions.props | Updated bootstrap SDK version from RC1 to RC2 |
Comments suppressed due to low confidence (1)
src/Build/BackEnd/Components/Caching/ConfigCache.cs:1
- The CreateObservableGauge call creates a gauge but doesn't assign it to the
_configurationsPerProjectGaugefield. This means the gauge instance is created but not referenced, which could lead to it being garbage collected and not functioning as intended. Assign the result to the field:_configurationsPerProjectGauge = _configurationMetrics.CreateObservableGauge(...)
// Licensed to the .NET Foundation under one or more agreements.
|
Also, note that all of this data is only for the central node - the process that the user is actually communicating with. The child worker nodes do not send counters back in this way (yet?). That's mostly why the counters I've made so far are focused on scheduling - since that happens on the central node. This makes these counters mostly-useless for multiproc mode, though they could be useful for multiarch mode. |
| // * start reading events from that session and forwarding them to our own event source | ||
| // kick the runtime to start the child node | ||
| // now process the events _somehow_ | ||
| client.ResumeRuntime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node spawning seems to be stuck still - I wonder if this bit is setup up correctly?
|
|
||
| private static string CreateTransportName(int nodeId) | ||
| { | ||
| string transportName = $"msbuild-worker-{nodeId}-for-{Process.GetCurrentProcess().Id}-{DateTime.Now:yyyyMMdd_HHmmss}.socket"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick: We somewhat recently added EnvironmentUtilities.CurrentProcessId to avoid needing to call Process.GetCurrentProcess().Id.
The advantages are that you avoid the Process object allocation and don't need to call dispose on the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely something that would need to get cleared up before merge - IIRC that's only on netcore TFMs right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-framework .NET added Environment.ProcessId (the helper I referenced uses that when possible). Using the pattern above has the same issues in both.
This is a draft PR to explore exporting some System.Diagnostics.Metrics.Meter-based counters for the operational metrics of the build. The aim would be to make it easy to collect operational data cross-platform for comparisons when trying things like scheduling algorithm changes, or to quickly in real-world scenarios how the scheduler is assigning configurations to nodes, etc.
To use:
dotnet-countersto view the counters and like all framework-dependent tools it gets confused when using bootstrap/local SDK installsdotnet tool install -g dotnet-countersdotnet-counters monitor --refresh-interval 1 --counters Microsoft.Build -- dotnet build rest_of_argsYou'll get output like this:

This method makes use of the dotnet diagnostics pipe - so when spawning processes we need to make sure that other dotnet processes don't inherit the diagnostics behaviors. For this reason I injected the "disable diagnostics" env var into node creation and TaskHost environments. Without this any process spawning will hang.