Skip to content

Allow NodeProviderOutOfProcTaskHost to manage multiple nodes instead of one per arch#12577

Merged
surayya-MS merged 30 commits intodotnet:mainfrom
surayya-MS:out-of-proc-task-host-second-try
Oct 7, 2025
Merged

Allow NodeProviderOutOfProcTaskHost to manage multiple nodes instead of one per arch#12577
surayya-MS merged 30 commits intodotnet:mainfrom
surayya-MS:out-of-proc-task-host-second-try

Conversation

@surayya-MS
Copy link
Member

@surayya-MS surayya-MS commented Sep 29, 2025

Fixes #12552

Context

Currently, for multithreaded node we cannot force all tasks to run in task hosts by setting MSBUILDFORCEALLTASKSOUTOFPROC=1. It crashes. The problem is that NodeProviderOutOfProcTaskHost is one per process, and manages up to 4 nodes for specific architecture.

Changes Made

  1. Propagated nodeId if the in-proc nodes (only when multi-threaded mode is on) into NodeProviderOutOfProcTaskHost
  2. Used nodeId with the combination of (int)HanshakeOptions (is originally used as key for _nodeContexts), to generate a unique id for task host node

Testing

Manually tested by building this project that references other projects with Message tasks. Modified Message task to show the process id it is running on.
msbuild.exe .\MainProject.proj - m with combination of with/without -mt switch and with/without setting MSBUILDFORCEALLTASKSOUTOFPROC.

Notes

@surayya-MS surayya-MS marked this pull request as ready for review September 30, 2025 14:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR modifies the NodeProviderOutOfProcTaskHost to manage multiple task host nodes instead of being limited to one node per architecture. The change enables better support for multi-threaded builds where multiple in-process nodes need to distinguish their task hosts.

  • Introduces a new task host node ID generation system that combines scheduled node ID with architecture/runtime information
  • Updates the NodeProviderOutOfProcTaskHost to use ConcurrentDictionary and support unlimited nodes
  • Threads the scheduled node ID through the task execution pipeline from BuildRequest to TaskHostTask

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
TaskHostTask.cs Adds scheduledNodeId parameter and GenerateTaskHostNodeId method for unique task host identification
AssemblyTaskFactory.cs Passes scheduledNodeId parameter when creating TaskHostTask instances
TaskExecutionHost.cs Threads scheduledNodeId through task initialization and instantiation
BuildRequest.cs Adds ScheduledNodeId property to track which node a request is assigned to
InProcNode.cs Sets ScheduledNodeId on build requests and stores node ID
TaskBuilder.cs Passes scheduledNodeId from build request to task execution host
NodeProviderOutOfProcTaskHost.cs Replaces HandshakeOptions-based mapping with concurrent node ID mapping
NodeProviderInProc.cs Passes nodeId when creating InProcNode instances
NodeManager.cs Extracts multi-threaded mode check to local variable
TaskExecutionHost_Tests.cs Updates test calls to include scheduledNodeId parameter
AssemblyTaskFactory_Tests.cs Updates test calls to include scheduledNodeId parameter

Copy link
Member

@JanProvaznik JanProvaznik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is well executed change, please address/discuss comments. Then lgtm.

Copy link
Member

@AR-May AR-May left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks overall good for me. I wonder about 2 things:

  1. Can we hide using the task node id instead of handshake casted to int behind a change wave
  2. Can we make a test that would catch the initial bug - that task host for different nodes should be different?

@surayya-MS
Copy link
Member Author

surayya-MS commented Oct 1, 2025

Can we hide using the task node id instead of handshake casted to int behind a change wave

@AR-May I can make it so that task host node id is only used for multithreaded mode.

@surayya-MS
Copy link
Member Author

surayya-MS commented Oct 2, 2025

  1. Can we make a test that would catch the initial bug - that task host for different nodes should be different?

@AR-May I'll try to do that in the follow-up PR.
For now, the testing was manual. Tested on combination of - with/without -mt and with/without setting MSBUILDFORCEALLTASKSOUTOFPROC. With MSBUILDFORCEALLTASKSOUTOFPROC=1 making sure that process ids are the same for the same project

@AR-May
Copy link
Member

AR-May commented Oct 3, 2025

  1. Can we make a test that would catch the initial bug - that task host for different nodes should be different?

@AR-May I'll try to do that in the follow-up PR. For now, the testing was manual. Tested on combination of - with/without -mt and with/without setting MSBUILDFORCEALLTASKSOUTOFPROC. With MSBUILDFORCEALLTASKSOUTOFPROC=1 making sure that process ids are the same for the same project

Ok, works for me!

@JanProvaznik JanProvaznik enabled auto-merge (squash) October 6, 2025 16:11
@surayya-MS surayya-MS disabled auto-merge October 7, 2025 06:49
@surayya-MS surayya-MS enabled auto-merge (squash) October 7, 2025 06:50
@surayya-MS surayya-MS merged commit e328bd0 into dotnet:main Oct 7, 2025
9 checks passed
AR-May added a commit that referenced this pull request Oct 15, 2025
)

### Context
There seem to be a merge bug introduced in #12577. 
The scheduled node id should be passed to the `TaskHostTask` in both
instances of the task creation.

### Changes Made
1. Passed scheduled node id to the task host task constructor
2. Removed the default value in constructor to avoid such errors in
future.

### Testing
None
JanProvaznik added a commit that referenced this pull request Dec 18, 2025
Fixes #12610 

### Context

Follow-up to #12577. The taskhost node identification used bit-packing
to combine `scheduledNodeId` (8 bits) and `HandshakeOptions` (9 bits)
into a single int, limiting multithreaded builds to 256 nodes. This
approach was fragile and hard to maintain.

### Changes Made

- **New `TaskHostNodeKey` record struct** in
`CommunicationsUtilities.cs` combining `HandshakeOptions` and `NodeId`
as a clean, type-safe key
- **Updated `NodeProviderOutOfProcTaskHost`** with optimized dictionary
structure:
- `_nodeContexts` keyed by `TaskHostNodeKey` for checking if a node with
given (HandshakeOptions, scheduledNodeId) exists (acquisition logic)
  - `_nodeIdToNodeKey` for O(1) reverse lookup during node termination
- `_nodeIdToPacketFactory` and `_nodeIdToPacketHandler` keyed by int for
O(1) packet routing performance
  - `_activeNodes` tracks node IDs (int) for shutdown handling
- Added atomic `_nextNodeId` counter for unique communication node ID
generation
- **Simplified `TaskHostTask`** - removed `GenerateTaskHostNodeId`
method and associated bit-packing constants
- **Improved error handling** - Added `VerifyThrow` assertions in
`DisconnectFromHost` and `CreateNode` to verify context existence and
prevent duplicate nodes

```csharp
// Before: bit-packed int with 256-node limit
_taskHostNodeId = GenerateTaskHostNodeId(_scheduledNodeId, _requiredContext);
// (scheduledNodeId << 9) | ((int)handshakeOptions & 0x1FF)

// After: clean record struct, no limit
_taskHostNodeKey = new TaskHostNodeKey(_requiredContext, _scheduledNodeId);
```

This gives the best of both worlds:
- **Maintainability**: Type-safe Keys for managing node lifecycle and
acquisition
- **Performance**: Direct Integer lookups for the high-frequency packet
routing hot path

### Testing

- Added 7 unit tests for `TaskHostNodeKey` (equality, hash codes,
dictionary keys, large node IDs)
- All 86 existing TaskHost tests pass

### Notes

- Removes the `NODE_ID_MAX_VALUE_FOR_MULTITHREADED` (255) constraint
entirely
- `SendData(int, INodePacket)` throws `NotImplementedException` as it's
required by `INodeProvider` interface but not used for task hosts
(similar to `TaskHostNodeManager`)

<!-- START COPILOT CODING AGENT SUFFIX -->



<details>

<summary>Original prompt</summary>

> 
> ----
> 
> *This section details on the original issue you should resolve*
> 
> <issue_title>[multithreaded] Refactor Node IDs for
taskhost</issue_title>
> <issue_description>followup to #12577
> Refactor
src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcTaskHost.cs
, HandshakeOptions to use record struct as a key to nodes to increase
maintanability and remove limitation to 256 nodes.</issue_description>
> 
> ## Comments on the Issue (you are @copilot in this section)
> 
> <comments>
> </comments>
> 


</details>

- Fixes #12610

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: JanProvaznik <25267098+JanProvaznik@users.noreply.github.com>
Co-authored-by: Jan Provazník <janprovaznik@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow multiple nodes for NodeProviderOutOfProcTaskHost multithreaded mode

5 participants