feat(general): POC - Add fork with Pool #7376
Open
We originally used a Pipe per process, and each process received a fixed batch of files ahead of time. Once a process finished its batch, it wrote the results to its pipe and exited. This caused severe load imbalance: if one process happened to receive heavier files, it worked much longer than the others.
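To make the imbalance concrete, here is a small simulation sketch. The per-file costs are invented for illustration, and both function names are hypothetical; it only models scheduling, not the actual pipe code. It compares fixed up-front batches against a model where each free worker pulls the next file.

```python
import heapq
import math
from typing import List


def static_makespan(costs: List[int], workers: int) -> int:
    """Total wall time when files are split into fixed contiguous batches up front."""
    if not costs:
        return 0
    size = math.ceil(len(costs) / workers)
    batches = [costs[i:i + size] for i in range(0, len(costs), size)]
    # The run finishes only when the slowest batch finishes.
    return max(sum(batch) for batch in batches)


def dynamic_makespan(costs: List[int], workers: int) -> int:
    """Total wall time when whichever worker is free first takes the next file."""
    free_at = [0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Hypothetical costs: four heavy files followed by eight light ones.
costs = [10, 10, 10, 10, 1, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(costs, workers=3))
print(dynamic_makespan(costs, workers=3))
```

With these made-up numbers the fixed-batch run takes 40 time units, because one worker receives all four heavy files, while the pull-based run takes 20.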
On top of that, we had a bug in how we handled the pipes, which caused processes to get stuck before actually finishing.
We switched from the pipe-based approach to a Pool, which internally works like a queue:
all files are pushed into a shared task queue, and each worker takes a small batch of files, processes them, and then immediately pulls more. This ensures dynamic load balancing - heavy files are naturally spread across workers, and no worker gets stuck with a disproportionately heavy batch.
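A minimal sketch of the Pool pattern described above, using only the standard library. `scan_file` and `scan_all` are hypothetical stand-ins rather than the functions in this PR, the `chunksize` value is an arbitrary choice, and the `"fork"` start method is assumed (POSIX only).

```python
import multiprocessing
from typing import Iterable, List, Tuple


def scan_file(path: str) -> Tuple[str, int]:
    """Hypothetical per-file work; the real job would parse and scan the file."""
    return path, len(path)


def scan_all(paths: Iterable[str], workers: int = 12, chunksize: int = 4) -> List[Tuple[str, int]]:
    """Run scan_file over all paths with a worker pool.

    Each worker takes `chunksize` paths at a time from the shared task queue
    and comes back for more as soon as it finishes, so heavy files do not
    pile up on a single worker.
    """
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX platform
    with ctx.Pool(processes=workers) as pool:
        # imap_unordered yields results as they complete, in no fixed order.
        return list(pool.imap_unordered(scan_file, paths, chunksize=chunksize))


if __name__ == "__main__":
    files = [f"file_{i}.tf" for i in range(20)]
    results = scan_all(files, workers=4, chunksize=2)
    print(len(results))
```

A smaller `chunksize` spreads heavy files more evenly at the cost of more queue traffic; a larger one reduces overhead but moves back toward the fixed-batch behavior.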
With this change, going from a single worker under the old pipe model to 12 workers using a Pool, total runtime improved by roughly 5.9× in one specific test case on a large repo.