Skip to content

[release/9.0] Revert "[GC] Avoid OOM in large-allocation-only workloads (#105521)"#107712

Merged
jeffschwMSFT merged 3 commits intodotnet:release/9.0from
markples:revert-dfr
Sep 13, 2024
Merged

[release/9.0] Revert "[GC] Avoid OOM in large-allocation-only workloads (#105521)"#107712
jeffschwMSFT merged 3 commits intodotnet:release/9.0from
markples:revert-dfr

Conversation

@markples
Copy link
Contributor

@markples markples commented Sep 11, 2024

Revert "[GC] Avoid OOM in large-allocation-only workloads (#105521)", which is commit b27a808.

The revert had merge conflicts with bfb674e. That commit changed the ordering of distribute_free_regions/age_free_regions with other code, so it was simpler to leave the new age_free_regions in place. The original factoring into age_free_regions also adding aging of huge regions. This change reverts that part in order to restore the original behavior.

Customer Impact

  • Customer reported
  • Found internally

#105521 was intended to be followed by #106414 and possibly another improvement, but subsequent study has shown that the existing bookkeeping for large vs. huge regions is inconsistent, and these changes can trigger some strange behavior. In various cases, it can cause

  • the amount of region distribution to change
  • an assertion to hit in debug builds (though the condition is handled in release by decommitting the regions in question, which is often, but not always, what we want
  • the selection of which specific regions to decommit to change

The solution is going to involve a bigger change to distribute_free_regions if not something else like estimate_gen_growth as well, so this goes back to the original behavior until that can happen in 10.

Regression

  • Yes
  • No

Testing

distribute_free_region was factored into a standalone C++ project in order to run it in various scenarios. Due to the issues listed above, the original behavior was chosen for .NET 9.

Risk

Low - this reverts from newly added behavior to what had been in place for months.

…otnet#105521)"

This partially reverts commit b27a808.

This is done instead of a full revert because there were merge conflicts
with bfb674e.  This also minimizes diffs with main in case of future
backports.

This reverts the distribute_free_regions but keeps the factoring.  It also
keeps the aging of huge free regions, but nothing is done with those ages.
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@markples markples added this to the 9.0.0 milestone Sep 11, 2024
…tnet#105521)"

This reverts the rest of commit b27a808
except for the age helper.
@markples markples changed the title Partially revert "[GC] Avoid OOM in large-allocation-only workloads … Revert most of "[GC] Avoid OOM in large-allocation-only workloads (#105521)" Sep 13, 2024
@markples markples requested a review from Maoni0 September 13, 2024 18:01
@markples markples changed the title Revert most of "[GC] Avoid OOM in large-allocation-only workloads (#105521)" Revert "[GC] Avoid OOM in large-allocation-only workloads (#105521)" Sep 13, 2024
@markples markples changed the title Revert "[GC] Avoid OOM in large-allocation-only workloads (#105521)" [release/9.0] Revert "[GC] Avoid OOM in large-allocation-only workloads (#105521)" Sep 13, 2024
@markples markples marked this pull request as ready for review September 13, 2024 18:13
@markples markples added the Servicing-consider Issue for next servicing release review label Sep 13, 2024
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. please get a code review. we can merge when ready

@jeffschwMSFT jeffschwMSFT added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Sep 13, 2024
@jeffschwMSFT
Copy link
Member

@markples please take a look at the PR failures

@markples
Copy link
Contributor Author

@markples please take a look at the PR failures

@jeffschwMSFT It's one known test error (and known to be on the 9.0 branch). I don't know what else build analysis looks at, but I verified that the pipeline reports a total error count of 1, has 1 failing step, and the log for that step reports just the one test.

@jeffschwMSFT jeffschwMSFT merged commit 3b622e7 into dotnet:release/9.0 Sep 13, 2024
markples added a commit to markples/runtime that referenced this pull request Sep 30, 2024
…location-only workloads (dotnet#105521)" (dotnet#107712)', which is b27a808

The 9.0 revert already handled merge conflicts with bfb674e, which is in main
and 9.0.  However, this keeps the changes to age huge free regions.  Nothing is
done with those ages, but this avoids churn to put it back in later.
markples added a commit that referenced this pull request Oct 8, 2024
…ent strategy (#108417)

* Revert "[GC] Allow distribute_free_regions to decommit and redistribute (#106414)"

This reverts commit 566a3bd.

* Cherry-pick part of '[release/9.0] Revert "[GC] Avoid OOM in large-allocation-only workloads (#105521)" (#107712)', which is b27a808

The 9.0 revert already handled merge conflicts with bfb674e, which is in main
and 9.0.  However, this keeps the changes to age huge free regions.  Nothing is
done with those ages, but this avoids churn to put it back in later.
@github-actions github-actions bot locked and limited conversation to collaborators Oct 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-GC-coreclr Servicing-approved Approved for servicing release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants