Adjust initial tlab size #25423
Conversation
👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into `master` will be added to the body of your pull request.
❗ This change is not yet ready to be integrated.
Leaving this in draft while I prepare details for review. |
We have found with certain workloads that the default initial and maximum TLAB sizes result in very high latencies for the first few invocations of particular methods on certain threads. The root cause is that TLABs are too large, which depletes allocatable memory too quickly. When large numbers of threads try to start up at the same time, some of them end up with no TLAB or a very small TLAB, and they run hundreds of times slower than the threads that were able to grab very large TLABs.
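To make the failure mode concrete, here is a hypothetical microbenchmark sketch (not part of this PR) of the pattern described above: many threads released at once, each performing a burst of small allocations. Under uneven TLAB distribution, the per-thread times in such a burst can differ by orders of magnitude. All names here (`StartupBurst`, `run`) are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a simultaneous startup burst of small allocations,
// the workload shape under which TLAB starvation shows up as wildly
// different per-thread times.
public class StartupBurst {
    static List<Long> run(int threads, int allocsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            futures.add(pool.submit(() -> {
                start.await();                 // release all threads together
                long t0 = System.nanoTime();
                byte[] sink = null;
                for (int j = 0; j < allocsPerThread; j++) {
                    sink = new byte[64];       // small allocation, TLAB fast path
                }
                // touch sink so the loop is not trivially dead
                return System.nanoTime() - t0 + (sink == null ? 1 : 0);
            }));
        }
        start.countDown();
        List<Long> times = new ArrayList<>();
        for (Future<Long> f : futures) times.add(f.get());
        pool.shutdown();
        return times;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("per-thread times (ns): " + run(8, 100_000));
    }
}
```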
This PR reduces the maximum TLAB size and adjusts the initial TLAB size in order to reduce the impact of this problem.
This PR also changes the value of TLABAllocationWeight from 90 to 35 when we are running in generational mode. 35 is the default value used for G1 GC, which is also generational. The default value of 90 was established years ago for non-generational Shenandoah, which tends to run GC cycles less frequently than generational collectors.
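For reference, the effective TLAB-related settings can be inspected from the command line. The flags below are standard HotSpot options; this is an illustrative invocation, not part of the PR:

```shell
# Inspect effective TLAB settings under generational Shenandoah
# (-XX:+UnlockExperimentalVMOptions may be required on some JDK versions)
java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
     -XX:ShenandoahGCMode=generational -XX:+PrintFlagsFinal -version \
     | grep TLAB
```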
With a "small" workload, the most significant benefit of this change is seen at p99.99 (66.1% latency improvement) and p99.999 (62.6% latency improvement). At other percentiles, latency increased slightly (0.6% at p50, 1.7% at p100).
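As a reminder of how tail-latency readings like p99.99 are derived from raw samples, here is a minimal nearest-rank percentile helper. The class and method names are hypothetical and not related to the benchmark harness used for these measurements:

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over latency samples.
public class Percentile {
    // p is in (0, 100]; samples need not be sorted on entry.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] latenciesMicros = {120, 95, 110, 4500, 100, 105, 98, 102, 9000, 101};
        System.out.println("p50  = " + percentile(latenciesMicros, 50.0));   // 102
        System.out.println("p100 = " + percentile(latenciesMicros, 100.0));  // 9000
    }
}
```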
The small workload is represented by the following execution script:
With a "medium" workload, the impact is somewhat neutral, ranging from 9% improvement at p100 to 22.4% degradation at p99.999.
The medium workload is represented by this execution script:
The huge workload comparisons are still being tested...
The huge workload is represented by this execution script:
We also tested the impact of this change on one of our current development branches, identified as adaptive-evac-with-surge. Performance of this development branch, which we are in the process of merging upstream, is what motivated the original effort to explore improved TLAB sizes.
For the same small workload described above running on a c6a.2xlarge host, the most significant benefits are seen at p99.99, p99.999, and p100 percentiles, with 50.1%, 17.6%, and 98.2% improvement respectively:
When this small workload is run on a m5.4xlarge host, we still see very significant benefits at p100, but degradation at p99.999.
The medium workload performed especially poorly without the improvements provided by this PR. All percentiles except p50 show very large improvement:
The huge workload is roughly neutral with this PR:
Progress
Reviewing
Using git
Check out this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25423/head:pull/25423
$ git checkout pull/25423
Update a local copy of the PR:
$ git checkout pull/25423
$ git pull https://git.openjdk.org/jdk.git pull/25423/head
Using Skara CLI tools
Check out this PR locally:
$ git pr checkout 25423
View PR using the GUI difftool:
$ git pr show -t 25423
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25423.diff