This file includes a description of several sample command-line
invocations along with guidance on how to interpret output file
results.

The following script can be used to compare the performance of the
Parallel, G1, and Shenandoah garbage collectors on the same workload
with different memory sizes. The GC log file from the Parallel GC run
is the most useful for determining application allocation rates and
live-memory usage. It is more difficult to extract this information
from G1 and Shenandoah because the states of individual objects are
still in flux at the moments when the concurrent G1 and Shenandoah
collectors issue their log reports.

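To illustrate why the Parallel GC log is convenient for calibration, the sketch below estimates allocation rate and live memory from such a log. It is a rough sketch, not part of Extremem: the `gc_alloc_stats` helper name is ours, and it assumes JDK unified-logging pause lines shaped like the hypothetical sample in the comment.

```shell
#!/bin/sh
# Rough helper (ours, not part of Extremem) that estimates allocation
# rate and live memory from a Parallel GC unified log.  It assumes
# -Xlog:gc pause lines shaped like this hypothetical sample:
#   [12.345s][info][gc] GC(7) Pause Young (Allocation Failure) 615M->23M(1024M) 4.567ms
# The heap grows from the "after" figure of one pause to the "before"
# figure of the next, so summing those deltas over elapsed time
# approximates the allocation rate; the smallest "after" figure of a
# Pause Full approximates live memory.
gc_alloc_stats() {
  awk '
    # capture the elapsed-time decoration, e.g. "[12.345s]" -> 12.345
    match($0, /^\[[0-9.]+s\]/) { t = substr($0, 2, RLENGTH - 3) + 0 }
    / Pause / {
      for (i = 1; i <= NF; i++)
        if ($i ~ /M->[0-9]+M\(/) {
          split($i, a, "->")
          before = a[1] + 0              # "615M"       -> 615
          after  = a[2] + 0              # "23M(1024M)" -> 23
          if (seen) alloc += before - prev_after
          prev_after = after; seen = 1; last_t = t
          if ($0 ~ / Pause Full / && (live == 0 || after < live))
            live = after
        }
    }
    END {
      if (last_t > 0)
        printf "approx allocation rate: %.1f MB/s\n", alloc / last_t
      if (live > 0)
        printf "approx live memory: %d MB\n", live
    }' "$1"
}

# e.g. gc_alloc_stats batch.parallel-gc.1024M.log
```
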
#!/bin/sh
for i in 256M 384M 512M 640M 768M 896M 1024M \
         1152M 1280M 1408M 1536M 1664M 1792M 1920M 2048M
do

echo Running Parallel GC with $i heap size
>&2 echo Running Parallel GC with $i heap size

$JAVA_HOME/bin/java \
  -showversion \
  -Xlog:gc:batch.parallel-gc.$i.log \
  -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
  -XX:+DisableExplicitGC -Xms$i -Xmx$i \
  -XX:+UseParallelGC -XX:ParallelGCThreads=8 \
  -jar $EXTREMEM_HOME/extremem.jar \
  -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

echo Running G1 GC with $i heap size
>&2 echo Running G1 GC with $i heap size

$JAVA_HOME/bin/java \
  -showversion \
  -Xlog:gc:batch.g1-gc.$i.log \
  -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
  -XX:+DisableExplicitGC -Xms$i -Xmx$i \
  -jar $EXTREMEM_HOME/extremem.jar \
  -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

echo Running Shenandoah GC with $i heap size
>&2 echo Running Shenandoah GC with $i heap size

$JAVA_HOME/bin/java \
  -showversion \
  -Xlog:gc:batch.shenandoah-gc.$i.log \
  -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
  -XX:+DisableExplicitGC -Xms$i -Xmx$i \
  -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
  -XX:+ShenandoahPacing -XX:ShenandoahPacingMaxDelay=1 \
  -jar $EXTREMEM_HOME/extremem.jar \
  -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

done

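Once the loop above has produced its batch.*.log files, a quick way to compare the collectors is to tally their pauses. A minimal sketch, assuming the file naming used above and unified-logging pause lines that end in the pause duration (e.g. "4.567ms"); `summarize_gc_logs` is a hypothetical helper name, not an Extremem tool:

```shell
#!/bin/sh
# Rough helper (ours, not part of Extremem) that prints a one-line
# pause summary for each GC log named on the command line, so the
# collectors run above can be compared at a glance.  It assumes
# unified-logging pause lines that end in the pause duration, e.g.
# "... Pause Young (Allocation Failure) 615M->23M(1024M) 4.567ms".
summarize_gc_logs() {
  for f in "$@"; do
    awk -v file="$f" '
      / Pause / && $NF ~ /ms$/ {
        ms = $NF + 0                    # "4.567ms" -> 4.567
        n++; total += ms
        if (ms > max) max = ms
      }
      END { printf "%s: %d pauses, total %.1f ms, max %.1f ms\n",
            file, n, total, max }' "$f"
  done
}

# e.g. summarize_gc_logs batch.*-gc.*.log
```
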
Here's a different script that represents a workload configuration
that keeps more memory alive and thus requires larger heap size
configurations of the JVM. In this code, the first loop runs with
Parallel GC for the purpose of calibrating the workload's allocation
rates and live-memory usage. The second loop provides comparisons
between G1 and Shenandoah GC on this workload.

#!/bin/sh
for i in 30G 48G
do

echo Running Parallel GC with $i heap size on huge workload
>&2 echo Running Parallel GC with $i heap size on huge workload

$JAVA_HOME/bin/java \
  -showversion \
  -Xlog:gc:batch.calibrate-shen.parallel-gc.$i.log \
  -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
  -XX:+DisableExplicitGC -Xms$i -Xmx$i \
  -XX:+UseParallelGC -XX:ParallelGCThreads=16 \
  -jar $EXTREMEM_HOME/extremem.jar \
  -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
  -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
  -dServerThreads=100 -dProductReplacementCount=8 \
  -dProductReplacementPeriod=36s \
  -dProductNameLength=25 -dProductDescriptionLength=120 \
  -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
  -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
  -dBrowsingExpiration=1m -dSimulationDuration=40m \
  -dInitializationDelay=100ms -dReportCSV=true

done

for i in 30G 32G 48G 64G 96G 112G
do

echo Running G1 GC with $i heap size on huge workload
>&2 echo Running G1 GC with $i heap size on huge workload

$JAVA_HOME/bin/java \
  -showversion \
  -Xlog:gc:batch.show-shen.g1-gc.$i.log \
  -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
  -XX:+DisableExplicitGC -Xms$i -Xmx$i \
  -jar $EXTREMEM_HOME/extremem.jar \
  -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
  -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
  -dServerThreads=100 -dProductReplacementCount=8 \
  -dProductReplacementPeriod=36s \
  -dProductNameLength=25 -dProductDescriptionLength=120 \
  -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
  -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
  -dBrowsingExpiration=1m -dSimulationDuration=20m \
  -dInitializationDelay=100ms -dReportCSV=true

echo Running Shenandoah GC with $i heap size on huge workload
>&2 echo Running Shenandoah GC with $i heap size on huge workload

$JAVA_HOME/bin/java \
  -showversion \
  -Xlog:gc:batch.show-shen.shenandoah-gc.$i.log \
  -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
  -XX:+DisableExplicitGC -Xms$i -Xmx$i \
  -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
  -XX:+ShenandoahPacing -XX:ShenandoahPacingMaxDelay=1 \
  -jar $EXTREMEM_HOME/extremem.jar \
  -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
  -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
  -dServerThreads=100 -dProductReplacementCount=8 \
  -dProductReplacementPeriod=36s \
  -dProductNameLength=25 -dProductDescriptionLength=120 \
  -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
  -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
  -dBrowsingExpiration=1m -dSimulationDuration=20m \
  -dInitializationDelay=100ms -dReportCSV=true
done

The CSV report that is written to standard output independently
reports the timeliness of many different repetitive operations that
comprise the Extremem workload. Some of the operations are more
susceptible to randomness in execution path lengths than others.
For example, in one configuration, the number of products available
for selection (based on lookups of randomly selected keywords within
the database of all existing products) ranged from 376 to 945. This
is represented by the following excerpt of standard output:

+-------------------------
|Products available for selection:
|max,945
|min,376
|average,609.597
+-------------------------

Given this, we would expect to see a nearly three-fold variance
(945/376 is roughly 2.5) in execution times for product selection
based on the size of the selection set alone, independent of GC
interference. Following are the lines in the output report that
describe the times required to process particular steps of each
"customer interaction". Preparation represents the time required to
gather up and present to the customer all of the products that
matched the customer inquiry. The following excerpt from standard
output reports on the times required to perform this step. Note
that, out of a total of 600,000 customer interactions, the time
required to present product options ranged from 2 microseconds to
2.072002 seconds. This is far worse than the three-fold variation
that would have been predicted by differences in problem size. The
additional variance is due primarily to interference caused by
various garbage collection activities.

+-------------------------
|Timeliness of preparation
|Total Measurement,Min,Max,Mean,Approximate Median
|600000,2,2072002,19753,9216
|Buckets in use,12
|Bucket Start,Bucket End, Bucket Tally
|-256,0,0
|0,1024,6981
|1024,5120,3158
|5120,13312,443628
|13312,29696,113963
|29696,46080,13483
|46080,78848,9486
|78848,144384,3469
|144384,275456,1864
|275456,537600,1345
|537600,1061888,1541
|1061888,2110464,1082
+-------------------------

Following the line of output that reports summary information, the
distribution of measured service response times is reported as a
weighted histogram. Bucket boundaries are expressed in microseconds.
The interpretation of the above "bucket-list" data is that:

    6,981 samples required less than 1.024 ms
    3,158 samples required less than 5.120 ms and at least 1.024 ms
  443,628 samples required less than 13.312 ms and at least 5.120 ms
  113,963 samples required less than 29.696 ms and at least 13.312 ms
   13,483 samples required less than 46.080 ms and at least 29.696 ms
    9,486 samples required less than 78.848 ms and at least 46.080 ms
    3,469 samples required less than 144.384 ms and at least 78.848 ms
    1,864 samples required less than 275.456 ms and at least 144.384 ms
    1,345 samples required less than 537.600 ms and at least 275.456 ms
    1,541 samples required less than 1.061888 s and at least 537.600 ms
    1,082 samples required less than 2.110464 s and at least 1.061888 s

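A bucket list like the one above can be turned into a service-level statement by summing the cumulative tallies. A minimal sketch (the `bucket_fraction` helper name is ours, not part of Extremem): it reads a pasted report excerpt on standard input and keeps only the three-column numeric bucket rows, so the surrounding header lines are harmless.

```shell
#!/bin/sh
# Rough helper (ours, not part of Extremem): paste one histogram
# excerpt on stdin and report what fraction of samples fall in
# buckets ending at or below a bound (microseconds).  Only the
# three-column numeric rows are counted, so header and border lines
# pass through harmlessly.
bucket_fraction() {          # $1 = bound in microseconds
  awk -F, -v bound="$1" '
    NF == 3 && $2 ~ /^[0-9]+$/ {       # numeric bucket rows only
      total += $3
      if ($2 <= bound) below += $3
    }
    END { printf "%.1f%% of %d samples within %d us\n",
          100 * below / total, total, bound }'
}
```

Piping the preparation histogram above through `bucket_fraction 29696` reports that roughly 94.6% of the 600,000 samples completed within 29.696 ms.
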
Each of the service response times reported by Extremem represents a
slightly different mix of new-object allocation, read access to
previously allocated objects, and mutation of previously allocated
objects.

Product replacement processing is a service that has a more
predictable workload. Each time product replacement processing is
performed, the same number of randomly selected products are
replaced, as specified on the extremem command line. Note that even
with product replacement processing, there is some expected variation
in processing effort, since the underlying representation of the
product database is a balanced binary tree protected by a
multiple-reader single-writer mutual exclusion lock. If a large
number of customers happen to be looking up products at the moment a
request to remove products arrives, this will force a delay on the
modifying thread while it waits for all the customer threads to
complete their lookup requests. The standard output file also
provides information regarding contended access to shared data
structures. For example, the following lines report that the typical
writer of the products database (i.e. the typical attempt to replace
products) has to perform an average of 1.976 wait operations, with
the total number of wait operations for any single writer ranging
from 0 to 573.

+-------------------------
|Products concurrency report:
|Total reader requests,1199860
|requiring this many waits,2010665
|average waits,1.6757497
|ranging from,0
|to,783
|
|Total writer requests,160000
|requiring this many waits,316137
|average waits,1.9758563
|ranging from,0
|to,573
+-------------------------

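The reported averages follow directly from the raw counts (e.g. 2,010,665 waits over 1,199,860 reader requests). A small sketch that recomputes them from a pasted excerpt; it assumes the format above with its leading "|" characters, and `lock_wait_averages` is a hypothetical name of ours:

```shell
#!/bin/sh
# Rough helper (ours, not part of Extremem): recompute the per-request
# wait averages from the raw counts in a pasted concurrency report
# (the leading "|" characters of the excerpt format are assumed).
lock_wait_averages() {
  awk -F, '
    /Total (reader|writer) requests/ {
      kind = $1
      sub(/^\|Total /, "", kind)     # "|Total reader requests" -> "reader requests"
      sub(/ requests$/, "", kind)    #                          -> "reader"
      req = $2
    }
    /requiring this many waits/ && req > 0 {
      printf "%s: %.4f waits per request\n", kind, $2 / req
    }'
}
```
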
It is sometimes interesting to observe that the same workload
configuration exhibits different concurrency contention under
different garbage collection approaches. This is a secondary effect
that results when a thread that holds a lock becomes blocked waiting
for garbage collection services to be performed. This is a form of
priority inversion in that garbage collection, typically running at
a lower priority than application threads, is able to cause
high-priority application threads to delay their efforts.

For this same workload configuration, product replacement processing
timeliness is represented by the following entries in the report
written to standard output:

+-------------------------
|Product replacement processing:
|batches,20000
|total,160000
|min per batch,8
|max per batch,8
|average per batch,8.0
|
|Total Measurement,Min,Max,Mean,Approximate Median
|20000,5,1436893,43450,10496
|Buckets in use,12
|Bucket Start,Bucket End, Bucket Tally
|-256,256,114
|256,2304,18
|2304,6400,1
|6400,14592,13535
|14592,30976,2225
|30976,47360,1037
|47360,80128,1034
|80128,145664,824
|145664,276736,608
|276736,538880,312
|538880,1063168,256
|1063168,2111744,36
+-------------------------

In 20,000 measurements, the time required to replace 8 products ranged
from 5 microseconds to 1.436893 seconds. Compare the performance with
different GC techniques for a better appreciation of the full direct
and indirect impacts that GC implementation approaches have on service
response times.