Commit eda58a7

Merge pull request openjdk#40 from kdnilsen/master
Incorporate Extremem workload into repo
2 parents b3b8c60 + ee03a71 commit eda58a7

54 files changed: 12,372 additions & 0 deletions

Extremem/.gitignore

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
.gradle
.idea
/build
/buildSrc/build/
/gradle/
/gradlew
/gradlew.bat
/wrapper/
/out/
*.iml
*.class
*.jar
*.h
*.so
*.out
*.error
*\.log
*\.log\.*

Extremem/Config

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
package.Extremem = {
  interfaces = (1.0);

  # Use NoOpBuild. See https://w.amazon.com/index.php/BrazilBuildSystem/NoOpBuild
  build-system = make;
  build-tools = {
    1.0 = {
      GnuMake = 3.81;
      NoOpBuild = 1.0;
    };
  };

  # Use runtime-dependencies for when you want to bring in additional
  # packages when deploying.
  # Use dependencies instead if you intend for these dependencies to
  # be exported to other packages that build against you.
  dependencies = {
    1.0 = {
    };
  };

  runtime-dependencies = {
    1.0 = {
    };
  };

};

Extremem/Example

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
This file includes a description of several sample command-line
invocations along with guidance on how to interpret output file
results.

The following script can be used to compare the performance of the
Parallel, G1, and Shenandoah garbage collectors on the same workload
with different memory sizes. The GC log file from the Parallel GC run
is most useful in determining application allocation rates and live
memory usage. It is more difficult to extract this information from
G1 and Shenandoah because the states of individual objects are in
flux at each moment a log report is issued for these concurrent
collectors.

#!/bin/sh
for i in 256M 384M 512M 640M 768M 896M 1024M \
         1152M 1280M 1408M 1536M 1664M 1792M 1920M 2048M
do

  echo Running Parallel GC with $i heap size
  >&2 echo Running Parallel GC with $i heap size

  $JAVA_HOME/bin/java \
    -showversion \
    -Xlog:gc:batch.parallel-gc.$i.log \
    -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
    -XX:+DisableExplicitGC -Xms$i -Xmx$i \
    -XX:+UseParallelGC -XX:ParallelGCThreads=8 \
    -jar $EXTREMEM_HOME/extremem.jar \
    -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

  echo Running G1 GC with $i heap size
  >&2 echo Running G1 GC with $i heap size

  $JAVA_HOME/bin/java \
    -showversion \
    -Xlog:gc:batch.g1-gc.$i.log \
    -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
    -XX:+DisableExplicitGC -Xms$i -Xmx$i \
    -jar $EXTREMEM_HOME/extremem.jar \
    -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

  echo Running Shenandoah GC with $i heap size
  >&2 echo Running Shenandoah GC with $i heap size

  $JAVA_HOME/bin/java \
    -showversion \
    -Xlog:gc:batch.shenandoah-gc.$i.log \
    -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
    -XX:+DisableExplicitGC -Xms$i -Xmx$i \
    -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
    -XX:+ShenandoahPacing -XX:ShenandoahPacingMaxDelay=1 \
    -jar $EXTREMEM_HOME/extremem.jar \
    -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

done
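
To turn the Parallel GC log into a rough allocation-rate estimate,
something like the following sketch can be used. It is not part of
the workload: it assumes the JDK unified-logging line format
"[2.345s][info][gc] GC(5) Pause Young (Allocation Failure)
307M->10M(1003M) 4.1ms" and approximates the memory allocated between
consecutive collections as the heap occupancy before one collection
minus the occupancy after the previous one.

#!/bin/sh
# Rough allocation-rate estimate from one of the Parallel GC logs
# produced by the loop above. All sizes are in MB.
awk '
  /Pause/ && match($0, /[0-9]+M->[0-9]+M/) {
    split(substr($0, RSTART, RLENGTH), h, "M->")
    before = h[1] + 0; after = h[2] + 0
    if (match($0, /^\[[0-9.]+s\]/))
      t = substr($0, 2, RLENGTH - 3) + 0   # JVM uptime in seconds
    if (seen) allocated += before - prev   # MB allocated since last GC
    prev = after; seen = 1; last = t
  }
  END {
    if (last > 0)
      printf "approximately %.1f MB/s allocated over %.1f s\n",
             allocated / last, last
  }
' batch.parallel-gc.1024M.log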

Here's a different script that represents a workload configuration
that keeps more memory alive and thus requires larger heap
configurations of the JVM. In this script, the first loop runs with
Parallel GC for the purpose of calibrating the workload's allocation
rate and live-memory usage (a sketch for extracting a live-set
estimate from those logs follows the script). The second loop
compares G1 and Shenandoah GC on this workload.

#!/bin/sh
for i in 30G 48G
do

  echo Running Parallel GC with $i heap size on huge workload
  >&2 echo Running Parallel GC with $i heap size on huge workload

  $JAVA_HOME/bin/java \
    -showversion \
    -Xlog:gc:batch.calibrate-shen.parallel-gc.$i.log \
    -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
    -XX:+DisableExplicitGC -Xms$i -Xmx$i \
    -XX:+UseParallelGC -XX:ParallelGCThreads=16 \
    -jar $EXTREMEM_HOME/extremem.jar \
    -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
    -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
    -dServerThreads=100 -dProductReplacementCount=8 \
    -dProductReplacementPeriod=36s \
    -dProductNameLength=25 -dProductDescriptionLength=120 \
    -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
    -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
    -dBrowsingExpiration=1m -dSimulationDuration=40m \
    -dInitializationDelay=100ms -dReportCSV=true

done

for i in 30G 32G 48G 64G 96G 112G
do

  echo Running G1 GC with $i heap size on huge workload
  >&2 echo Running G1 GC with $i heap size on huge workload

  $JAVA_HOME/bin/java \
    -showversion \
    -Xlog:gc:batch.show-shen.g1-gc.$i.log \
    -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
    -XX:+DisableExplicitGC -Xms$i -Xmx$i \
    -jar $EXTREMEM_HOME/extremem.jar \
    -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
    -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
    -dServerThreads=100 -dProductReplacementCount=8 \
    -dProductReplacementPeriod=36s \
    -dProductNameLength=25 -dProductDescriptionLength=120 \
    -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
    -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
    -dBrowsingExpiration=1m -dSimulationDuration=20m \
    -dInitializationDelay=100ms -dReportCSV=true

  echo Running Shenandoah GC with $i heap size on huge workload
  >&2 echo Running Shenandoah GC with $i heap size on huge workload

  $JAVA_HOME/bin/java \
    -showversion \
    -Xlog:gc:batch.show-shen.shenandoah-gc.$i.log \
    -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
    -XX:+DisableExplicitGC -Xms$i -Xmx$i \
    -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
    -XX:+ShenandoahPacing -XX:ShenandoahPacingMaxDelay=1 \
    -jar $EXTREMEM_HOME/extremem.jar \
    -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
    -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
    -dServerThreads=100 -dProductReplacementCount=8 \
    -dProductReplacementPeriod=36s \
    -dProductNameLength=25 -dProductDescriptionLength=120 \
    -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
    -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
    -dBrowsingExpiration=1m -dSimulationDuration=20m \
    -dInitializationDelay=100ms -dReportCSV=true
done
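
For the calibration runs, the live set can be estimated from the same
Parallel GC logs. A minimal sketch, again assuming the unified-logging
format shown earlier and that the run performed at least one full
collection (heap occupancy after a full GC approximates live memory):

#!/bin/sh
# Print a live-set estimate after each full collection in a
# calibration log produced by the first loop above.
grep 'Pause Full' batch.calibrate-shen.parallel-gc.30G.log |
awk '
  match($0, /[0-9]+M->[0-9]+M/) {
    split(substr($0, RSTART, RLENGTH), h, "M->")
    printf "live estimate after full GC: %d MB\n", h[2] + 0
  }'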

The CSV report that is written to standard output independently
reports the timeliness of the many different repetitive operations
that comprise the Extremem workload. Some of the operations are more
susceptible to randomness in execution path lengths than others. For
example, in one configuration, the number of products available for
selection (based on lookups of randomly selected keywords within the
database of all existing products) ranged from 376 to 945. This is
represented by the following excerpt of standard output:

+-------------------------
|Products available for selection:
|max,945
|min,376
|average,609.597
+-------------------------

Given this, we would expect to see a three-fold variance in
execution times for product selection based on the size of the
selection set alone, independent of GC interference. Following are
the lines in the output report that describe the times required to
process particular steps of each "customer interaction". Preparation
represents the time required to gather up and present to the customer
all of the products that matched the customer inquiry. The following
excerpt from standard output reports the times required to perform
this step. Note that, out of a total of 600,000 customer
interactions, the time required to present product options ranged
from 2 microseconds to 2.072002 seconds. This is far worse than the
three-fold variation that would have been predicted by differences in
problem size alone. The additional variance is due primarily to
interference caused by various garbage collection activities.

+-------------------------
|Timeliness of preparation
|Total Measurement,Min,Max,Mean,Approximate Median
|600000,2,2072002,19753,9216
|Buckets in use,12
|Bucket Start,Bucket End, Bucket Tally
|-256,0,0
|0,1024,6981
|1024,5120,3158
|5120,13312,443628
|13312,29696,113963
|29696,46080,13483
|46080,78848,9486
|78848,144384,3469
|144384,275456,1864
|275456,537600,1345
|537600,1061888,1541
|1061888,2110464,1082
+-------------------------

Following the line of output that reports summary information, the
distribution of measured service response times is reported as a
weighted histogram. The interpretation of the above "bucket-list"
data, whose boundaries are expressed in microseconds, is that:

    6,981 samples required less than 1.024 ms
    3,158 samples required less than 5.120 ms and at least 1.024 ms
  443,628 samples required less than 13.312 ms and at least 5.120 ms
  113,963 samples required less than 29.696 ms and at least 13.312 ms
   13,483 samples required less than 46.080 ms and at least 29.696 ms
    9,486 samples required less than 78.848 ms and at least 46.080 ms
    3,469 samples required less than 144.384 ms and at least 78.848 ms
    1,864 samples required less than 275.456 ms and at least 144.384 ms
    1,345 samples required less than 537.600 ms and at least 275.456 ms
    1,541 samples required less than 1.061888 s and at least 537.600 ms
    1,082 samples required less than 2.110464 s and at least 1.061888 s
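
A bucket section like the one above can be summarized mechanically.
The following sketch is not part of Extremem; it assumes a
hypothetical file report.out holding one captured bucket section
(feed it a single section, since it aggregates every bucket line it
sees), with bucket boundaries expressed in microseconds:

#!/bin/sh
# Print readable ranges for each non-empty bucket and a rough
# weighted mean computed from bucket midpoints.
awk -F, '
  /^\|-?[0-9]+,-?[0-9]+,[0-9]+$/ {
    start = substr($1, 2) + 0            # strip the leading "|"
    end = $2 + 0; tally = $3 + 0
    if (tally > 0) {
      n += tally; sum += tally * (start + end) / 2
      printf "%7d samples between %.3f ms and %.3f ms\n",
             tally, start / 1000, end / 1000
    }
  }
  END { if (n) printf "weighted-mean estimate: %.0f microseconds\n", sum / n }
' report.out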

Each of the service response times reported by Extremem represents a
slightly different mix of new-object allocation, read access to
previously allocated objects, and mutation of previously allocated
objects.

Product replacement processing is a service that has a more
predictable workload. Each time product replacement processing is
performed, the same number of randomly selected products are
replaced, as specified on the extremem command line. Note that even
with product replacement processing, there is some expected variation
in processing effort, since the underlying representation of the
product database is a balanced binary tree protected by a
multiple-reader single-writer mutual exclusion lock. If a large
number of customers happen to be looking up products at the moment a
request to remove products arrives, this will force a delay on the
modifying thread while it waits for all the customer threads to
complete their lookup requests. The standard output file also
provides information regarding contended access to shared data
structures. For example, the following lines report that the typical
writer of the products database (i.e., the typical attempt to replace
products) has to perform an average of 1.976 wait operations, with
the total number of wait operations for any single writer ranging
from 0 to 573.

+-------------------------
|Products concurrency report:
|Total reader requests,1199860
|requiring this many waits,2010665
|average waits,1.6757497
|ranging from,0
|to,783
|
|Total writer requests,160000
|requiring this many waits,316137
|average waits,1.9758563
|ranging from,0
|to,573
+-------------------------
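
The average-waits figures can be recomputed from the raw counts as a
sanity check. A sketch, assuming the run's standard output was
captured to a hypothetical file run.out:

#!/bin/sh
# Recompute waits-per-request for the reader and writer sections of a
# captured "Products concurrency report".
awk -F, '
  /\|Total (reader|writer) requests/ { kind = $1; requests = $2 }
  /\|requiring this many waits/ {
    printf "%s: %d / %d = %.7f waits per request\n",
           substr(kind, 2), $2, requests, $2 / requests
  }
' run.out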

It is sometimes interesting to observe that the same workload
configuration exhibits different concurrency contention under
different garbage collection approaches. This is a secondary effect
that results when a thread that holds a lock becomes blocked waiting
for garbage collection services to be performed. This is a form of
priority inversion, in that garbage collection, typically running at
a lower priority than application threads, is able to cause
high-priority application threads to delay their efforts.

For this same workload configuration, product replacement processing
timeliness is represented by the following entries in the report
written to standard output:

+-------------------------
|Product replacement processing:
|batches,20000
|total,160000
|min per batch,8
|max per batch,8
|average per batch,8.0
|
|Total Measurement,Min,Max,Mean,Approximate Median
|20000,5,1436893,43450,10496
|Buckets in use,12
|Bucket Start,Bucket End, Bucket Tally
|-256,256,114
|256,2304,18
|2304,6400,1
|6400,14592,13535
|14592,30976,2225
|30976,47360,1037
|47360,80128,1034
|80128,145664,824
|145664,276736,608
|276736,538880,312
|538880,1063168,256
|1063168,2111744,36
+-------------------------

In 20,000 measurements, the time required to replace 8 products ranged
from 5 microseconds to 1.436893 seconds. Compare the performance with
different GC techniques for a better appreciation of the full direct
and indirect impacts that GC implementation approaches have on service
response times.
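
One way to make that comparison is to pull the same summary line out
of each run's captured report. The scripts above do not redirect
standard output, so this sketch assumes each run's stdout was saved
to a file matching *.out (a hypothetical naming convention):

#!/bin/sh
# For each captured report, print the summary line that follows the
# "Timeliness of preparation" header: total,min,max,mean,median.
for f in *.out
do
  awk -v file="$f" '
    /Timeliness of preparation/ { grab = 1; next }
    grab && /^\|[0-9]/ {
      sub(/^\|/, "")
      printf "%s: %s\n", file, $0
      grab = 0
    }
  ' "$f"
done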
