Commit 57cdce3

Remove all #[inline] attributes (#196)
This commit adds a README section on building with profile-guided optimization (PGO) and an example script, `.github/scripts/pgo-build.sh`, that produces a PGO build.
1 parent f721e28 commit 57cdce3

6 files changed: +86 -27 lines changed

.github/scripts/pgo-build.sh

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+#!/bin/bash
+
+# Compile with profiling support
+RUSTFLAGS="-Cprofile-generate=/tmp/$USER/pgo-data" make CONF=linux-x86_64-normal-server-release THIRD_PARTY_HEAP=$PWD/../mmtk-openjdk/openjdk images
+
+# Remove extraneous profiling data
+rm -rf /tmp/$USER/pgo-data/*
+
+# Profile using fop
+MMTK_PLAN=GenImmix MMTK_STRESS_FACTOR=4194304 MMTK_PRECISE_STRESS=false ./build/linux-x86_64-normal-server-release/images/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Xms60M -Xmx60M -jar /usr/share/benchmarks/dacapo/dacapo-evaluation-git-6e411f33.jar -n 5 fop
+
+# Merge profiling data
+/opt/rust/toolchains/1.66.1-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/llvm-profdata merge -o /tmp/$USER/pgo-data/merged.profdata /tmp/$USER/pgo-data
+
+# Compile using profiling data
+RUSTFLAGS="-Cprofile-use=/tmp/$USER/pgo-data/merged.profdata -Cllvm-args=-pgo-warn-missing-function" make CONF=linux-x86_64-normal-server-release THIRD_PARTY_HEAP=$PWD/../mmtk-openjdk/openjdk images
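
The script above hard-codes the location of `llvm-profdata`. A minimal sketch of one way to derive that path from the active toolchain instead is shown below. It assumes that `llvm-tools-preview` installs the tool under `<sysroot>/lib/rustlib/<host-triple>/bin/`, which matches the layout of the path used in the script. The helper name `profdata_path` is hypothetical and is not part of this commit.

```shell
#!/bin/bash
# Sketch only: build the expected llvm-profdata path from a sysroot and a
# host triple. Assumes the <sysroot>/lib/rustlib/<host>/bin/ layout that the
# hard-coded path in pgo-build.sh follows.

# profdata_path SYSROOT HOST_TRIPLE -> prints the expected tool location
profdata_path() {
    printf '%s/lib/rustlib/%s/bin/llvm-profdata\n' "$1" "$2"
}

# Query rustc only when it is installed, so the sketch is harmless elsewhere.
if command -v rustc >/dev/null 2>&1; then
    sysroot="$(rustc --print sysroot)"
    host="$(rustc -vV | grep '^host:' | cut -d' ' -f2)"
    echo "expected llvm-profdata: $(profdata_path "$sysroot" "$host")"
fi
```

Because the path comes from the same `rustc` that compiles MMTk, this keeps the `llvm-profdata` version aligned with the Rust version, which the README section in this commit asks for.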

README.md

Lines changed: 61 additions & 0 deletions
@@ -126,6 +126,67 @@ $ make CONF=linux-x86_64-normal-server-release THIRD_PARTY_HEAP=$PWD/../mmtk-ope
 
 The output jdk is then found at `./build/linux-x86_64-normal-server-release/images/jdk`.
 
+### Profile-Guided Optimized Build
+
+In order to get the best performance, we recommend using a profile-guided
+optimized (PGO) build. Rust supports [PGO
+builds](https://doc.rust-lang.org/rustc/profile-guided-optimization.html) by
+directly hooking into the LLVM profiling infrastructure. In order to have the
+correct LLVM tools version, you should install the relevant `llvm-tools-preview`
+component using `rustup`:
+
+```console
+$ rustup component add llvm-tools-preview
+```
+
+In this example, we focus on the DaCapo benchmarks and the `GenImmix`
+collector. For best results, it is recommended to profile the workload you are
+interested in measuring. We use `fop` as it is a relatively small benchmark but
+also exercises the GC. In order to best tune our GC performance, we use a
+stress factor of 4 MB in order to trigger more GC events.
+
+First we compile MMTk with profiling support:
+
+```console
+$ RUSTFLAGS="-Cprofile-generate=/tmp/$USER/pgo-data" make CONF=linux-x86_64-normal-server-release THIRD_PARTY_HEAP=$PWD/../mmtk-openjdk/openjdk images
+$ rm -rf /tmp/$USER/pgo-data/*
+```
+We clear the `/tmp/$USER/pgo-data` directory as during compilation, the JVM we
+have created is used in a bootstrap process, resulting in profile data being
+emitted.
+
+We then run `fop` in order to get some profiling data. Note that your location
+for the DaCapo benchmarks may be different:
+
+```bash
+MMTK_PLAN=GenImmix MMTK_STRESS_FACTOR=4194304 MMTK_PRECISE_STRESS=false ./build/linux-x86_64-normal-server-release/images/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Xms60M -Xmx60M -jar /usr/share/benchmarks/dacapo/dacapo-evaluation-git-6e411f33.jar -n 5 fop
+```
+
+We have to merge the profiling data into something we can feed into the Rust
+compiler using `llvm-profdata`:
+
+```console
+$ /opt/rust/toolchains/1.66.1-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/llvm-profdata merge -o /tmp/$USER/pgo-data/merged.profdata /tmp/$USER/pgo-data
+```
+
+The location of your version of `llvm-profdata` may be different to what we
+have above. *Make sure to only use a version of `llvm-profdata` that matches
+your Rust version.*
+
+Finally, we build a new image using the profiling data as an input:
+
+```console
+$ RUSTFLAGS="-Cprofile-use=/tmp/$USER/pgo-data/merged.profdata -Cllvm-args=-pgo-warn-missing-function" make CONF=linux-x86_64-normal-server-release THIRD_PARTY_HEAP=$PWD/../mmtk-openjdk/openjdk images
+```
+
+We now have an OpenJDK build under
+`./build/linux-x86_64-normal-server-release/images/jdk` with MMTk that has been
+optimized using PGO.
+
+For ease of use, we have provided an example script which does the above in
+`.github/scripts/pgo-build.sh` that you may adapt for your purposes. Note that
+you may have to change the location of `llvm-profdata`.
+
 ### Location of Mark-bit
 The location of the mark-bit can be specified by the environment variable
 `MARK_IN_HEADER`. By default, the mark-bit is located on the side (in a side
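
The PGO section added to the README above clears the profile directory, runs the benchmark, and then merges the results. A small sanity check between the run and the merge could look like the sketch below. It assumes the instrumented build writes raw profiles as `*.profraw` files directly into the `-Cprofile-generate` directory. The helper `count_profraw` and the file names in the demonstration are hypothetical.

```shell
#!/bin/bash
# Sketch only: count raw profile files so an empty directory is noticed
# before llvm-profdata merge is attempted.

# count_profraw DIR -> prints the number of *.profraw files directly in DIR
# (prints 0 when DIR does not exist).
count_profraw() {
    find "$1" -maxdepth 1 -type f -name '*.profraw' 2>/dev/null | wc -l | tr -d ' '
}

# Demonstration on a throwaway directory with made-up file names.
demo_dir="$(mktemp -d)"
echo "before run: $(count_profraw "$demo_dir")"
touch "$demo_dir/a.profraw" "$demo_dir/b.profraw" "$demo_dir/notes.txt"
echo "after run:  $(count_profraw "$demo_dir")"
rm -rf "$demo_dir"
```

In an adapted script, the same function could be pointed at `/tmp/$USER/pgo-data` and the merge step skipped when the count is zero.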

mmtk/src/abi.rs

Lines changed: 0 additions & 8 deletions
@@ -83,17 +83,14 @@ impl Klass {
         &*(self as *const _ as usize as *const T)
     }
     /// Force slow-path for instance size calculation?
-    #[inline(always)]
     const fn layout_helper_needs_slow_path(lh: i32) -> bool {
         (lh & Self::LH_INSTANCE_SLOW_PATH_BIT) != 0
     }
     /// Get log2 array element size
-    #[inline(always)]
     const fn layout_helper_log2_element_size(lh: i32) -> i32 {
         (lh >> Self::LH_LOG2_ELEMENT_SIZE_SHIFT) & Self::LH_LOG2_ELEMENT_SIZE_MASK
     }
     /// Get array header size
-    #[inline(always)]
     const fn layout_helper_header_size(lh: i32) -> i32 {
         (lh >> Self::LH_HEADER_SIZE_SHIFT) & Self::LH_HEADER_SIZE_MASK
     }
@@ -281,7 +278,6 @@ pub struct OopDesc {
 }
 
 impl OopDesc {
-    #[inline(always)]
     pub fn start(&self) -> Address {
         unsafe { mem::transmute(self) }
     }
@@ -300,15 +296,13 @@ pub type Oop = &'static OopDesc;
 
 /// Convert ObjectReference to Oop
 impl From<ObjectReference> for &OopDesc {
-    #[inline(always)]
     fn from(o: ObjectReference) -> Self {
         unsafe { mem::transmute(o) }
     }
 }
 
 /// Convert Oop to ObjectReference
 impl From<&OopDesc> for ObjectReference {
-    #[inline(always)]
     fn from(o: &OopDesc) -> Self {
         unsafe { mem::transmute(o) }
     }
@@ -324,13 +318,11 @@ impl OopDesc {
     }
 
     /// Slow-path for calculating object instance size
-    #[inline(always)]
     unsafe fn size_slow(&self) -> usize {
         ((*UPCALLS).get_object_size)(self.into())
     }
 
     /// Calculate object instance size
-    #[inline(always)]
     pub unsafe fn size(&self) -> usize {
         let klass = self.klass;
         let lh = klass.layout_helper;

mmtk/src/object_model.rs

Lines changed: 0 additions & 5 deletions
@@ -21,7 +21,6 @@ impl ObjectModel<OpenJDK> for VMObjectModel {
     const UNIFIED_OBJECT_REFERENCE_ADDRESS: bool = true;
     const OBJECT_REF_OFFSET_LOWER_BOUND: isize = 0;
 
-    #[inline]
     fn copy(
         from: ObjectReference,
         copy: CopySemantics,
@@ -81,22 +80,18 @@ impl ObjectModel<OpenJDK> for VMObjectModel {
         unimplemented!()
     }
 
-    #[inline(always)]
     fn ref_to_object_start(object: ObjectReference) -> Address {
         object.to_raw_address()
     }
 
-    #[inline(always)]
     fn ref_to_address(object: ObjectReference) -> Address {
         object.to_raw_address()
     }
 
-    #[inline(always)]
     fn ref_to_header(object: ObjectReference) -> Address {
         object.to_raw_address()
     }
 
-    #[inline(always)]
     fn address_to_ref(address: Address) -> ObjectReference {
         ObjectReference::from_raw_address(address)
     }
}

mmtk/src/object_scanning.rs

Lines changed: 0 additions & 11 deletions
@@ -13,7 +13,6 @@ trait OopIterate: Sized {
 }
 
 impl OopIterate for OopMapBlock {
-    #[inline]
     fn oop_iterate(&self, oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         let start = oop.get_field_address(self.offset);
         for i in 0..self.count as usize {
@@ -24,7 +23,6 @@ impl OopIterate for OopMapBlock {
 }
 
 impl OopIterate for InstanceKlass {
-    #[inline]
     fn oop_iterate(&self, oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         let oop_maps = self.nonstatic_oop_maps();
         for map in oop_maps {
@@ -34,7 +32,6 @@ impl OopIterate for InstanceKlass {
 }
 
 impl OopIterate for InstanceMirrorKlass {
-    #[inline]
     fn oop_iterate(&self, oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         self.instance_klass.oop_iterate(oop, closure);
         // if (Devirtualizer::do_metadata(closure)) {
@@ -75,7 +72,6 @@ impl OopIterate for InstanceMirrorKlass {
 }
 
 impl OopIterate for InstanceClassLoaderKlass {
-    #[inline]
     fn oop_iterate(&self, oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         self.instance_klass.oop_iterate(oop, closure);
         // if (Devirtualizer::do_metadata(closure)) {
@@ -89,7 +85,6 @@ impl OopIterate for InstanceClassLoaderKlass {
 }
 
 impl OopIterate for ObjArrayKlass {
-    #[inline]
     fn oop_iterate(&self, oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         let array = unsafe { oop.as_array_oop() };
         for oop in unsafe { array.data::<Oop>(BasicType::T_OBJECT) } {
@@ -99,15 +94,13 @@ impl OopIterate for ObjArrayKlass {
 }
 
 impl OopIterate for TypeArrayKlass {
-    #[inline]
     fn oop_iterate(&self, _oop: Oop, _closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         // Performance tweak: We skip processing the klass pointer since all
         // TypeArrayKlasses are guaranteed processed via the null class loader.
     }
 }
 
 impl OopIterate for InstanceRefKlass {
-    #[inline]
     fn oop_iterate(&self, oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         use crate::abi::*;
         use crate::api::{add_phantom_candidate, add_soft_candidate, add_weak_candidate};
@@ -135,11 +128,9 @@ impl OopIterate for InstanceRefKlass {
 }
 
 impl InstanceRefKlass {
-    #[inline]
     fn should_scan_weak_refs() -> bool {
         !*SINGLETON.get_options().no_reference_types
     }
-    #[inline]
     fn process_ref_as_strong(oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
         let referent_addr = Self::referent_address(oop);
         closure.visit_edge(referent_addr);
@@ -155,7 +146,6 @@ fn oop_iterate_slow(oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>, tls:
     }
 }
 
-#[inline]
 fn oop_iterate(oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
     let klass_id = oop.klass.id;
     debug_assert!(
@@ -192,7 +182,6 @@ fn oop_iterate(oop: Oop, closure: &mut impl EdgeVisitor<OpenJDKEdge>) {
     }
 }
 
-#[inline]
 pub fn scan_object(
     object: ObjectReference,
     closure: &mut impl EdgeVisitor<OpenJDKEdge>,

openjdk/CompileThirdPartyHeap.gmk

Lines changed: 9 additions & 3 deletions
@@ -10,6 +10,11 @@ MMTK_CPP_ROOT = $(THIRD_PARTY_HEAP)
 OPENJDK_VERSION=`cd $(MMTK_RUST_ROOT) ; cargo read-manifest --manifest-path=Cargo.toml | python3 -c 'import json,sys; print(json.load(sys.stdin)["metadata"]["openjdk"]["openjdk_version"])'`
 OPENJDK_LOCAL_VERSION=`git rev-parse HEAD`
 
+# Get the current host triple. This is used for the PGO build as mentioned in
+# the best practices for PGO here:
+# https://doc.rust-lang.org/rustc/profile-guided-optimization.html#a-complete-cargo-workflow
+HOST_TRIPLE=`rustc -vV | grep host | cut -d' ' -f2`
+
 ifdef MMTK_PLAN
 GC_FEATURES=--features $(MMTK_PLAN)
 endif
@@ -61,9 +66,10 @@ $(LIB_MMTK): FORCE
 		echo -e $(YELLOW)Local OpenJDK version $(OPENJDK_LOCAL_VERSION)$(NC); \
 		echo -e $(YELLOW)mmtk/Cargo.toml OpenJDK version $(OPENJDK_VERSION)$(NC); \
 	fi
-	echo "cd $(MMTK_RUST_ROOT) && cargo build $(CARGO_PROFILE_FLAG) $(GC_FEATURES)"
-	cd $(MMTK_RUST_ROOT) && cargo build $(CARGO_PROFILE_FLAG) $(GC_FEATURES)
-	cp $(MMTK_RUST_ROOT)/target/$(CARGO_PROFILE)/libmmtk_openjdk.so $(LIB_MMTK)
+	cargo --version
+	echo "cd $(MMTK_RUST_ROOT) && cargo build $(CARGO_PROFILE_FLAG) --target $(HOST_TRIPLE) $(GC_FEATURES)"
+	cd $(MMTK_RUST_ROOT) && cargo build $(CARGO_PROFILE_FLAG) --target $(HOST_TRIPLE) $(GC_FEATURES)
+	cp $(MMTK_RUST_ROOT)/target/$(HOST_TRIPLE)/$(CARGO_PROFILE)/libmmtk_openjdk.so $(LIB_MMTK)
 
 JVM_LIBS += -L$(JVM_LIB_OUTPUTDIR) -lmmtk_openjdk
 JVM_LDFLAGS += '-Wl,-rpath,$$ORIGIN'
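
The `HOST_TRIPLE` extraction in the makefile change above can be tried in isolation. The sketch below runs the same `grep` and `cut` pipeline over an abbreviated, illustrative sample of `rustc -vV` output, so it needs no Rust toolchain. The helper name `host_triple` is hypothetical, and unlike the makefile it anchors the match to `^host:` as a small defensive choice.

```shell
#!/bin/bash
# Sketch only: extract the host triple from `rustc -vV`-style text.

# host_triple: reads `rustc -vV` output on stdin, prints the host triple
host_triple() {
    grep '^host:' | cut -d' ' -f2
}

# Abbreviated sample for illustration; real output has more fields.
sample='rustc 1.66.1
binary: rustc
host: x86_64-unknown-linux-gnu
release: 1.66.1'

triple="$(printf '%s\n' "$sample" | host_triple)"
echo "cargo build --target $triple"
echo "artifacts under target/$triple/<profile>/"
```

The second line of output reflects the other half of the makefile change: once `--target` is passed, the library is copied from `target/$(HOST_TRIPLE)/$(CARGO_PROFILE)/` rather than `target/$(CARGO_PROFILE)/`.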
