
Commit e68aa99

Implement the memory64 proposal in Wasmtime (#3153)
* Implement the memory64 proposal in Wasmtime

This commit implements the WebAssembly [memory64 proposal][proposal] in both Wasmtime and Cranelift. Cranelift ended up needing very little work here since most of it was already prepared for 64-bit memories at one point or another. Most of the work in Wasmtime is largely refactoring, changing a bunch of `u32` values to something else.

A number of internal and public interfaces are changing as a result of this commit, for example:

* Accessors on `wasmtime::Memory` that work with pages now all return `u64` unconditionally rather than `u32`. This makes it possible to accommodate 64-bit memories with this API, but we may also want to consider `usize` here at some point since the host can't grow past `usize`-limited pages anyway.

* The `wasmtime::Limits` structure is removed in favor of minimum/maximum methods on table/memory types.

* Many libcall intrinsics called by jit code now unconditionally take `u64` arguments instead of `u32`. Return values are `usize`, however, since the return value, if successful, is always bounded by host memory while arguments can come from any guest.

* The `heap_addr` clif instruction now takes a 64-bit offset argument instead of a 32-bit one. It turns out that the legalization of `heap_addr` already worked with 64-bit offsets, so this change was fairly trivial to make.

* The runtime implementation of mmap-based linear memories has changed to largely work in `usize` quantities in its API and in bytes instead of pages. This simplifies various aspects and reflects that mmap-memories are always bound by `usize` since that's what the host is using to address things, and additionally most calculations care about bytes rather than pages except for the very edge where we're going to/from wasm.
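The pages-as-`u64` point above has a subtlety worth spelling out: a page count that is representable in `u64` can still describe a byte size the host cannot address. A minimal sketch of the checked conversion this implies (`pages_to_host_bytes` is a hypothetical helper for illustration, not Wasmtime's actual API):

```rust
// WebAssembly pages are 64 KiB each, fixed by the spec.
const WASM_PAGE_SIZE: u64 = 65536;

// Hypothetical helper: convert a guest-visible page count (u64 for
// memory64) into a host byte size, refusing anything that overflows
// 64-bit byte math or doesn't fit the host's usize.
fn pages_to_host_bytes(pages: u64) -> Option<usize> {
    let bytes = pages.checked_mul(WASM_PAGE_SIZE)?; // checked, not `as`
    usize::try_from(bytes).ok() // narrowing is explicit and fallible
}

fn main() {
    // 10 pages is 640 KiB on any host.
    assert_eq!(pages_to_host_bytes(10), Some(655360));
    // A maximal memory64 page count overflows even 64-bit byte math.
    assert_eq!(pages_to_host_bytes(u64::MAX), None);
}
```

On a 32-bit host the `usize::try_from` step is the one that fails for large-but-valid 64-bit byte sizes, which is exactly why the commit keeps guest-facing quantities in `u64` and host-facing ones in `usize`.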
Overall I've tried to minimize the number of `as` casts as much as possible, using checked `try_from` and checked arithmetic with either error handling or explicit `unwrap()` calls to tell us about bugs in the future. Most locations have relatively obvious things to do with various implications on various hosts, and I think they should all be roughly of the right shape, but time will tell. I mostly relied on the compiler complaining that various types weren't aligned to figure out type-casting, and I manually audited some of the more obvious locations. I suspect we have a number of hidden locations that will panic on 32-bit hosts if 64-bit modules try to run there, but otherwise I think we should be generally ok (famous last words). In any case I wouldn't want to enable this by default naturally until we've fuzzed it for some time.

In terms of the actual underlying implementation, no one should expect memory64 to be all that fast. Right now it's implemented with "dynamic" heaps which have a few consequences:

* All memory accesses are bounds-checked. I'm not sure how aggressively Cranelift tries to optimize out bounds checks, but I suspect not a ton since we haven't stressed this much historically.

* Heaps are always precisely sized. This means that every call to `memory.grow` will incur a `memcpy` of memory from the old heap to the new. We probably want to at least look into `mremap` on Linux and otherwise try to implement schemes where dynamic heaps have some reserved pages to grow into to help amortize the cost of `memory.grow`.

The memory64 spec test suite is scheduled to now run on CI, but as with all the other spec test suites it's really not all that comprehensive. I've tried adding more tests for basic things as I've had to implement guards for them, but I wouldn't really consider the testing adequate from just this PR itself.
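The "all memory accesses are bounds-checked" consequence can be modeled abstractly. This is an illustrative sketch of the check a dynamic heap forces on every access, not Cranelift's generated code: the effective end address is computed with checked 64-bit arithmetic so overflow is treated as out of bounds.

```rust
// Illustrative model of a dynamic-heap bounds check: an access of
// `width` bytes at `addr + offset` is valid iff the exclusive end
// `addr + offset + width` is <= the current heap bound, with any
// overflow in the addition counting as out of bounds.
fn access_in_bounds(addr: u64, offset: u64, width: u64, bound: u64) -> bool {
    addr.checked_add(offset)
        .and_then(|ea| ea.checked_add(width))
        .map(|end| end <= bound)
        .unwrap_or(false) // an overflowing address can never be in bounds
}

fn main() {
    let bound = 65536; // one wasm page
    assert!(access_in_bounds(0, 0, 4, bound)); // first word: ok
    assert!(access_in_bounds(65532, 0, 4, bound)); // last word: ok
    assert!(!access_in_bounds(65533, 0, 4, bound)); // straddles the end
    assert!(!access_in_bounds(u64::MAX, 1, 4, bound)); // overflow
}
```

With a guard region, most of these comparisons can be elided because an out-of-bounds access faults in the guard pages instead; without one (and for dynamic heaps generally) something shaped like this runs per access, which is why the commit message tempers performance expectations.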
I did try to take care in one test to actually allocate a 4gb+ heap and then avoid running that in the pooling allocator or in emulation because otherwise that may fail or take excessively long.

[proposal]: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md

* Fix some tests

* More test fixes

* Fix wasmtime tests

* Fix doctests

* Revert to 32-bit immediate offsets in `heap_addr`

  This commit updates the generation of addresses in wasm code to always use 32-bit offsets for `heap_addr`, and if the calculated offset is bigger than 32 bits we emit a manual add with an overflow check.

* Disable memory64 for spectest fuzzing

* Fix wrong offset being added to heap addr

* More comments!

* Clarify bytes/pages
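The "revert to 32-bit immediate offsets" commit amounts to a small decision procedure on the constant offset. Here is a hedged sketch of that split (names like `OffsetPlan` and `plan_offset` are made up for illustration; this is not the Cranelift code itself):

```rust
// Illustrative split of a constant memory offset: small adjusted offsets
// ride along in the 32-bit `heap_addr` immediate, large ones must be
// folded into the address with an explicit overflow-checked add.
#[derive(Debug, PartialEq)]
enum OffsetPlan {
    Immediate(u32),   // fits the 32-bit heap_addr immediate
    ExplicitAdd(u64), // needs a manual add that traps on overflow
}

fn plan_offset(memarg_offset: u64, access_size: u64) -> OffsetPlan {
    // Saturating add: an adjusted offset of u64::MAX can never describe a
    // valid in-bounds access anyway, so saturation is safe here.
    let adjusted = memarg_offset.saturating_add(access_size);
    match u32::try_from(adjusted) {
        Ok(imm) => OffsetPlan::Immediate(imm),
        Err(_) => OffsetPlan::ExplicitAdd(memarg_offset),
    }
}

fn main() {
    // The common case: a small constant offset stays an immediate.
    assert_eq!(plan_offset(16, 4), OffsetPlan::Immediate(20));
    // A 4gb+ adjusted offset forces the explicit-add path.
    assert_eq!(
        plan_offset(u64::from(u32::MAX), 4),
        OffsetPlan::ExplicitAdd(u64::from(u32::MAX))
    );
}
```

The design rationale given in the commit is that widening the `heap_addr` immediate to 64 bits would grow Cranelift's `InstructionData` enum, so the rare large-offset case pays with an extra add and trap check instead.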
1 parent 76a93dc commit e68aa99


60 files changed: +1361 -640 lines

build.rs

Lines changed: 3 additions & 5 deletions

```diff
@@ -34,6 +34,7 @@ fn main() -> anyhow::Result<()> {
         test_directory_module(out, "tests/misc_testsuite/module-linking", strategy)?;
         test_directory_module(out, "tests/misc_testsuite/simd", strategy)?;
         test_directory_module(out, "tests/misc_testsuite/threads", strategy)?;
+        test_directory_module(out, "tests/misc_testsuite/memory64", strategy)?;
         Ok(())
     })?;

@@ -53,6 +54,7 @@ fn main() -> anyhow::Result<()> {
             "tests/spec_testsuite/proposals/bulk-memory-operations",
             strategy,
         )?;
+        test_directory_module(out, "tests/spec_testsuite/proposals/memory64", strategy)?;
     } else {
         println!(
             "cargo:warning=The spec testsuite is disabled. To enable, run `git submodule \
@@ -157,7 +159,7 @@ fn write_testsuite_tests(

     writeln!(out, "#[test]")?;
     // Ignore when using QEMU for running tests (limited memory).
-    if ignore(testsuite, &testname, strategy) || (pooling && platform_is_emulated()) {
+    if ignore(testsuite, &testname, strategy) {
         writeln!(out, "#[ignore]")?;
     }

@@ -213,7 +215,3 @@ fn ignore(testsuite: &str, testname: &str, strategy: &str) -> bool {
 fn platform_is_s390x() -> bool {
     env::var("CARGO_CFG_TARGET_ARCH").unwrap() == "s390x"
 }
-
-fn platform_is_emulated() -> bool {
-    env::var("WASMTIME_TEST_NO_HOG_MEMORY").unwrap_or_default() == "1"
-}
```

cranelift/codegen/src/ir/immediates.rs

Lines changed: 6 additions & 0 deletions

```diff
@@ -291,6 +291,12 @@ impl From<Uimm32> for u32 {
     }
 }

+impl From<Uimm32> for u64 {
+    fn from(val: Uimm32) -> u64 {
+        val.0.into()
+    }
+}
+
 impl From<Uimm32> for i64 {
     fn from(val: Uimm32) -> i64 {
         i64::from(val.0)
```
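The conversion added above is easy to mirror with a stand-in newtype (the `Uimm32` below is a mock for illustration, not the real Cranelift type): a newtype over `u32` gains a lossless widening conversion into `u64` alongside the existing `u32`/`i64` ones.

```rust
// Mock of Cranelift's `Uimm32` immediate: a newtype over u32 that, as in
// the hunk above, widens losslessly into u64 via `From`.
#[derive(Copy, Clone)]
struct Uimm32(u32);

impl From<Uimm32> for u64 {
    fn from(val: Uimm32) -> u64 {
        val.0.into() // u32 -> u64 never loses bits
    }
}

fn main() {
    assert_eq!(u64::from(Uimm32(42)), 42);
    // Even the maximal u32 immediate round-trips exactly.
    assert_eq!(u64::from(Uimm32(u32::MAX)), 4294967295);
}
```

This is what lets the legalizer below accept `u64::from(imm)` where the immediate used to be truncated to 32 bits.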

cranelift/codegen/src/legalizer/heap.rs

Lines changed: 3 additions & 5 deletions

```diff
@@ -25,7 +25,7 @@ pub fn expand_heap_addr(
             imm,
         } => {
             debug_assert_eq!(opcode, ir::Opcode::HeapAddr);
-            (heap, arg, imm.into())
+            (heap, arg, u64::from(imm))
         }
         _ => panic!("Wanted heap_addr: {}", func.dfg.display_inst(inst, None)),
     };

@@ -53,11 +53,10 @@ fn dynamic_addr(
     inst: ir::Inst,
     heap: ir::Heap,
     offset: ir::Value,
-    access_size: u32,
+    access_size: u64,
     bound_gv: ir::GlobalValue,
     func: &mut ir::Function,
 ) {
-    let access_size = u64::from(access_size);
     let offset_ty = func.dfg.value_type(offset);
     let addr_ty = func.dfg.value_type(func.dfg.first_result(inst));
     let min_size = func.heaps[heap].min_size.into();

@@ -113,12 +112,11 @@ fn static_addr(
     inst: ir::Inst,
     heap: ir::Heap,
     mut offset: ir::Value,
-    access_size: u32,
+    access_size: u64,
     bound: u64,
     func: &mut ir::Function,
     cfg: &mut ControlFlowGraph,
 ) {
-    let access_size = u64::from(access_size);
     let offset_ty = func.dfg.value_type(offset);
     let addr_ty = func.dfg.value_type(func.dfg.first_result(inst));
     let mut pos = FuncCursor::new(func).at_inst(inst);
```

cranelift/wasm/src/code_translator.rs

Lines changed: 104 additions & 30 deletions

```diff
@@ -2164,10 +2164,6 @@ fn prepare_addr<FE: FuncEnvironment + ?Sized>(
     environ: &mut FE,
 ) -> WasmResult<(MemFlags, Value, Offset32)> {
     let addr = state.pop1();
-    // This function will need updates for 64-bit memories
-    debug_assert_eq!(builder.func.dfg.value_type(addr), I32);
-    let offset = u32::try_from(memarg.offset).unwrap();
-
     let heap = state.get_heap(builder.func, memarg.memory, environ)?;
     let offset_guard_size: u64 = builder.func.heaps[heap].offset_guard_size.into();

@@ -2176,13 +2172,19 @@ fn prepare_addr<FE: FuncEnvironment + ?Sized>(
     // segfaults) to generate traps since that means we don't have to bounds
     // check anything explicitly.
     //
-    // If we don't have a guard page of unmapped memory, though, then we can't
-    // rely on this trapping behavior through segfaults. Instead we need to
-    // bounds-check the entire memory access here which is everything from
+    // (1) If we don't have a guard page of unmapped memory, though, then we
+    // can't rely on this trapping behavior through segfaults. Instead we need
+    // to bounds-check the entire memory access here which is everything from
     // `addr32 + offset` to `addr32 + offset + width` (not inclusive). In this
-    // scenario our adjusted offset that we're checking is `offset + width`.
+    // scenario our adjusted offset that we're checking is `memarg.offset +
+    // access_size`. Note that we do saturating arithmetic here to avoid
+    // overflow. The addition here is in the 64-bit space, which means that
+    // we'll never overflow for 32-bit wasm but for 64-bit this is an issue.
+    // If our effective offset is u64::MAX though then it's impossible for
+    // that to actually be a valid offset because otherwise the wasm linear
+    // memory would take all of the host memory!
     //
-    // If we have a guard page, however, then we can perform a further
+    // (2) If we have a guard page, however, then we can perform a further
     // optimization of the generated code by only checking multiples of the
     // offset-guard size to be more CSE-friendly. Knowing that we have at least
     // 1 page of a guard page we're then able to disregard the `width` since we

@@ -2215,32 +2217,104 @@ fn prepare_addr<FE: FuncEnvironment + ?Sized>(
     // in-bounds or will hit the guard page, meaning we'll get the desired
     // semantics we want.
     //
-    // As one final comment on the bits with the guard size here, another goal
-    // of this is to hit an optimization in `heap_addr` where if the heap size
-    // minus the offset is >= 4GB then bounds checks are 100% eliminated. This
-    // means that with huge guard regions (e.g. our 2GB default) most adjusted
-    // offsets we're checking here are zero. This means that we'll hit the fast
-    // path and emit zero conditional traps for bounds checks
+    // ---
+    //
+    // With all that in mind remember that the goal is to bounds check as few
+    // things as possible. To facilitate this the "fast path" is expected to be
+    // hit like so:
+    //
+    // * For wasm32, wasmtime defaults to 4gb "static" memories with 2gb guard
+    //   regions. This means our `adjusted_offset` is 1 for all offsets <=2gb.
+    //   This hits the optimized case for `heap_addr` on static memories 4gb in
+    //   size in cranelift's legalization of `heap_addr`, eliding the bounds
+    //   check entirely.
+    //
+    // * For wasm64 offsets <=2gb will generate a single `heap_addr`
+    //   instruction, but at this time all heaps are "dynamic" which means that
+    //   a single bounds check is forced. Ideally we'd do better here, but
+    //   that's the current state of affairs.
+    //
+    // Basically we assume that most configurations have a guard page and most
+    // offsets in `memarg` are <=2gb, which means we get the fast path of one
+    // `heap_addr` instruction plus a hardcoded i32-offset in memory-related
+    // instructions.
     let adjusted_offset = if offset_guard_size == 0 {
-        u64::from(offset) + u64::from(access_size)
+        // Why saturating? see (1) above
+        memarg.offset.saturating_add(u64::from(access_size))
     } else {
+        // Why is there rounding here? see (2) above
         assert!(access_size < 1024);
-        cmp::max(u64::from(offset) / offset_guard_size * offset_guard_size, 1)
+        cmp::max(memarg.offset / offset_guard_size * offset_guard_size, 1)
     };
+
     debug_assert!(adjusted_offset > 0); // want to bounds check at least 1 byte
-    let check_size = u32::try_from(adjusted_offset).unwrap_or(u32::MAX);
-    let base = builder
-        .ins()
-        .heap_addr(environ.pointer_type(), heap, addr, check_size);
-
-    // Native load/store instructions take a signed `Offset32` immediate, so adjust the base
-    // pointer if necessary.
-    let (addr, offset) = if offset > i32::MAX as u32 {
-        // Offset doesn't fit in the load/store instruction.
-        let adj = builder.ins().iadd_imm(base, i64::from(i32::MAX) + 1);
-        (adj, (offset - (i32::MAX as u32 + 1)) as i32)
-    } else {
-        (base, offset as i32)
+    let (addr, offset) = match u32::try_from(adjusted_offset) {
+        // If our adjusted offset fits within a u32, then we can place the
+        // entire offset into the offset of the `heap_addr` instruction. After
+        // the `heap_addr` instruction, though, we need to factor the offset
+        // into the returned address. This is either an immediate to later
+        // memory instructions if the offset further fits within `i32`, or a
+        // manual add instruction otherwise.
+        //
+        // Note that native instructions take a signed offset hence the switch
+        // to i32. Note also the lack of overflow checking in the offset
+        // addition, which should be ok since if `heap_addr` passed we're
+        // guaranteed that this won't overflow.
+        Ok(adjusted_offset) => {
+            let base = builder
+                .ins()
+                .heap_addr(environ.pointer_type(), heap, addr, adjusted_offset);
+            match i32::try_from(memarg.offset) {
+                Ok(val) => (base, val),
+                Err(_) => {
+                    let adj = builder.ins().iadd_imm(base, memarg.offset as i64);
+                    (adj, 0)
+                }
+            }
+        }
+
+        // If the adjusted offset doesn't fit within a u32, then we can't pass
+        // the adjusted size to `heap_addr` raw.
+        //
+        // One reasonable question you might ask is "why not?". There's no
+        // fundamental reason why `heap_addr` *must* take a 32-bit offset. The
+        // reason this isn't done, though, is that blindly changing the offset
+        // to a 64-bit offset increases the size of the `InstructionData` enum
+        // in cranelift by 8 bytes (16 to 24). This can have significant
+        // performance implications so the conclusion when this was written was
+        // that we shouldn't do that.
+        //
+        // Without the ability to put the whole offset into the `heap_addr`
+        // instruction we need to fold the offset into the address itself with
+        // an unsigned addition. In doing so though we need to check for
+        // overflow because that would mean the address is out-of-bounds (wasm
+        // bounds checks happen on the effective 33 or 65 bit address once the
+        // offset is factored in).
+        //
+        // Once we have the effective address, offset already folded in, then
+        // `heap_addr` is used to verify that the address is indeed in-bounds.
+        // The access size of the `heap_addr` is what we were passed in from
+        // above.
+        //
+        // Note that this is generating what's likely to be at least two
+        // branches, one for the overflow and one for the bounds check itself.
+        // For now though that should hopefully be ok since 4gb+ offsets are
+        // relatively odd/rare. In the future if needed we can look into
+        // optimizing this more.
+        Err(_) => {
+            let index_type = builder.func.heaps[heap].index_type;
+            let offset = builder.ins().iconst(index_type, memarg.offset as i64);
+            let (addr, overflow) = builder.ins().iadd_ifcout(addr, offset);
+            builder.ins().trapif(
+                environ.unsigned_add_overflow_condition(),
+                overflow,
+                ir::TrapCode::HeapOutOfBounds,
+            );
+            let base = builder
+                .ins()
+                .heap_addr(environ.pointer_type(), heap, addr, access_size);
+            (base, 0)
+        }
     };

     // Note that we don't set `is_aligned` here, even if the load instruction's
```

cranelift/wasm/src/environ/dummy.rs

Lines changed: 5 additions & 1 deletion

```diff
@@ -652,6 +652,10 @@ impl<'dummy_environment> FuncEnvironment for DummyFuncEnvironment<'dummy_environ
     ) -> WasmResult<ir::Value> {
         Ok(pos.ins().iconst(I32, 0))
     }
+
+    fn unsigned_add_overflow_condition(&self) -> ir::condcodes::IntCC {
+        unimplemented!()
+    }
 }

 impl TargetEnvironment for DummyEnvironment {

@@ -792,7 +796,7 @@ impl<'data> ModuleEnvironment<'data> for DummyEnvironment {
         &mut self,
         _memory_index: MemoryIndex,
         _base: Option<GlobalIndex>,
-        _offset: u32,
+        _offset: u64,
         _data: &'data [u8],
     ) -> WasmResult<()> {
         // We do nothing
```

cranelift/wasm/src/environ/spec.rs

Lines changed: 5 additions & 1 deletion

```diff
@@ -697,6 +697,10 @@ pub trait FuncEnvironment: TargetEnvironment {
     ) -> WasmResult<()> {
         Ok(())
     }
+
+    /// Returns the target ISA's condition to check for unsigned addition
+    /// overflowing.
+    fn unsigned_add_overflow_condition(&self) -> ir::condcodes::IntCC;
 }

 /// An object satisfying the `ModuleEnvironment` trait can be passed as argument to the

@@ -995,7 +999,7 @@ pub trait ModuleEnvironment<'data>: TargetEnvironment {
         &mut self,
         memory_index: MemoryIndex,
         base: Option<GlobalIndex>,
-        offset: u32,
+        offset: u64,
         data: &'data [u8],
     ) -> WasmResult<()>;
```

cranelift/wasm/src/sections_translator.rs

Lines changed: 5 additions & 4 deletions

```diff
@@ -54,11 +54,11 @@ fn entity_type(
 }

 fn memory(ty: MemoryType) -> Memory {
-    assert!(!ty.memory64);
     Memory {
-        minimum: ty.initial.try_into().unwrap(),
-        maximum: ty.maximum.map(|i| i.try_into().unwrap()),
+        minimum: ty.initial,
+        maximum: ty.maximum,
         shared: ty.shared,
+        memory64: ty.memory64,
     }
 }

@@ -420,7 +420,8 @@ pub fn parse_data_section<'data>(
         } => {
             let mut init_expr_reader = init_expr.get_binary_reader();
             let (base, offset) = match init_expr_reader.read_operator()? {
-                Operator::I32Const { value } => (None, value as u32),
+                Operator::I32Const { value } => (None, value as u64),
+                Operator::I64Const { value } => (None, value as u64),
                 Operator::GlobalGet { global_index } => {
                     (Some(GlobalIndex::from_u32(global_index)), 0)
                 }
```

cranelift/wasm/src/translation_utils.rs

Lines changed: 4 additions & 2 deletions

```diff
@@ -226,11 +226,13 @@ pub enum TableElementType {
 #[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
 pub struct Memory {
     /// The minimum number of pages in the memory.
-    pub minimum: u32,
+    pub minimum: u64,
     /// The maximum number of pages in the memory.
-    pub maximum: Option<u32>,
+    pub maximum: Option<u64>,
     /// Whether the memory may be shared between multiple threads.
     pub shared: bool,
+    /// Whether or not this is a 64-bit memory
+    pub memory64: bool,
 }

 /// WebAssembly event.
```
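To see why the widening above matters, a field-compatible mock of the struct (not the real `cranelift-wasm` type) can hold a page count that the old `u32` fields could not:

```rust
// Field-compatible mock of the widened `Memory` descriptor shown above.
struct Memory {
    minimum: u64,
    maximum: Option<u64>,
    shared: bool,
    memory64: bool,
}

fn main() {
    // 2^32 pages (256 TiB of linear memory) only became representable in
    // the descriptor once `minimum` widened from u32 to u64.
    let m = Memory {
        minimum: 1 << 32,
        maximum: None,
        shared: false,
        memory64: true,
    };
    assert!(m.minimum > u64::from(u32::MAX));
    assert!(m.memory64);
}
```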

crates/c-api/include/doc-wasm.h

Lines changed: 6 additions & 0 deletions

```diff
@@ -594,11 +594,17 @@
  *
  * The caller is responsible for deallocating the returned type.
  *
+ * For compatibility with memory64 it's recommended to use
+ * #wasmtime_memorytype_new instead.
+ *
  * \fn const wasm_limits_t* wasm_memorytype_limits(const wasm_memorytype_t *);
  * \brief Returns the limits of this memory.
  *
  * The returned #wasm_limits_t is owned by the #wasm_memorytype_t parameter, the
  * caller should not deallocate it.
+ *
+ * For compatibility with memory64 it's recommended to use
+ * #wasmtime_memorytype_maximum or #wasmtime_memorytype_minimum instead.
  */

 /**
```
