Description
We got reports from multiple network operators that after an upgrade to CosmWasm 1.4 or 1.5 the memory usage increases a lot over time. This is clearly a bug in CosmWasm for which, at the time of writing, there is no fix. However, there are good mitigation strategies, which I'll elaborate on here.
What's happening
When you run a node with wasmvm 1.4 or 1.5, the memory usage of the process increases over time. The memory usage profile looks like this:
You might also experience consequences such as:
- Node unable to stay in sync with the network because swap is used and operations become too slow
- Node crashing because it cannot allocate memory. This can lead, for example, to crashes in the Go space or aborts in the Rust code like this one:
SIGABRT: abort
PC=0x2b998f1 m=9 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 10416 [syscall]:
runtime.cgocall(0x2121300, 0xc00a78ef58)
    runtime/cgocall.go:157 +0x4b fp=0xc00a78ef30 sp=0xc00a78eef8 pc=0x456f0b
github.com/CosmWasm/wasmvm/internal/api._C2func_save_wasm(0x7f2abb664810, {0x0, 0xc00a8d0000, 0x69ab6}, 0x0, 0xc005400ca0)
    _cgo_gotypes.go:662 +0x65 fp=0xc00a78ef58 sp=0xc00a78ef30 pc=0x135e865
github.com/CosmWasm/wasmvm/internal/api.StoreCode.func1({0x54c9da0?}, {0xd8?, 0xc00a8d0000?, 0x0?}, 0x0?)
    github.com/CosmWasm/wasmvm@v1.5.0/internal/api/lib.go:65 +0x97 fp=0xc00a78eff0 sp=0xc00a78ef58 pc=0x13618f7
github.com/CosmWasm/wasmvm/internal/api.StoreCode({0x1?}, {0xc00a8d0000?, 0x0?, 0x14?})
Why it is happening
Every time you load a contract from the file system cache, the memory usage increases (this is the bug). If contracts keep kicking each other out of the in-memory cache, this happens often. If the cache is large enough to hold the majority of actively used contracts, this happens very rarely.
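To make the effect of the cache size more tangible, here is a toy simulation. It is only a sketch under assumptions: it models the in-memory cache as a least-recently-used cache limited by a number of contracts, whereas the real cache is limited by size in MiB, and the workload is hypothetical. It counts how often a contract has to be loaded from the file system cache, which is the operation that triggers the memory growth.

```python
from collections import OrderedDict


def count_disk_loads(accesses, capacity):
    """Count loads from the file system cache in a toy model.

    Assumption: the in-memory cache holds at most `capacity` contracts
    and evicts the least recently used one when it is full.
    """
    cache = OrderedDict()
    loads = 0
    for contract in accesses:
        if contract in cache:
            cache.move_to_end(contract)  # hit: mark as most recently used
            continue
        loads += 1  # miss: contract must be loaded from the file system cache
        cache[contract] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used entry
    return loads


# Hypothetical workload: 5 contracts called round-robin for 100 rounds.
accesses = [f"contract-{i}" for i in range(5)] * 100

small = count_disk_loads(accesses, capacity=4)  # contracts kick each other out
large = count_disk_loads(accesses, capacity=5)  # all active contracts fit
print(small, large)  # prints "500 5"
```

In this model, a cache that is just one contract too small leads to a load on every single call, while a cache that fits all active contracts only loads each contract once.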
Workaround
To mitigate the problem, increase the config wasm.memory_cache_size in app.toml from 100 MiB to a much larger value that depends on the network, e.g. 2000 MiB:
[wasm]
# other wasm config entries
memory_cache_size = 2000 # MiB
This is a per-node configuration and needs to be done on every node.
How large should the cache be?
This depends on the usage patterns of the network and the size of the compiled modules. Being able to store all contracts in memory would be one extreme that might make sense for permissioned CosmWasm chains. Permissionless chains are likely to have contracts that are almost never used.
To get a rough idea of the order of magnitude, you can check the size of the modules using something like this:
- CosmWasm 1.3:
du -hs ~/.myd/wasm/wasm/cache/modules/v6-*
- CosmWasm 1.4:
du -hs ~/.myd/wasm/wasm/cache/modules/v7-*
- CosmWasm 1.5:
du -hs ~/.myd/wasm/wasm/cache/modules/v8-*
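If you prefer a number in MiB that you can compare directly with memory_cache_size, the same check can be sketched in Python. This is a rough helper, not an official tool: it sums apparent file sizes (du reports disk usage, so the numbers can differ slightly), the path is the CosmWasm 1.5 example from above, and the size on disk is only an approximation of what a module needs in memory.

```python
import glob
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def to_mib(size_bytes):
    """Convert bytes to MiB, the unit used by memory_cache_size."""
    return size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Example path for CosmWasm 1.5; adjust the home directory and version prefix.
    pattern = os.path.expanduser("~/.myd/wasm/wasm/cache/modules/v8-*")
    for module_dir in glob.glob(pattern):
        print(f"{module_dir}: {to_mib(dir_size_bytes(module_dir)):.1f} MiB")
```

The result is an upper bound for a cache that holds every contract. For a permissionless chain, a fraction of it may be enough.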
Complementary strategies
The above setting is the most important thing. But there is more you can do:
- Increase memory
- Observe memory usage. The symptoms are different for every blockchain and every node.
- Consider memory usage alerting
- Enable swap to avoid immediate hard crashes in case of memory overuse
- Schedule clean node restarts from time to time
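As a starting point for observing memory usage and alerting, the following sketch reads the resident set size of a process from /proc on Linux and compares it with a limit. The function names, the PID, and the limit are hypothetical. In practice you would probably feed this into whatever monitoring system you already use.

```python
import re


def parse_rss_kib(status_text):
    """Extract the VmRSS value (reported in kB) from /proc/<pid>/status content."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_exceeds(pid, limit_mib):
    """Return True if the resident memory of process `pid` is above `limit_mib`.

    Linux only: relies on the /proc file system.
    """
    with open(f"/proc/{pid}/status") as f:
        rss_kib = parse_rss_kib(f.read())
    return rss_kib is not None and rss_kib > limit_mib * 1024
```

For example, rss_exceeds(12345, 16000) checks whether the process with the hypothetical PID 12345 uses more than 16000 MiB of resident memory. A periodic job could run this check and send an alert or schedule a clean restart.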
Overall bear in mind I am not a node operator and I don't know the specifics of your blockchain or system. So I cannot make complete and final recommendations.
The bug
The bug can be reproduced locally in a pure-Rust example using heap profiling, as shown in #1955. The tool shows us that the memory usage increases over time but is almost zero when the process ends cleanly. This means this is not a memory leak but rather an undesired memory usage pattern.

This is where the allocations are made. At max memory usage time (t-gmax), 96% are coming through cosmwasm_vm::modules::file_system_cache::FileSystemCache::load.
At this point it is not clear to me if this is a bug in Wasmer, rkyv or cosmwasm-vm.