Description
Running certain tests with --run-count > 1 causes the Redis server to crash due to running out of memory.
To reproduce:
Start the Redis server:
taskset -c 95 ./src/redis-server --port 6379 --logfile server.log --save ""
Run a test that loads data multiple times:
taskset -c 0,1,2,3 /usr/local/bin/memtier_benchmark --port 6379 --server localhost --json-out-file oss-standalone-2023-01-24-15-30-30-NA-memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.json "--pipeline" "10" "--data-size" "100" --command "XADD key * field data field data field data field data field data" --command-key-pattern="P" --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 50 -t 4 --hide-histogram --run-count=10
Fails with:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/redis_benchmarks_specification/__runner__/runner.py", line 722, in process_self_contained_coordinator_stream
    used_memory_check(
  File "/usr/local/lib/python3.8/dist-packages/redis_benchmarks_specification/__runner__/runner.py", line 915, in used_memory_check
    exit(1)
  File "/usr/lib/python3.8/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: 1
This error is printed when a test exceeds the configured server memory capacity, and by itself it is not what crashes the server. With --run-count=10, however, the Redis server itself runs out of memory and crashes, which prevents the rest of the tests from completing.
In my testing, a single run generated ~27 GB of data in memory on Cascade Lake and ~34 GB on Ice Lake, as reported by:
./redis-cli info | grep used_memory_human
Because XADD appends new entries rather than overwriting existing keys, each run adds on top of the previous one, so a few runs will easily generate hundreds of GB of data in memory.
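The growth is easy to track programmatically between runs. A minimal sketch, assuming redis-py is installed and the server is on localhost:6379 as in the reproduction steps above:

```python
import redis

# Connect to the server started in the reproduction steps above.
r = redis.Redis(host="localhost", port=6379)

# INFO's memory section exposes the same fields grepped via redis-cli below.
info = r.info("memory")
print("used_memory_human:", info["used_memory_human"])
print("used_memory_peak_human:", info["used_memory_peak_human"])
print("total_system_memory_human:", info["total_system_memory_human"])
```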
Proposal:
- Benchmark all load-heavy tests and adjust the required server memory to match.
- If --run-count > 1 is specified, multiply the required server memory by the run count and verify that the host has enough before starting (see the sketch below). The relevant fields can be checked with:
./redis-cli info | grep used_memory_peak_human
./redis-cli info | grep total_system_memory_human
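A minimal sketch of that check, with hypothetical names: required_memory_bytes would come from the test spec and run_count from --run-count; the INFO field itself is a standard Redis field.

```python
import redis

def enough_memory_for_runs(required_memory_bytes: int, run_count: int,
                           host: str = "localhost", port: int = 6379) -> bool:
    """Return True if the host can hold run_count back-to-back loads.

    XADD appends entries, so each run adds to the previous one and the
    total requirement is roughly required_memory_bytes * run_count.
    """
    info = redis.Redis(host=host, port=port).info("memory")
    return required_memory_bytes * run_count <= info["total_system_memory"]
```

The runner could call something like this before starting a multi-run test and fail fast with a clear message, instead of letting the server OOM partway through the suite.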