This directory holds a repeatable benchmark workflow for Celerity.
The benchmark binaries themselves stay in `src/bin/` so Cargo can build and run them normally:

- `local_lat`
- `remote_lat`
- `local_thr`
- `remote_thr`
The point of these benchmarks is not to produce flattering numbers. The point is to make it easy to compare Celerity and libzmq under the same conditions.
For throughput, the intended comparison is:
- Celerity `local_thr`/`remote_thr`
- libzmq `local_thr`/`remote_thr`
For latency, the intended comparison is:
- Celerity `local_lat`/`remote_lat`
- libzmq `local_lat`/`remote_lat`
Run the receiver first, then the sender.
Use the same message count for each size unless you have a reason to change it.
Message sizes, in bytes: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536.
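The size grid is every power of two from 1 byte to 64 KiB. A minimal sketch of generating it in Rust follows; `size_grid` is a hypothetical helper name, not something defined in this repo.

```rust
/// Message sizes in bytes: every power of two from 1 B (2^0) up to 64 KiB (2^16).
/// Hypothetical helper for illustration; the repo's scripts may build the grid differently.
fn size_grid() -> Vec<usize> {
    (0..=16).map(|exp| 1usize << exp).collect()
}
```

The grid has 17 entries, starting at 1 and ending at 65536.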
The current defaults used in this repo are:
- Throughput: `1000000` messages
- Latency: `10000` round trips
Terminal 1:

`cargo run --release --features tokio --bin local_thr -- tcp://127.0.0.1:5555 1024 1000000`

Terminal 2:

`cargo run --release --features tokio --bin remote_thr -- tcp://127.0.0.1:5555 1024 1000000`

`local_thr` is the side that reports the benchmark result. `remote_thr` just pushes the traffic.
Terminal 1:

`cargo run --release --features tokio --bin local_lat -- tcp://127.0.0.1:5555 1024 10000`

Terminal 2:

`cargo run --release --features tokio --bin remote_lat -- tcp://127.0.0.1:5555 1024 10000`

`remote_lat` is the side that reports the benchmark result. It performs one warmup round trip and then times the remaining request/reply loop.
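The warmup-then-time pattern can be sketched in isolation. This is not the code in `remote_lat`: the function name `average_round_trip` is hypothetical, the round trip is abstracted as a closure instead of a real socket exchange, and the sketch assumes the warmup call is separate from the timed count.

```rust
use std::time::{Duration, Instant};

/// Calls `round_trip` once as an untimed warmup, then times `timed` further
/// calls and returns the average duration of one call.
/// Assumption: the warmup is not part of the `timed` count.
fn average_round_trip<F: FnMut()>(timed: u32, mut round_trip: F) -> Duration {
    assert!(timed > 0, "need at least one timed round trip");

    // Warmup: absorbs one-off costs such as connection setup or cold caches.
    round_trip();

    let start = Instant::now();
    for _ in 0..timed {
        round_trip();
    }
    // Duration implements Div<u32>, so this is the mean time per call.
    start.elapsed() / timed
}
```

In a real benchmark the closure would send one request and block until the matching reply arrives.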
Terminal 1:

`./perf/local_thr tcp://127.0.0.1:5555 1024 1000000`

Terminal 2:

`./perf/remote_thr tcp://127.0.0.1:5555 1024 1000000`

The tables below record the current matched-count loopback comparison between:
- Celerity from this repo
- libzmq built locally
These results were collected with:
- Throughput: `COUNT=1000000 ./perf/run_thr.sh`
- Latency: `COUNT=10000 ./perf/run_lat.sh`
- Matching libzmq perf binaries with the same endpoint, sizes, and counts
The tables below show the key comparison sizes used for quick baseline checks.

Latency:

| Size | libzmq | Celerity | Gap |
|---|---|---|---|
| 1B | 58.96 us | 103.73 us | 1.76x |
| 64B | 53.62 us | 88.36 us | 1.65x |
| 1KB | 56.14 us | 98.83 us | 1.76x |
| 64KB | 140.83 us | 197.30 us | 1.40x |

Throughput:

| Size | libzmq | Celerity | Gap |
|---|---|---|---|
|---|---|---|---|
| 1B | 4,393,364 msg/s | 865,874 msg/s | 5.07x |
| 64B | 3,564,719 msg/s | 787,369 msg/s | 4.53x |
| 1KB | 896,963 msg/s | 738,564 msg/s | 1.21x |
| 4KB | 247,479 msg/s | 411,541 msg/s | Celerity 1.66x faster |
| 64KB | 11,619 msg/s | 23,519 msg/s | Celerity 2.02x faster |
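The Gap column appears to be a simple ratio: Celerity latency over libzmq latency, and libzmq throughput over Celerity throughput (inverted in the rows where Celerity is faster). A minimal sketch under that assumption follows; the function names are hypothetical.

```rust
/// Latency gap: how many times higher Celerity's latency is than libzmq's.
/// Values above 1.0 mean libzmq is faster.
fn latency_gap(libzmq_us: f64, celerity_us: f64) -> f64 {
    celerity_us / libzmq_us
}

/// Throughput gap: how many times more messages per second libzmq delivers.
/// Values below 1.0 mean Celerity is faster; take the reciprocal to express
/// the result as "Celerity N times faster".
fn throughput_gap(libzmq_msgs_per_sec: f64, celerity_msgs_per_sec: f64) -> f64 {
    libzmq_msgs_per_sec / celerity_msgs_per_sec
}
```

For example, `latency_gap(58.96, 103.73)` rounds to 1.76, and `1.0 / throughput_gap(247_479.0, 411_541.0)` rounds to 1.66, matching the 1B latency row and the 4KB throughput row.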
Notes:
- Small-message throughput is still where libzmq leads most clearly.
- At `4KB` and `64KB`, the current Celerity transport path is faster than the local libzmq build on this machine.
- These numbers are machine-specific and should be treated as a baseline, not a guarantee.

For a fair comparison:

- Use `--release`.
- Run on the same machine for both Celerity and libzmq.
- Keep the endpoint, message size, and message count identical.
- Run each case at least 3 times.
- Compare median-to-median or best-of-3 to best-of-3, but stay consistent.
- Record the receiver-side throughput, not the sender-side timing.
- Record latency from the requester side, not the responder side.
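Reducing repeated runs to one number can be sketched as below. These helpers are hypothetical and not part of the repo; they assume each run has already been reduced to a single `f64` (msg/s or microseconds).

```rust
/// Median of repeated measurements. For an even number of samples this is
/// the mean of the two middle values. Returns None for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    })
}

/// Best-of-N for throughput: the highest msg/s observed.
fn best_throughput(samples: &[f64]) -> Option<f64> {
    samples.iter().copied().reduce(f64::max)
}

/// Best-of-N for latency: the lowest latency observed.
fn best_latency(samples: &[f64]) -> Option<f64> {
    samples.iter().copied().reduce(f64::min)
}
```

Whichever reduction is chosen, apply the same one to both the Celerity and the libzmq runs.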
| implementation | transport | size (bytes) | count | msg/s | MiB/s | notes |
|---|---|---|---|---|---|---|
| celerity | tcp loopback | 1024 | 100000 | | | |
| libzmq | tcp loopback | 1024 | 100000 | | | |

| implementation | transport | size (bytes) | round trips | avg latency (us) | notes |
|---|---|---|---|---|---|
| celerity | tcp loopback | 1024 | 100000 | | |
| libzmq | tcp loopback | 1024 | 100000 | | |
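To fill in the `msg/s` and `MiB/s` columns from a raw run, the conversion can be sketched as follows. The function names are hypothetical, and the sketch assumes MiB/s means mebibytes (1024 × 1024 bytes) of payload per second, measured on the receiver side.

```rust
use std::time::Duration;

/// Messages per second for `count` messages received over `elapsed`.
fn msgs_per_sec(count: u64, elapsed: Duration) -> f64 {
    count as f64 / elapsed.as_secs_f64()
}

/// Payload throughput in MiB/s, assuming 1 MiB = 1024 * 1024 bytes.
fn mib_per_sec(msgs_per_sec: f64, size_bytes: usize) -> f64 {
    msgs_per_sec * size_bytes as f64 / (1024.0 * 1024.0)
}
```

For instance, 1,000,000 messages received in 2 seconds is 500,000 msg/s, and 1024 msg/s at 1024 bytes per message is exactly 1 MiB/s.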
There is a helper script in this directory:
`./perf/run_thr.sh`

It runs the Celerity throughput pair for the default size grid, prints a compact summary, and writes:

`perf/celerity_thr_results.csv`
There is also a latency runner:
`./perf/run_lat.sh`

It prints the latency summary and writes:

`perf/celerity_lat_results.csv`

Both scripts accept `CSV_PATH=...` if you want to override the output location.