
Performance Testing

Dmytro Vyazelenko edited this page Jul 17, 2024 · 65 revisions

Aeron is designed to provide high-throughput and low-latency message transport from publishers to subscribers.

A number of tests are provided in the samples module to test both latency and throughput.

Comparisons of Aeron with other algorithms for RTT IPC latency can be found here.

A guide for running tests on AWS can be found here.

Throughput Testing

System Preparation

For some context, Todd co-authored a paper in a previous life.

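Raise the OS limits on socket buffer sizes so that the larger SO_SNDBUF/SO_RCVBUF values requested by the driver are not capped: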

Linux

$ sudo sysctl net.core.rmem_max=2097152
$ sudo sysctl net.core.wmem_max=2097152
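Values set with sysctl as above last only until the next reboot. The following is a minimal sketch for making them persistent, assuming a distribution that reads /etc/sysctl.d/*.conf (systemd-sysctl or procps `sysctl --system`). The helper name aeron_sysctl_conf and the file name 99-aeron.conf are hypothetical choices, not part of Aeron.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints sysctl.d-style lines for the socket buffer
# limits used above. The optional argument is the limit in bytes and
# defaults to 2097152 (2 MiB), the value from the commands above.
aeron_sysctl_conf() {
  local size=${1:-2097152}
  printf 'net.core.rmem_max = %s\n' "$size"
  printf 'net.core.wmem_max = %s\n' "$size"
}
```

After sourcing the helper, write the file, reload all sysctl configuration and confirm the limits:

$ aeron_sysctl_conf | sudo tee /etc/sysctl.d/99-aeron.conf
$ sudo sysctl --system
$ sysctl net.core.rmem_max net.core.wmem_max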

FreeBSD / Darwin

$ sudo sysctl -w kern.ipc.maxsockbuf=4194304
$ sudo sysctl -w net.inet.tcp.sendspace=2097152 
$ sudo sysctl -w net.inet.tcp.recvspace=2097152

Note: Make sure that the Aeron directory is created on a RAM disk. See Best-Practices-Guide#macdarwin on how to do that.
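One common way to create a RAM disk on macOS is with hdiutil and diskutil; the Best-Practices-Guide remains the reference for Aeron's recommended steps. The hdiutil `ram://` URL takes its size in 512-byte sectors, so the sketch below converts MiB to sectors. The helper name ram_disk_sectors and the volume name DevShm are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a RAM disk size in MiB into the number of
# 512-byte sectors expected by hdiutil's ram:// URL (1 MiB = 2048 sectors).
ram_disk_sectors() {
  local mib=$1
  echo $(( mib * 1024 * 1024 / 512 ))
}
```

For example, to create a 2 GiB volume mounted at /Volumes/DevShm and point the Aeron directory at it (macOS only):

$ diskutil erasevolume HFS+ "DevShm" $(hdiutil attach -nomount ram://$(ram_disk_sectors 2048))
$ java -Daeron.dir=/Volumes/DevShm/aeron ...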

Windows

Because system calls are more expensive on Windows, it helps to use larger socket buffers than on Linux, e.g. try 2-4x larger. As Windows does not have /dev/shm, it is necessary to install a RAM disk such as http://www.radeonramdisk.com/ for the Aeron directory. A RAM disk avoids disk write latency for the memory-mapped files used to communicate between the clients and the driver.

Explanation of configuration options:

  • -XX:+UseBiasedLocking: The driver has no contended locks, so it can benefit from avoiding the CAS operations needed to take a lock. Biased locking was disabled by default in JDK 15 and removed in JDK 18, so this flag only has an effect on earlier JDKs.
  • -XX:BiasedLockingStartupDelay=0: The Aeron driver can easily be running before the default biased locking startup delay has elapsed, so the delay is removed.
  • -XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=300000: Reduce how often the JVM brings all threads to a safepoint by raising the interval from the default of 1000 ms to 300000 ms (5 minutes). GuaranteedSafepointInterval is a diagnostic option, hence the need to unlock diagnostic options.
  • -XX:+UseParallelOldGC: Use the parallel collector for full (old generation) collections.
  • -Djava.net.preferIPv4Stack=true: The IPv4 stack can be a more efficient path than IPv6 within the Java JNI implementation.
  • -Daeron.mtu.length=8k: Increase the maximum transmission unit (MTU) so that more data is sent per system call in a throughput scenario.
  • -Daeron.socket.so_sndbuf=2m: Increase the size of the OS socket send buffer (SO_SNDBUF) to account for the Bandwidth Delay Product (BDP) on a high-bandwidth network.
  • -Daeron.socket.so_rcvbuf=2m: Increase the size of the OS socket receive buffer (SO_RCVBUF) to account for the Bandwidth Delay Product (BDP) on a high-bandwidth network.
  • -Daeron.rcv.initial.window.length=2m: Set the initial flow control window to account for the BDP.
  • -Daeron.term.buffer.sparse.file=false: Allocate the term buffers as non-sparse files to avoid page faults on the data path.
  • -Daeron.pre.touch.mapped.memory=true: Pre-touch memory-mapped files so that their pages are faulted into client processes up front.
  • -Dagrona.disable.bounds.checks=true: Disable bounds checking to shorten the instruction path; only appropriate on private, secure networks.
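To see whether buffer and window sizes cover a given link, compute the Bandwidth Delay Product: bandwidth (bit/s) multiplied by round-trip time (s), divided by 8 to get bytes. The sketch below does this in integer shell arithmetic; the helper name bdp_bytes and its units (Gbit/s and microseconds) are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Bandwidth Delay Product in bytes.
#   $1 = link bandwidth in Gbit/s (integer)
#   $2 = round-trip time in microseconds (integer)
# BDP = bandwidth (bit/s) * RTT (s) / 8 bits per byte
bdp_bytes() {
  local gbps=$1 rtt_us=$2
  # gbps * 10^9 / 8 is exact because 10^9 is divisible by 8.
  echo $(( gbps * 1000000000 / 8 * rtt_us / 1000000 ))
}
```

For example, `bdp_bytes 10 1600` prints 2000000, so 2m (2097152 byte) socket buffers and initial window cover a 10 Gbit/s link with a round-trip time of up to roughly 1.6 ms. Faster links or longer round trips need proportionally larger values.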

Execution

The aeron-samples module provides scripts that make running the following more convenient.

Run the media driver:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -XX:+UnlockDiagnosticVMOptions \
    -XX:GuaranteedSafepointInterval=300000 \
    -XX:+UseBiasedLocking \
    -XX:BiasedLockingStartupDelay=0 \
    -XX:+UseParallelOldGC \
    -Djava.net.preferIPv4Stack=true \
    -Daeron.mtu.length=8k \
    -Daeron.socket.so_sndbuf=2m \
    -Daeron.socket.so_rcvbuf=2m \
    -Daeron.rcv.initial.window.length=2m \
    -Dagrona.disable.bounds.checks=true \
    -Daeron.term.buffer.sparse.file=false \
    -Daeron.pre.touch.mapped.memory=true \
    io.aeron.samples.LowLatencyMediaDriver
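Size-valued properties such as aeron.mtu.length=8k and aeron.socket.so_sndbuf=2m take a unit suffix. The sketch below assumes the binary interpretation used by Agrona's size parsing (k = 1024, m = 1024 * 1024, g = 1024 * 1024 * 1024, either case, no suffix meaning bytes); the helper name size_to_bytes is hypothetical. Under that assumption 2m is exactly the 2097152 bytes set for net.core.rmem_max and net.core.wmem_max earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a size such as 8k, 2m or 1G into bytes,
# assuming binary (1024-based) multipliers. Plain numbers are bytes.
size_to_bytes() {
  local value=$1
  local number=${value%[kKmMgG]}   # strip a single trailing unit suffix
  case ${value: -1} in
    k|K) echo $(( number * 1024 )) ;;
    m|M) echo $(( number * 1024 * 1024 )) ;;
    g|G) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)   echo $(( value )) ;;
  esac
}
```

For example, `size_to_bytes 8k` prints 8192 and `size_to_bytes 2m` prints 2097152.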

Run the Subscriber:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -XX:+UseParallelOldGC \
    -Dagrona.disable.bounds.checks=true \
    -Daeron.sample.frameCountLimit=256 \
    io.aeron.samples.RateSubscriber

Run the Publisher:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -XX:+UseParallelOldGC \
    -Daeron.sample.messageLength=32 \
    -Daeron.sample.messages=500000000 \
    -Dagrona.disable.bounds.checks=true \
    io.aeron.samples.StreamingPublisher

IPC throughput via shared memory, bypassing the network:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -Dagrona.disable.bounds.checks=true \
    -Daeron.sample.messageLength=32 \
    io.aeron.samples.EmbeddedIpcThroughput

IPC throughput via shared memory, bypassing the network and using exclusive publications:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -Dagrona.disable.bounds.checks=true \
    -Daeron.sample.messageLength=32 \
    io.aeron.samples.EmbeddedExclusiveIpcThroughput
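Every command in this section repeats the classpath and -Dagrona.disable.bounds.checks=true. A small wrapper can remove that repetition; this is only a sketch, and the names AERON_JAR and aeron_java are hypothetical. The default path keeps the <version> placeholder, so set AERON_JAR to the jar you actually built.

```shell
#!/usr/bin/env bash
# Hypothetical variable: path to the built aeron-all jar. Override it in the
# environment; the default keeps the <version> placeholder unchanged.
AERON_JAR="${AERON_JAR:-aeron-all/build/libs/aeron-all-<version>.jar}"

# Hypothetical wrapper: runs java with the classpath and the bounds-check
# flag shared by every command above, then any extra JVM options and the
# main class passed as arguments.
aeron_java() {
  java -cp "$AERON_JAR" \
    -Dagrona.disable.bounds.checks=true \
    "$@"
}
```

After sourcing the script, the first IPC test becomes:

$ aeron_java -Daeron.sample.messageLength=32 io.aeron.samples.EmbeddedIpcThroughput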

Latency Testing

System Preparation

No specific changes are currently required.

Execution

Run the media driver:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -XX:+UnlockDiagnosticVMOptions \
    -XX:GuaranteedSafepointInterval=300000 \
    -XX:+UseBiasedLocking \
    -XX:BiasedLockingStartupDelay=0 \
    -XX:+UseParallelOldGC \
    -Djava.net.preferIPv4Stack=true \
    -Dagrona.disable.bounds.checks=true \
    io.aeron.samples.LowLatencyMediaDriver

Run the Subscriber:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -XX:+UnlockDiagnosticVMOptions \
    -XX:GuaranteedSafepointInterval=300000 \
    -XX:+UseParallelOldGC \
    -Daeron.pre.touch.mapped.memory=true \
    -Dagrona.disable.bounds.checks=true \
    io.aeron.samples.Pong

Run the Publisher:

$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
    -XX:+UnlockDiagnosticVMOptions \
    -XX:GuaranteedSafepointInterval=300000 \
    -XX:+UseParallelOldGC \
    -Daeron.sample.messages=100000 \
    -Daeron.sample.messageLength=32 \
    -Daeron.pre.touch.mapped.memory=true \
    -Dagrona.disable.bounds.checks=true \
    io.aeron.samples.Ping

Archive Performance

Aeron supports recording live streams to persistent storage and replaying them. The archive performance samples give a good feel for what your hardware is capable of. Further details of the Aeron Archive can be found on the wiki.

Profiling

Native Code Bounds Checking

With the Java driver, bounds checking can be disabled with a system property; with the C driver and clients it is not as straightforward. If you need the extra performance and are willing to accept the associated risk, the DISABLE_BOUNDS_CHECK define can be set at compile time. It is not set by default.
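A compile-time define is usually passed through the compiler flags. The sketch below assumes a CMake build (CMake 3.13 or later for -S/-B) and uses the define name given above; the helper name configure_without_bounds_checks and the Release build type are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: configures an out-of-source CMake build with the
# DISABLE_BOUNDS_CHECK define added to both the C and C++ compiler flags.
#   $1 = source directory, $2 = build directory
configure_without_bounds_checks() {
  local src_dir=$1 build_dir=$2
  cmake -S "$src_dir" -B "$build_dir" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_FLAGS=-DDISABLE_BOUNDS_CHECK \
    -DCMAKE_CXX_FLAGS=-DDISABLE_BOUNDS_CHECK
}
```

For example, from the source root:

$ configure_without_bounds_checks . build && cmake --build build

Setting CMAKE_C_FLAGS and CMAKE_CXX_FLAGS this way replaces any flags you would otherwise pass through them, so append those if you need them.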