Benchmarking end-to-end transaction throughput performance
It would be nice for Substrate to have a setup/demonstration/documentation for E2E benchmarks.
There are plenty of benchmarks in the codebase and a runtime benchmarking framework,
but we were not able to find E2E benchmarks demonstrating the throughput of the network.
Furthermore, our benchmarks show peak throughput at around 800 transactions per second,
which is a bit lower than the claimed 1000 TPS.
We would like to know where the discrepancy comes from, how to achieve higher throughput,
and how to analyze Substrate's performance.
Setup
Since we haven't been able to find an existing setup for E2E benchmarks, we've implemented the following:
- 4x AWS t3.xlarge* instances, each running a Substrate node in the same network
- 1x custom client node that submits transactions over HTTP RPC, distributing them evenly between the nodes (a sketch of its submission loop follows after this list)
- 1x Prometheus server collecting stats from the Substrate nodes
* t3.xlarge instances were used because they supposedly meet the server requirements
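For concreteness, here is a minimal sketch of the client's submission loop. The node addresses, target rate, and `load_presigned_extrinsics` helper are placeholders for illustration; the real client pre-signs transfer extrinsics offline (with distinct nonces) and uses many concurrent connections rather than one blocking loop:

```rust
// Assumed dependencies: reqwest = { version = "0.11", features = ["blocking", "json"] },
// serde_json = "1".
use std::time::{Duration, Instant};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // HTTP RPC endpoints of the four validator nodes (hypothetical addresses).
    let nodes = [
        "http://10.0.0.1:9933",
        "http://10.0.0.2:9933",
        "http://10.0.0.3:9933",
        "http://10.0.0.4:9933",
    ];
    let target_tps = 800.0;
    let interval = Duration::from_secs_f64(1.0 / target_tps);
    let client = reqwest::blocking::Client::new();

    for (i, xt) in load_presigned_extrinsics().iter().enumerate() {
        let started = Instant::now();
        // Round-robin across nodes so no single RPC endpoint becomes the bottleneck.
        let url = nodes[i % nodes.len()];
        let body = serde_json::json!({
            "jsonrpc": "2.0",
            "id": i,
            "method": "author_submitExtrinsic", // standard Substrate RPC method
            "params": [xt],
        });
        let resp: serde_json::Value = client.post(url).json(&body).send()?.json()?;
        if resp.get("error").is_some() {
            eprintln!("node {url} rejected extrinsic {i}: {resp}");
        }
        // Crude pacing toward the target rate; a single blocking loop cannot
        // actually sustain 800 TPS, hence the concurrent connections in practice.
        if let Some(rest) = interval.checked_sub(started.elapsed()) {
            std::thread::sleep(rest);
        }
    }
    Ok(())
}

// Placeholder: in the real client these are SCALE-encoded, pre-signed transfers.
fn load_presigned_extrinsics() -> Vec<String> {
    vec!["0x...".to_string()] // hypothetical hex-encoded extrinsic
}
```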
Possible limits
Substrate has built-in limits that, we assume, ensure smooth operation of the nodes.
However, to find the network's actual throughput ceiling, we would like to lift those limits.
We have found the following limits:
- Maximum block weight
- Maximum block byte length
- Transaction pool queue limit
They were increased with this change to substrate-node-template; a sketch of where the first two limits live in the runtime follows.
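For reference, the block weight and block length limits are runtime constants declared in the node-template's `runtime/src/lib.rs`. The sketch below follows the node-template of the time; names and exact values are illustrative, not a verbatim copy of our change:

```rust
// runtime/src/lib.rs (fragment): the two block-level limits we raised.
use frame_support::{parameter_types, weights::constants::WEIGHT_PER_SECOND};
use frame_system::limits::{BlockLength, BlockWeights};
use sp_runtime::Perbill;

/// Share of block resources reserved for normal (non-operational) extrinsics.
const NORMAL_DISPATCH_RATIO: Perbill = Perbill::from_percent(75);

parameter_types! {
    // Maximum block weight: roughly two seconds of reference-hardware compute.
    pub RuntimeBlockWeights: BlockWeights =
        BlockWeights::with_sensible_defaults(2 * WEIGHT_PER_SECOND, NORMAL_DISPATCH_RATIO);
    // Maximum block byte length: 5 MiB, split by the same normal/operational ratio.
    pub RuntimeBlockLength: BlockLength =
        BlockLength::max_with_normal_ratio(5 * 1024 * 1024, NORMAL_DISPATCH_RATIO);
}
```

The transaction pool queue limit, by contrast, is a node-side setting rather than a runtime constant; it is raised via the `--pool-limit` and `--pool-kbytes` CLI flags.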
Observations
- The peak transaction throughput was measured at around 800 TPS.
- Increasing the client-side TPS above that point decreased the number of transactions per block, meaning more transactions were dropped as the load increased.
- CPU utilization during this benchmark was ~50% on the Substrate nodes and ~10% on the client node, so there is still compute to spare.
- The client was capable of generating up to 3000 TPS when used against a 12-node network, although most of those transactions were dropped. This demonstrates that the client was not the bottleneck.
- We were not able to find where a Substrate node spends its CPU time. This is complicated by the use of async and the lack of debug info in release builds by default (see the profile sketch after this list). Any suggestions on how to collect performance info (e.g. flamegraphs) would be greatly appreciated.
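On the debug-info point, the only knob we are aware of (an assumption on our side, not something we have verified against Substrate's build setup) is re-enabling symbols in the release profile of the workspace `Cargo.toml`, so that `perf`-style flamegraphs can resolve frames in an optimized binary:

```toml
# Workspace Cargo.toml: keep release optimizations, but emit debug info
# so profilers such as perf can resolve symbols in the node binary.
[profile.release]
debug = true
```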
We would like to know whether we are missing anything in this approach, whether these results are reasonable, and whether you have any suggestions on how to evaluate end-to-end Substrate performance.