[Discuss] Performance benchmarking improvements for Opensearch #3983
Description
Currently we have a very basic performance test suite (link) where we execute a single workload, nyc_taxis,
on a single-node cluster and capture the metrics. I wanted to open a discussion on process improvements for benchmarking OpenSearch (periodically as well as during every release). This would help make benchmarking more thorough and ensure we don't miss any regressions.
Listing a few high-level improvements I can think of. Feel free to add more test scenarios.
1. Testing different cluster configurations
We should also cover different cluster configurations: multi-node clusters, with/without replicas (logical/physical), multi-AZ setups, and instance types that vary compute, memory, and storage (EBS/SSD).
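To make this concrete, below is a minimal sketch of running the same workload against a matrix of pre-provisioned clusters with opensearch-benchmark. The endpoints are hypothetical placeholders, and the exact CLI flags (`execute-test`, `--target-hosts`, `--pipeline=benchmark-only`, `--results-file`) are my reading of the opensearch-benchmark CLI, so treat them as assumptions:

```python
# Sketch: run the same workload against several pre-provisioned cluster
# configurations so the results can be compared side by side.
# Endpoints are hypothetical; flag names are assumptions based on the
# opensearch-benchmark CLI.
import subprocess

CLUSTERS = {
    "single-node":        "https://single-node.example.com:9200",
    "3-node-1-replica":   "https://three-node.example.com:9200",
    "multi-az-2-replica": "https://multi-az.example.com:9200",
}

for name, host in CLUSTERS.items():
    subprocess.run(
        [
            "opensearch-benchmark", "execute-test",
            "--workload=nyc_taxis",
            f"--target-hosts={host}",
            "--pipeline=benchmark-only",          # benchmark an existing cluster
            f"--results-file=results-{name}.md",  # one results file per config
        ],
        check=True,
    )
```

Using --pipeline=benchmark-only keeps provisioning out of scope, so each cluster configuration can be stood up by whatever infrastructure tooling we already use.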
2. Testing with different workloads
The existing list of workloads is mentioned here.
We should add different types of workloads to simulate different traffic patterns, for example (see the smoke-test sketch after this list):
- geonames for structured data.
- pmc for full-text search.
- nested for nested documents.
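As a first pass, each candidate workload could be smoke-tested with --test-mode (which runs a reduced corpus) before committing to full runs. A sketch, assuming the same hypothetical test cluster as above:

```python
# Sketch: smoke-run each candidate workload in --test-mode (a reduced
# corpus) as a sanity check before full benchmarking runs.
# The target host is a hypothetical placeholder.
import subprocess

WORKLOADS = ["geonames", "pmc", "nested"]

for workload in WORKLOADS:
    subprocess.run(
        [
            "opensearch-benchmark", "execute-test",
            f"--workload={workload}",
            "--target-hosts=https://test-cluster.example.com:9200",
            "--pipeline=benchmark-only",
            "--test-mode",  # small document subset, quick feedback
        ],
        check=True,
    )
```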
Apart from the existing workloads, we need workloads with a higher volume of data (the largest today is nyc_taxis
at roughly 75 GB). Here is an existing issue on opensearch-benchmark for the same. Workloads like these would help benchmark larger clusters (like 100 nodes!) that reflect the real workloads of the biggest OpenSearch consumers.
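Until higher-volume corpora exist upstream, one stopgap could be inflating an existing corpus file. A rough sketch, assuming the newline-delimited JSON documents format that workloads ship and an illustrative copy_id field; note that duplicated documents won't change term statistics the way genuinely new data would:

```python
# Sketch: inflate an existing corpus (newline-delimited JSON, the format
# opensearch-benchmark workload document files use) by an integer factor
# to approximate a higher-volume dataset. File names and the copy_id
# field are illustrative assumptions.
import json

FACTOR = 10  # e.g. ~75 GB -> ~750 GB

with open("documents.json") as src, open("documents-10x.json", "w") as dst:
    for line in src:
        doc = json.loads(line)
        for copy in range(FACTOR):
            doc["copy_id"] = copy  # perturb so copies aren't byte-identical
            dst.write(json.dumps(doc) + "\n")
```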
3. Benchmarking other use cases (core or plugins)
Apart from search and indexing, we also need benchmarks for other features that live in core or in external plugins (see the snapshot-timing sketch after this list). A few examples:
- Snapshots.
- Reindexing.
- Security plugin.
- Cross-cluster search/replication.
- Remote reindex.
- Async search.
- SQL.
- Index management.
- Segment Replication.
- Remote store.
- Pluggable Translog.
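To illustrate what one of these benchmarks could look like, here is a sketch that times snapshot creation with the opensearch-py client. The endpoint, repository, and snapshot names are hypothetical, and the repository is assumed to be registered beforehand:

```python
# Sketch: time snapshot creation as one example of benchmarking a
# feature outside search/indexing. Endpoint, repository, and snapshot
# names are hypothetical; the repository must be registered beforehand.
import time
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://test-cluster.example.com:9200"])

start = time.perf_counter()
client.snapshot.create(
    repository="benchmark-repo",
    snapshot="bench-snapshot-1",
    body={"indices": "nyc_taxis"},
    wait_for_completion=True,  # block until the snapshot finishes
)
print(f"snapshot took {time.perf_counter() - start:.1f}s")
```

The same pattern (set up state, time the operation, record the result) would extend to reindexing, cross-cluster replication, and the other items above.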