Overview
OpenSearch performance is crucial to the success of the software. Having a mechanism to measure performance on a daily basis and surface any degradation, and providing a standardized process to test the performance of new features, are essential to keeping performance at a high standard.
Problem Statement
The issue has two main aspects. First, the lack of a standardized process for contributors to follow when testing the performance of their core features before release (i.e. before merging the code or moving it out of experimental status), such as Segment Replication GA. Second, the lack of automated tools to identify ad-hoc changes and small commits that may degrade performance, such as the change to the GeoJson Point format that caused a degradation in point-data indexing and had to be reverted. The lack of proper processes often results in delayed feature releases and erodes confidence in our software.
Proposal
Core Features: A proactive mechanism to identify any performance degradation in core features ahead of time, before the feature is released. We propose developing a public performance testing process that any contributor can follow, with the following set of requirements:
- The process should be designed for the OpenSearch software and be usable by any contributor.
- The process should provide test plan templates and step-by-step instructions to make testing easier and more standardized; templates can be defined per use case, such as indexing, search, or geospatial.
- The plan should recommend workloads to use from the existing list of workloads.
- The plan should provide a set of metrics to measure the performance of the system per use case, such as query response time, indexing throughput, and CPU utilization.
- The plan should provide guidelines for testing different cluster configurations and settings, such as single- vs. multi-node clusters, low-spec hardware, 100+ primary shards, and with/without replicas.
- The process should provide a template for a report summarizing the results, showing the benchmarking comparison, any issues or concerns discovered during testing, and the analysis.
- The process should allow customizing the test plan templates, performance metrics, and workloads based on feature requirements.
- Testing should use OpenSearch Benchmark (see the sketch after this list).
- The report should be reviewed by a group of maintainers to determine whether the testing results meet the criteria for sign-off.
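For illustration only, here is a minimal sketch of how a contributor could drive baseline and candidate test executions with OpenSearch Benchmark against already-provisioned clusters. The workload, host addresses, and output file names are placeholders, and exact CLI flags may vary between OpenSearch Benchmark versions; a real test plan template would spell these out per use case.

```python
"""Sketch: run OpenSearch Benchmark for a baseline and a candidate build.

Assumptions (not part of the proposal): clusters are already running on
localhost:9200 (baseline) and localhost:9201 (candidate), and the nyc_taxis
workload is a reasonable stand-in for the use case under test.
"""
import subprocess


def run_benchmark(target_hosts: str, workload: str, results_file: str) -> None:
    # The "benchmark-only" pipeline assumes the cluster under test is already
    # provisioned; OpenSearch Benchmark only drives the load and records results.
    subprocess.run(
        [
            "opensearch-benchmark", "execute-test",
            "--pipeline=benchmark-only",
            f"--target-hosts={target_hosts}",
            f"--workload={workload}",
            f"--results-file={results_file}",
        ],
        check=True,
    )


if __name__ == "__main__":
    run_benchmark("localhost:9200", "nyc_taxis", "baseline.md")
    run_benchmark("localhost:9201", "nyc_taxis", "candidate.md")
```

The two results files can then be compared side by side and attached to the sign-off report described above.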
To achieve these requirements, we propose creating a new repository under the OpenSearch project for performance testing. The repository will contain performance testing templates per use case, and users can submit a new issue/PR to add templates covering missing use cases. It will also include templates for reporting test results. The owner of the feature will submit the testing report as a PR, and two or more of the repository's maintainers must approve the PR to meet the sign-off criteria. Initially, we will start with a simple process that covers a few use cases, then evolve and improve it over time based on feedback and changes to the software.
Ad-hoc Changes: A reactive mechanism to identify commits that may cause a performance degradation and address them promptly. It is particularly effective in cases where we cannot anticipate the potential for degradation until the code is merged. It also helps address small, hard-to-measure slowdowns that accumulate over time, suffering the fate of the boiling frog and resulting in an overall drop in the software's performance.
To achieve this, we need a system that runs nightly benchmarks covering the most common use cases, such as logging and geospatial, and generates a public, read-only dashboard for reviewing and comparing against previous runs. This effort, similar to Lucene's nightly benchmarks, is already in progress and you can track it here. After completing the foundation of the nightly benchmarks, there may be opportunities for further enhancements, such as:
- Notifications: The system should detect any performance degradation compared to previous runs or to a baseline, then auto-cut a GitHub issue in the corresponding repository instead of depending on maintainers to monitor and identify these issues manually (see the sketch after this list).
- Profiling: The system should generate a flame graph profile report for each run, so contributors can easily investigate potential performance issues.
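To make the notification idea concrete, here is an illustrative sketch of how the nightly system could flag a regression against the previous run and auto-cut a GitHub issue via the REST API. The metric names, the 10% threshold, the target repository, and the token handling are all assumptions for the sake of the example, not decisions made by this proposal.

```python
"""Sketch: compare nightly results to the previous run and auto-cut an issue.

Assumes higher-is-better metrics (e.g. throughput); latency-style metrics
would need the comparison inverted. GITHUB_TOKEN must grant issue creation.
"""
import os
import requests

THRESHOLD = 0.10  # flag anything more than 10% worse than the previous run


def find_regressions(previous: dict, current: dict) -> list[str]:
    """Return human-readable descriptions of metrics that regressed."""
    regressions = []
    for metric, old_value in previous.items():
        new_value = current.get(metric)
        if new_value is not None and new_value < old_value * (1 - THRESHOLD):
            change_pct = (new_value / old_value - 1) * 100
            regressions.append(f"{metric}: {old_value:.1f} -> {new_value:.1f} ({change_pct:.1f}%)")
    return regressions


def cut_issue(repo: str, regressions: list[str]) -> None:
    """Open a GitHub issue in the corresponding repository via the REST API."""
    response = requests.post(
        f"https://api.github.com/repos/{repo}/issues",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={
            "title": "[Nightly benchmark] Performance regression detected",
            "body": "The following metrics regressed vs. the previous run:\n\n"
                    + "\n".join(f"- {r}" for r in regressions),
            "labels": ["performance", "untriaged"],
        },
        timeout=30,
    )
    response.raise_for_status()


if __name__ == "__main__":
    # Placeholder numbers; in practice these come from the nightly results store.
    previous = {"indexing_docs_per_s": 52000.0, "query_throughput_ops_per_s": 310.0}
    current = {"indexing_docs_per_s": 45000.0, "query_throughput_ops_per_s": 315.0}
    found = find_regressions(previous, current)
    if found:
        cut_issue("opensearch-project/OpenSearch", found)
```

In practice the detection logic would live alongside the nightly benchmark pipeline and use its stored results rather than hard-coded numbers, but the flow (compare, threshold, auto-cut) is the same.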
Over time, we should keep enriching our nightly benchmarks with more test cases to increase coverage. Despite our efforts to expand them, the nightly benchmarks will remain limited to a certain number of workloads and use cases. By also developing the mechanism to test and report the performance of new features, we will keep the software's performance at a high bar, and users will have more confidence to upgrade and adopt new features.
We are looking forward to your feedback and support for this proposal.
Related Issues:
opensearch-project/opensearch-benchmark#102
#3983
References:
https://blog.mikemccandless.com/2011/04/catching-slowdowns-in-lucene.html
https://webtide.com/the-jetty-performance-effort/