
[Search Pipelines] Base Benchmarks #7782

Closed
macohen opened this issue May 26, 2023 · 5 comments
Labels: Search (Search query, autocomplete, etc.), v2.10.0
macohen (Contributor) commented May 26, 2023

Benchmarking is a good way to measure how changes to our code impact performance over time. This RFC should cover benchmarking for Search Pipelines and processors that will be included in the release.

Use the OpenSearch Benchmark tools to benchmark a very simple search (the goal is to benchmark pipelines, not search itself):

  • using the _search endpoint alone, to establish a baseline
  • using a search pipeline with no processors (to measure the overhead of search pipelines vs. _search alone)
  • for each processor, benchmarking a pipeline with only that processor in use (to measure the overhead of each processor)
  • for all processors available in core, benchmarking a pipeline with all of them in use (this may be a little silly if we have 12 reranking processors, but if we think of it as eXtreme Search Pipelining it will sound cooler)
  • encouraging, but not requiring, owners of processors that are not part of the OpenSearch Project release to benchmark their processors as well
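As a rough sketch of what these configurations could look like (index and pipeline names are placeholders, and `filter_query` is just one example of a single-processor pipeline; exact processor names should be checked against the search pipelines documentation):

```shell
# A pipeline with no processors ("no-op"), for the overhead comparison.
curl -XPUT "localhost:9200/_search/pipeline/noop_pipeline" \
  -H 'Content-Type: application/json' -d'
{
  "request_processors": [],
  "response_processors": []
}'

# A single-processor pipeline, one per processor under test.
curl -XPUT "localhost:9200/_search/pipeline/filter_only" \
  -H 'Content-Type: application/json' -d'
{
  "request_processors": [
    { "filter_query": { "query": { "term": { "visible": true } } } }
  ]
}'

# Baseline vs. pipelined search on the same index:
curl "localhost:9200/my-index/_search?q=test"
curl "localhost:9200/my-index/_search?q=test&search_pipeline=filter_only"
```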

What can we do in the short term?
Do these make sense for benchmarks? What else would we want to measure?
How do we set up the OpenSearch Benchmarking tools for this?

@macohen macohen converted this from a draft issue May 26, 2023
@macohen macohen added Search Search query, autocomplete ...etc and removed untriaged labels May 26, 2023
msfroh (Collaborator) commented May 30, 2023

> using a search pipeline with no processors (how much overhead for search pipelines vs _search alone)

These are identical -- they follow the same code path. A request that specifies no search pipeline uses the "no-op" pipeline (which is a pipeline without processors).

@msfroh msfroh moved this to Next (Next Quarter) in Search Project Board May 30, 2023
@macohen macohen moved this from Next (Next Quarter) to Now(This Quarter) in Search Project Board Jun 12, 2023
noCharger (Contributor) commented:
Search Pipelines and Processors Benchmarking Plan

Goal

The primary goal of this benchmark is to evaluate the performance impact of the various search pipelines and processors available in the OpenSearch Project release. We will measure this by comparing the performance of the bare _search endpoint against that of pipelines and processors of varying complexity.

General Assumptions

  • All tests are performed in a controlled and isolated environment with the same hardware specifications, avoiding external factors that may affect the benchmark results.
  • Tests are conducted using the same set of data to maintain consistency.
  • The performance measurements focus on the time taken to process the queries, but other metrics such as CPU usage and memory consumption can also be considered.
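For the latency comparison itself, one simple way to express per-configuration overhead is relative to the baseline median. A minimal sketch (the function name and sample values are illustrative, not part of the plan):

```python
import statistics

def relative_overhead(baseline_ms, candidate_ms):
    """Return the candidate's median-latency overhead vs. baseline, as a fraction.

    E.g. 0.05 means the candidate configuration is 5% slower at the median.
    Medians are used instead of means to reduce the impact of outliers.
    """
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return (cand - base) / base

# Illustrative samples: _search alone vs. a single-processor pipeline.
baseline = [10.1, 9.8, 10.3, 10.0, 9.9]          # median 10.0 ms
with_processor = [10.9, 11.2, 11.0, 10.8, 11.1]  # median 11.0 ms

print(f"overhead: {relative_overhead(baseline, with_processor):.1%}")
```

The same calculation can be repeated per percentile (p90, p99) if tail latency matters more than the median for a given processor.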

Benchmarks

The following table outlines the tests we plan to conduct, their features, and any notes related to them:

| Feature | Test Case | Notes |
| --- | --- | --- |
| Baseline | Use the _search endpoint alone, without any pipelines or processors. | Provides the baseline for comparison. |
| Pipeline without processors | Use a search pipeline without any processors. | Measures the overhead of the no-op pipeline alone. |
| Single processor | For each processor, create a pipeline with only that processor in use. Also run the same processor ad hoc via the _search request. | Measures the overhead of each processor. |
| All core processors | Create a pipeline with all available processors in use. Also run the same processors ad hoc via the _search request. | An extreme test to measure the overall impact. |
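A sketch of how these test cases might be driven with OpenSearch Benchmark (the workload name and host are placeholders; exact flags should be checked against the opensearch-benchmark documentation):

```shell
# Baseline run against the plain _search endpoint.
# Note: --pipeline here is OpenSearch Benchmark's own concept
# (benchmark-only = use an existing cluster), unrelated to search pipelines.
opensearch-benchmark execute-test \
  --pipeline=benchmark-only \
  --workload=geonames \
  --target-hosts=localhost:9200

# Repeat with the same workload, modified so its search operations pass a
# search_pipeline query parameter, and compare the reported latencies.
```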

Future Work

While this benchmarking plan provides a good starting point, we will need to iterate and refine it based on our findings. The results of these tests will help us understand how different search pipelines and processors impact the overall performance of our system and guide us in future optimization efforts.

@noCharger noCharger self-assigned this Jun 22, 2023
@noCharger noCharger added the v2.9.0 'Issues and PRs related to version v2.9.0' label Jul 3, 2023
@noCharger noCharger moved this from Now(This Quarter) to 🏗 In progress in Search Project Board Jul 8, 2023
@noCharger noCharger added v2.10.0 and removed v2.9.0 'Issues and PRs related to version v2.9.0' labels Jul 11, 2023
noCharger (Contributor) commented:
For 2.9 we will have the benchmark data available. For 2.10, we are planning to have the benchmark dashboard integrated.

@noCharger noCharger moved this from 🏗 In progress to Now(This Quarter) in Search Project Board Jul 17, 2023
@mingshl mingshl moved this from Now(This Quarter) to 🏗 In progress in Search Project Board Aug 14, 2023
noCharger (Contributor) commented Aug 15, 2023

Optional:

There are two main efforts:

noCharger (Contributor) commented:
Closing this issue since all tasks are merged.
