[Searchable Snapshot] Capture initial CPU profiling information #5217

andrross · 2022-11-11T20:28:15Z

Performance is known to be poor (see #4797) until we implement caching and other optimizations. The task here is to capture an initial profile of query performance and to document the steps for doing so, so that we can re-run the profile to measure and validate improvements as we implement the optimizations.

tlfeng · 2022-11-16T18:15:43Z

Below are the steps to capture a profile of query performance for Searchable Snapshot.

Environment preparation

Set up an OpenSearch 2.4 node manually.
Enable the Searchable Snapshot feature by adding -Dopensearch.experimental.feature.searchable_snapshot.enabled=true to the jvm.options file. Optionally, increase the JVM heap size in the same file.
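The jvm.options changes in the step above can also be scripted. Below is a minimal bash sketch, assuming the default OpenSearch jvm.options layout in which the heap is set by standalone -Xms/-Xmx lines; the function name configure_jvm_options and the example heap size are illustrative assumptions, not something OpenSearch provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenSearch): enable the experimental
# searchable snapshot flag and, optionally, set the heap size.
# Usage: configure_jvm_options <path/to/jvm.options> [heap, e.g. 4g]
configure_jvm_options() {
  local file="$1" heap="${2:-}"
  local flag="-Dopensearch.experimental.feature.searchable_snapshot.enabled=true"

  # Append the feature flag only if that exact line is not already present,
  # so running the helper twice does not duplicate it.
  if ! grep -qxF -- "$flag" "$file"; then
    printf '%s\n' "$flag" >> "$file"
  fi

  # Optionally rewrite the existing -Xms/-Xmx lines. Output goes to a temp
  # file that is moved back, which works with both GNU and BSD sed.
  if [ -n "$heap" ]; then
    sed -E -e "s/^-Xms[0-9]+[kKmMgG]?$/-Xms${heap}/" \
           -e "s/^-Xmx[0-9]+[kKmMgG]?$/-Xmx${heap}/" "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  fi
}
```

For example, from the OpenSearch home directory: configure_jvm_options config/jvm.options 4g. Running it a second time leaves a single copy of the flag.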
Install the "repository-s3" plugin by running bin/opensearch-plugin install repository-s3.
Start the OpenSearch node.
Make sure the OpenSearch node has the full index data and a snapshot of it in Amazon S3. One way to do this is to run the workload from the pull request "Add a test procedure in nyc_taxis workload for measuring performance for Searchable Snapshot feature" (opensearch-benchmark-workloads#58), a modified version of the "nyc_taxis" workload that involves the Searchable Snapshot feature.
For example, the command to run the workload is:
opensearch-benchmark execute_test --workload-path=~/github/opensearch-benchmark-workloads/searchable_snapshot --target-hosts=127.0.0.1:9200 --workload-params="~/benchmark-workload/searchable_snapshot/workload_params.json" --exclude-tasks="default,range,distance_amount_agg,autohisto_agg,date_autohisto_agg"
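If the snapshot already exists in S3 from an earlier workload run, the searchable snapshot index can presumably be recreated without re-running the whole workload. The sketch below assumes the OpenSearch 2.4 snapshot restore API with "storage_type": "remote_snapshot"; the repository name, snapshot name, helper name, and OPENSEARCH_URL variable are hypothetical placeholders, and an open index with the same name must not already exist on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restore an index from an S3 snapshot as a searchable
# snapshot, so the data stays in the repository and is read on demand.
# Usage: restore_searchable_snapshot <repository> <snapshot> <index>
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

restore_searchable_snapshot() {
  local repo="$1" snapshot="$2" index="$3"
  curl -s -XPOST "$OPENSEARCH_URL/_snapshot/$repo/$snapshot/_restore?wait_for_completion=true" \
    --header 'Content-Type: application/json' \
    -d "{\"indices\": \"$index\", \"storage_type\": \"remote_snapshot\"}"
}
```

For example: restore_searchable_snapshot my-s3-repo my-nyc-snapshot nyc_taxis, where both the repository and snapshot names stand in for whatever the workload actually created.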

Capture profile information

Java Flight Recorder (JFR) is used to capture the profile.
Refer to the official guide to start a flight recording. As the guide describes, a recording can be started either from the GUI tool, Java Mission Control (JMC), or with a CLI command.

Using Java Mission Control

Download a distribution of Java Mission Control, such as https://adoptium.net/jmc/.
To start a new recording, right-click the JVM you want to record and select "Start Flight Recording".
Configure a fixed-duration recording. Set the duration long enough to cover the full search query.
For example, I set the recording time to 8 minutes for the "range" query operation and to 15 minutes for the "autohisto_agg" and "date_histogram_agg" operations, because in my environment the "range" query takes 6 minutes, "autohisto_agg" takes 13 minutes, and "date_histogram_agg" takes 14 minutes (as mentioned in [BUG][Searchable Snapshot] Failed to run 'range' query in a large index with error "SearchPhaseExecutionException: all shards failed" through OpenSearch-Benchmark #5172 (comment)).
For the event setting, choose "Profiling - on server" instead of "Continuous - on server".
After starting the recording, run the command that executes a search query from the "nyc_taxis" workload.
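Choosing the recording length means knowing roughly how long the query takes. A hedged bash sketch: time one query with curl's %{time_total} write-out variable, then add a margin in whole minutes. The helper names, the default 2-minute margin, and OPENSEARCH_URL are assumptions; the author's examples used margins of 1 to 2 minutes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing a fixed-duration JFR recording.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Run one search request (JSON body read from a file) against nyc_taxis and
# print curl's total wall-clock time for the request, in seconds.
time_query() {
  local body_file="$1"
  curl -s -o /dev/null -w '%{time_total}\n' -XGET \
    "$OPENSEARCH_URL/nyc_taxis/_search" \
    --header 'Content-Type: application/json' \
    --data-binary "@$body_file"
}

# Convert a measured query time (seconds, may be fractional) into a JFR
# duration string: round up to whole minutes, then add the margin.
jfr_duration_for() {
  local seconds="$1" margin_min="${2:-2}"
  awk -v s="$seconds" -v m="$margin_min" \
    'BEGIN { mins = int((s + 59) / 60) + m; printf "%dm\n", mins }'
}
```

With the default margin, a 6-minute query gives 8m and a 13-minute query gives 15m, matching the examples above; a 14-minute query gives 16m, whereas the author used 15m (a 1-minute margin). For example: jfr_duration_for "$(time_query range.json)", where range.json is a hypothetical file holding a query body.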

Using CLI command
(to be completed)
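One way to do this from the command line, sketched here as an assumption rather than the author's documented procedure, is the JDK's jcmd tool with the JFR.start diagnostic command; settings=profile selects the same "Profiling" template chosen in JMC. The sketch assumes jcmd from the JDK running OpenSearch is on the PATH and runs as the same OS user as the OpenSearch process; the function names and the /tmp output directory are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: start a time-limited JFR recording on the running
# OpenSearch JVM with jcmd, as an alternative to Java Mission Control.

# Print the PID of the local OpenSearch JVM. `jcmd -l` lists one
# "<pid> <main class> <args>" line per JVM visible to the current user.
find_opensearch_pid() {
  jcmd -l | awk '$2 == "org.opensearch.bootstrap.OpenSearch" { print $1; exit }'
}

# Start a recording with the "profile" settings template (the CLI
# counterpart of "Profiling - on server"); it stops after <duration> and
# writes <out_dir>/<name>.jfr.
# Usage: start_jfr_recording <name> <duration, e.g. 8m> [out_dir]
start_jfr_recording() {
  local name="$1" duration="$2" out_dir="${3:-/tmp}"
  local pid
  pid="$(find_opensearch_pid)"
  if [ -z "$pid" ]; then
    echo "OpenSearch JVM not found" >&2
    return 1
  fi
  jcmd "$pid" JFR.start name="$name" settings=profile \
    duration="$duration" filename="$out_dir/$name.jfr"
}
```

For example, run start_jfr_recording range 8m and then the "range" curl command; jcmd <pid> JFR.check shows whether the recording is still running.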

Result

I ran curl commands for the three range queries used in the "nyc_taxis" workload (mentioned in #5172 (comment)). The flight recording results are below.
Command 1

curl -XGET http://localhost:9200/nyc_taxis/_search\?pretty --header 'Content-Type: application/json' -d '{
  "query": {
    "range": {
      "total_amount": {
        "gte": 5,
        "lt": 15
      }
    }
  }
}'

Result 1
flight_recording_110141orgopensearchbootstrapOpenSearch143075 nyc_taxis range.jfr.zip
flight_recording_110141orgopensearchbootstrapOpenSearch63515.range.query.jfr.zip

Command 2

curl -XGET http://localhost:9200/nyc_taxis/_search\?pretty --header 'Content-Type: application/json' -d '{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "01/01/2015",
        "lte": "21/01/2015",
        "format": "dd/MM/yyyy"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "auto_date_histogram": {
        "field": "dropoff_datetime",
        "buckets": 20
      }
    }
  }
}'

Result 2
flight_recording_110141orgopensearchbootstrapOpenSearch143075 nyc-taxis autohisto_agg.jfr.zip

Command 3

curl -XGET http://localhost:9200/nyc_taxis/_search\?pretty --header 'Content-Type: application/json' -d '{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "01/01/2015",
        "lte": "21/01/2015",
        "format": "dd/MM/yyyy"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "date_histogram": {
        "field": "dropoff_datetime",
        "calendar_interval": "day"
      }
    }
  }
}'

Result 3
flight_recording_110141orgopensearchbootstrapOpenSearch143075 nyc-taxis date_autohisto_agg.jfr.zip

The results can be viewed in Java Mission Control.
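For a quick text-only look without JMC, the jfr command-line tool included in recent JDKs (for example JDK 17) can print the sampled events. A hedged sketch, assuming the attachment has been unzipped to a .jfr file and that jfr print writes each sample's stack as indented frames after a "stackTrace = [" line, which is its usual text format; the top_frames function is hypothetical. Threads blocked in native socket reads are sampled as jdk.NativeMethodSample rather than jdk.ExecutionSample, so it may be worth printing both event types.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank the most frequent top-of-stack frames in
# `jfr print` output read from stdin.
# Usage: jfr print --events jdk.ExecutionSample rec.jfr | top_frames [limit]
top_frames() {
  local limit="${1:-10}"
  awk '
    /stackTrace = \[/ { want = 1; next }   # the next line is the top frame
    want {
      frame = $0
      sub(/^[ \t]+/, "", frame)            # drop indentation
      sub(/[ \t]+line:.*$/, "", frame)     # drop the trailing line number
      count[frame]++
      want = 0
    }
    END { for (f in count) printf "%d\t%s\n", count[f], f }
  ' | sort -rn | head -n "$limit"
}
```

For example: jfr summary rec.jfr for event counts, then jfr print --events jdk.ExecutionSample rec.jfr | top_frames 15.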

andrross · 2022-11-17T23:52:06Z

As expected, the profile is dominated by S3BlobContainer.readBlob. There are other issues here as well, but they are probably not worth diving into until we have implemented caching to reduce the need to re-fetch data from the remote repository.
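To put a number on how dominant S3BlobContainer.readBlob is, and to re-check that number after caching is implemented, the share of sampled stacks containing it can be counted. A hedged sketch reading jfr print text output; the stack_share name is hypothetical, and it assumes jfr print truncates stacks to a small default depth, so a larger --stack-depth is passed in the usage example (check jfr help print for the options your JDK supports).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many sampled stacks contain a fixed
# substring (e.g. a method name) anywhere in the printed stack trace.
# Usage: jfr print ... rec.jfr | stack_share <substring>
stack_share() {
  local needle="$1"
  awk -v needle="$needle" '
    /stackTrace = \[/ { total++; in_stack = 1; hit = 0; next }
    in_stack && /^[ \t]*\]/ { if (hit) hits++; in_stack = 0; next }
    in_stack && index($0, needle) { hit = 1 }
    END { printf "%d/%d samples (%.1f%%)\n", hits, total, (total ? 100 * hits / total : 0) }
  '
}
```

For example: jfr print --stack-depth 64 --events jdk.ExecutionSample,jdk.NativeMethodSample rec.jfr | stack_share S3BlobContainer.readBlob.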

tlfeng · 2022-12-12T18:05:28Z

Closing the issue, because initial CPU profiling information has been captured and the steps to obtain the result have been written down.

andrross added the enhancement, benchmarking, and distributed framework labels Nov 11, 2022

andrross mentioned this issue Nov 11, 2022 in [Meta] Promote searchable snapshot out of experimental #5087 (closed, 12 tasks)

andrross changed the title from "[Searchable Snapshot] Capture initial profiling information" to "[Searchable Snapshot] Capture initial CPU profiling information" Nov 11, 2022

tlfeng self-assigned this Nov 14, 2022

tlfeng closed this as completed Dec 12, 2022
