Closed
Description
Describe the bug
We're seeing intermittent failures in our GitHub Actions workflows that run OSB 1.8.0 against Elasticsearch 7.10.
2024-09-09 16:08:17 - INFO - Running opensearch-benchmark with 'nyc_taxis' workload
2024-09-09 16:08:17 - INFO - Executing command: opensearch-benchmark execute-test --distribution-version=1.0.0 --target-host=https://capture-proxy:9200 --workload=nyc_taxis --pipeline=benchmark-only --test-mode --kill-running-processes --workload-params=target_throughput:0.5,bulk_size:10,bulk_indexing_clients:1,search_clients:1 --client-options=verify_certs:false,basic_auth_user:admin,basic_auth_password:********
[OpenSearch Benchmark ASCII-art banner]
[INFO] [Test Execution ID]: 6ee4f874-7bc4-4968-b1b7-192c1a54916a
[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
Error: Cannot execute-test. Worker [0] has exited prematurely.
Getting further help:
*********************
* Check the log files in /root/.benchmark/logs for errors.
* Read the documentation at https://opensearch.org/docs.
* Ask a question on the forum at https://forum.opensearch.org/.
* Raise an issue at https://github.com/opensearch-project/OpenSearch-Benchmark/issues and include the log files in /root/.benchmark/logs.
To reproduce
Run OSB with the following commands. Occasionally the last execution fails; the log from a failing run is attached.
benchmark.log
pipenv run opensearch-benchmark execute-test --distribution-version=1.0.0 --target-host=$endpoint --workload=geonames --pipeline=benchmark-only --test-mode --kill-running-processes --workload-params "target_throughput:0.5,bulk_size:10,bulk_indexing_clients:1,search_clients:1" --client-options=$client_options &&
echo "Running opensearch-benchmark w/ 'http_logs' workload..." &&
pipenv run opensearch-benchmark execute-test --distribution-version=1.0.0 --target-host=$endpoint --workload=http_logs --pipeline=benchmark-only --test-mode --kill-running-processes --workload-params "target_throughput:0.5,bulk_size:10,bulk_indexing_clients:1,search_clients:1" --client-options=$client_options &&
echo "Running opensearch-benchmark w/ 'nested' workload..." &&
pipenv run opensearch-benchmark execute-test --distribution-version=1.0.0 --target-host=$endpoint --workload=nested --pipeline=benchmark-only --test-mode --kill-running-processes --workload-params "target_throughput:0.5,bulk_size:10,bulk_indexing_clients:1,search_clients:1" --client-options=$client_options &&
echo "Running opensearch-benchmark w/ 'nyc_taxis' workload..." &&
pipenv run opensearch-benchmark execute-test --distribution-version=1.0.0 --target-host=$endpoint --workload=nyc_taxis --pipeline=benchmark-only --test-mode --kill-running-processes --workload-params "target_throughput:0.5,bulk_size:10,bulk_indexing_clients:1,search_clients:1" --client-options=$client_options
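The four commands above differ only in the workload name, so the sequence can be sketched as a small Python driver, which makes it easier to repeat the chain until the intermittent failure shows up. This is a minimal sketch, not part of the original workflow: the names `build_command` and `run_all` and the `attempts` parameter are hypothetical, and every flag is copied from the commands above.

```python
import subprocess

# Workloads in the same order as the shell chain above.
WORKLOADS = ["geonames", "http_logs", "nested", "nyc_taxis"]
WORKLOAD_PARAMS = (
    "target_throughput:0.5,bulk_size:10,bulk_indexing_clients:1,search_clients:1"
)


def build_command(workload, endpoint, client_options):
    """Return the argv list for one OSB run (flags copied from the issue)."""
    return [
        "pipenv", "run", "opensearch-benchmark", "execute-test",
        "--distribution-version=1.0.0",
        f"--target-host={endpoint}",
        f"--workload={workload}",
        "--pipeline=benchmark-only",
        "--test-mode",
        "--kill-running-processes",
        "--workload-params", WORKLOAD_PARAMS,
        f"--client-options={client_options}",
    ]


def run_all(endpoint, client_options, attempts=1):
    """Run every workload in order, `attempts` times (hypothetical knob).

    check=True raises CalledProcessError on the first non-zero exit code,
    which mirrors the short-circuit behavior of the `&&` chain.
    """
    for attempt in range(1, attempts + 1):
        for workload in WORKLOADS:
            print(f"[attempt {attempt}] Running opensearch-benchmark w/ '{workload}' workload...")
            subprocess.run(build_command(workload, endpoint, client_options), check=True)
```

Calling `run_all(endpoint, client_options, attempts=20)` would repeat the whole chain 20 times and stop at the first failing run, leaving the logs of that run in place for inspection.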
Expected behavior
OSB completes all four workloads successfully.
Host / Environment
GitHub Actions - ubuntu
opensearch-benchmark 1.8.0
Additional context
No response
Relevant log output
2024-09-09 16:08:20,381 ActorAddr-(T|:36659)/PID:761 osbenchmark.actor INFO Worker[0] is executing tasks at index [3].
2024-09-09 16:08:20,395 -not-actor-/PID:701 osbenchmark.test_execution_orchestrator ERROR A benchmark failure has occurred
2024-09-09 16:08:20,396 -not-actor-/PID:701 osbenchmark.test_execution_orchestrator INFO Telling benchmark actor to exit.
2024-09-09 16:08:20,383 ActorAddr-(T|:36659)/PID:761 osbenchmark.client INFO Creating OpenSearch client connected to [{'host': 'capture-proxy', 'port': 9200, 'use_ssl': True}] with options [{'verify_certs': False, 'basic_auth_user': 'admin', 'basic_auth_password': '*****', 'max_connections': 1}]
2024-09-09 16:08:20,398 ActorAddr-(T|:43115)/PID:729 osbenchmark.actor INFO BuilderActor#receiveMessage unrecognized(msg = [<class 'thespian.actors.ActorExitRequest'>] sender = [ActorAddr-(T|:34845)])
2024-09-09 16:08:20,392 ActorAddr-(T|:34845)/PID:710 osbenchmark.actor INFO Received a benchmark failure from [ActorAddr-(T|:37323)] and will forward it now.
2024-09-09 16:08:20,391 ActorAddr-(T|:37323)/PID:730 osbenchmark.actor ERROR Worker [0] has exited prematurely. Aborting benchmark.
2024-09-09 16:08:20,383 ActorAddr-(T|:36659)/PID:761 osbenchmark.client INFO SSL support: off
2024-09-09 16:08:20,383 ActorAddr-(T|:36659)/PID:761 osbenchmark.client INFO HTTP basic authentication: on
2024-09-09 16:08:20,384 ActorAddr-(T|:36659)/PID:761 osbenchmark.client INFO HTTP compression: off
2024-09-09 16:08:20,384 ActorAddr-(T|:36659)/PID:761 osbenchmark.worker_coordinator.worker_coordinator INFO Task assertions enabled: False
2024-09-09 16:08:20,385 ActorAddr-(T|:36659)/PID:761 osbenchmark.worker_coordinator.worker_coordinator INFO Choosing [unthrottled] for [create-index].
2024-09-09 16:08:20,385 ActorAddr-(T|:36659)/PID:761 osbenchmark.worker_coordinator.worker_coordinator INFO Creating iteration-count based schedule with [None] distribution for [create-index] with [0] warmup iterations and [1] iterations.
2024-09-09 16:08:20,385 ActorAddr-(T|:36659)/PID:761 osbenchmark.worker_coordinator.worker_coordinator INFO iteration-count-based schedule will determine when the schedule for [create-index] terminates.
2024-09-09 16:08:20,397 ActorAddr-(T|:34845)/PID:710 osbenchmark.actor INFO BenchmarkActor received unknown message [ActorExitRequest] (ignoring).
2024-09-09 16:08:20,417 ActorAddr-(T|:37323)/PID:730 osbenchmark.actor INFO Main worker_coordinator received ActorExitRequest and will terminate all load generators.
2024-09-09 16:08:20,415 ActorAddr-(T|:34845)/PID:710 osbenchmark.actor INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:43115)] (ignoring).
2024-09-09 16:08:20,418 ActorAddr-(T|:34845)/PID:710 osbenchmark.actor INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:37323)] (ignoring).
2024-09-09 16:08:23,399 -not-actor-/PID:701 osbenchmark.benchmark INFO Attempting to shutdown internal actor system.
2024-09-09 16:08:23,400 -not-actor-/PID:709 root INFO ActorSystem Logging Shutdown
2024-09-09 16:08:23,421 -not-actor-/PID:708 root INFO ---- Actor System shutdown
2024-09-09 16:08:23,421 -not-actor-/PID:701 osbenchmark.benchmark INFO Actor system is still running. Waiting...
2024-09-09 16:08:24,421 -not-actor-/PID:701 osbenchmark.benchmark INFO Shutdown completed.
2024-09-09 16:08:24,422 -not-actor-/PID:701 osbenchmark.benchmark ERROR Cannot run subcommand [execute-test].
Traceback (most recent call last):
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/benchmark.py", line 931, in dispatch_sub_command
    execute_test(cfg, args.kill_running_processes)
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/benchmark.py", line 690, in execute_test
    with_actor_system(test_execution_orchestrator.run, cfg)
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/benchmark.py", line 717, in with_actor_system
    runnable(cfg)
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/test_execution_orchestrator.py", line 381, in run
    raise e
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/test_execution_orchestrator.py", line 378, in run
    pipeline(cfg)
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/test_execution_orchestrator.py", line 69, in __call__
    self.target(cfg)
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/test_execution_orchestrator.py", line 314, in benchmark_only
    return execute_test(cfg, external=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib64/python3.11/site-packages/osbenchmark/test_execution_orchestrator.py", line 273, in execute_test
    raise exceptions.BenchmarkError(result.message, result.cause)
osbenchmark.exceptions.BenchmarkError: Worker [0] has exited prematurely.
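The log lines above come from several processes and are not in timestamp order, which makes the sequence of events hard to follow. A minimal sketch for re-ordering them is shown below, assuming every record starts with the `timestamp actor/PID:n logger LEVEL message` prefix seen in this excerpt; the regular expression and the helper names `parse` and `chronological` are illustrative and not part of OSB.

```python
import re

# Prefix of a log record as it appears in the excerpt above, e.g.
# "2024-09-09 16:08:20,381 ActorAddr-(T|:36659)/PID:761 osbenchmark.actor INFO ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<actor>\S+)/PID:(?P<pid>\d+) "
    r"(?P<logger>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def parse(lines):
    """Turn raw log lines into dicts; unmatched lines (for example
    traceback lines) are appended to the message of the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            records.append(match.groupdict())
        elif records:
            records[-1]["msg"] += "\n" + line
    return records


def chronological(records):
    """Sort records by timestamp. The fixed-width timestamp format sorts
    correctly as a string, and sorted() is stable, so records with equal
    timestamps keep their original relative order."""
    return sorted(records, key=lambda record: record["ts"])
```

Ordered this way, the last record from the worker process (PID 761) in this excerpt is at 16:08:20,385, and the worker coordinator (PID 730) reports the premature exit at 16:08:20,391; no ERROR record from the worker itself appears in between.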