-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify multiple search clients for easier benchmarking #614
Specify multiple search clients for easier benchmarking #614
Conversation
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
maybe_match_task_is_part_of_throughput_testing = re.search(throughput_pattern, task) | ||
if maybe_match_task_is_part_of_throughput_testing: | ||
|
||
# assumption: all units are the same and only maximizing throughput over one operation (i.e. not both ingest and search). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting emphasis on this comment. Right now search clients is the main use case for vector search (users want to see search throughput from inference perspective). I think adding support for multiple clients in any operation can be deferred to a later PR. Any thoughts?
osbenchmark/workload/loader.py
Outdated
for client in op["clients_list"]: | ||
op["clients"] = client | ||
|
||
new_name = name + "_" + str(client) + "_clients" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on testing, this renames the entire procedure, not just the operation in the schedule. To see this run or look at the test_parse_clients_list
unit test below -- the task is renamed "default-test_procedure_1_clients",""default-test_procedure_2_clients", etc
instead of search--one-client_1_clients, search-one-client_2_clients, etc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the unittest below, it looks like this is updating only the operation names within the schedule of test-procedure default-test_procedure
since schedule[N].name
is referencing an operation and not a test-procedure.
If it is suppose to be altering the test-procedure name, there needs to be an assert in the unittest showing that test_procedure.name
is no longer default-test_procedure
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right.
It seems like there's a name that's associated with each element of the schedule (the task name?). This is not necessarily the same name as the operation of that element in the schedule but it is the name that's changed.
I used a debugger and the workload object's values are copied below. For instance: [name = ['default-test-procedure_1_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}]
.
[name = ['default-test-procedure_1_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [1], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False], name = ['default-test-procedure_2_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [2], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False], name = ['default-test-procedure_3_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [3], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False], name = ['search-two-clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [2], completes_parent = [False], schedule = [None], params = [{'name': 'search-two-clients', 'operation': 'search', 'clients': 2}], nested = [False]]
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
osbenchmark/workload/loader.py
Outdated
for client in op["clients_list"]: | ||
op["clients"] = client | ||
|
||
new_name = name + "_" + str(client) + "_clients" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is dependent on how the workload was written, we should standardize on either using -
or _
and not both in an operation's and test procedure's name. Many users have encountered frustrating errors all because the name had a mix of -
and _
.
Suggestion: Before renaming, check if the operation name is using -
or _
.
- If it is using
-
, use"-" + str(client) + "-clients"
. - If it is using
_
, use"_" + str(client) + "_clients"
. - Else, if it's already using both, OSB can by default append
"_" + str(client) + "_clients"
to the end of the existing operation name (like what you are currently doing). I would also log a warning to the screen that the user's test_procedure has a mix of-
and_
, which might cause frustrating bugs in the future.
tests/workload/loader_test.py
Outdated
} | ||
], | ||
"test_procedure": { | ||
"name": "default-test_procedure", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would standardize on using hyphens or underscores here
Why would the OSB schedule search tasks with 10 and 12 clients if the parameters only specified 1 and 5? |
osbenchmark/results_publisher.py
Outdated
# assumption: all units are the same and only maximizing throughput over one operation (i.e. not both ingest and search). | ||
# To maximize throughput over multiple operations, would need a list/dictionary of maximum throughputs. | ||
task_throughput = record["throughput"][Throughput.MEAN.value] | ||
logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this to be under the __init__
method of SummaryResultsPublisher for when we would like to add logging elsewhere in the class.
osbenchmark/results_publisher.py
Outdated
metrics_table.extend(self._publish_error_rate(record, task)) | ||
self.add_warnings(warnings, record, task) | ||
maybe_match_task_is_part_of_throughput_testing = re.search(throughput_pattern, task) | ||
if maybe_match_task_is_part_of_throughput_testing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we simplify this to is_task_part_of_throughput_testing
?
|
||
else: | ||
self.publish_operational_statistics(metrics_table=metrics_table, warnings=warnings, record=record, task=task) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would be nice if there was another comment above this if statement to stating that the following blurb is related to throughput testing / when specifying multiple clients within the operation
osbenchmark/results_publisher.py
Outdated
@@ -217,6 +251,10 @@ def _publish_recall(self, values, task): | |||
self._line("Mean recall@1", task, recall_1_mean, "", lambda v: "%.2f" % v) | |||
) | |||
|
|||
def _publish_best_client_settings(self, record, task): | |||
num_clients = re.search(r"_(\d+)_clients$", task).group(1) | |||
return self._join(self._line("Num clients that achieved maximium throughput", "", num_clients, "")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Num clients" -> "Number of clients"
If this is being surfaced to users in the output, we should refrain from using shorthand.
@@ -145,16 +162,33 @@ def publish(self): | |||
|
|||
metrics_table.extend(self._publish_transform_stats(stats)) | |||
|
|||
max_throughput = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would also add a comment above this to specify that the following three variables are related to clients_list
parameter in test_procedures. Newcomers might not quickly understand that this is only relevant to throughput testing clients
osbenchmark/workload/loader.py
Outdated
if "parallel" in op: | ||
task = self.parse_parallel(op["parallel"], ops, name) | ||
if "clients_list" in op: | ||
self.logger.info("Clients list specified, running multiple search tasks with %s clients.", op["clients_list"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "Clients list specified, running task with multiple clients: %s" might be cleaner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is confusing. Technically we're running multiple tasks, each with multiple clients (or only 1 client).
I changed it to self.logger.info("Clients list specified: %s. Running multiple search tasks, each scheduled with the corresponding number of clients from the list.", op["clients_list"])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@finnroblin LGTM Thanks for doing this!
Description
Finding the maximum search throughput for an OpenSearch cluster is an important benchmarking scenario in vector search. Currently the only way to figure out the maximum search throughput is to perform multiple benchmark runs with different search client settings (e.g. search_clients = 3, search_clients = 5, ...). Waiting for a run to conclude, changing the config, and rerunning OSB with the associated startup time is tedious. Ideally the maximum throughput could be found automatically.
This PR allows users to provide a
clients_list
in a operation which runs the operation with each client setting.For instance, a user might specify
"clients_list": [1, 5]
in their parameters.Then OSB will schedule search tasks with 1, 5, 10, and 12 clients. The final benchmark results will look something like the following:
Issues Resolved
Closes #613
Testing
Unit tested
loader.py
changes + verified with multiple OSB runs that result publisher output looks good.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.