-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add relative standard deviation to aggregated test execution metrics #681
Conversation
Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
osbenchmark/aggregator.py
Outdated
for metric in self.metrics: | ||
op_metric[metric] = aggregated_task_metrics[metric] | ||
if isinstance(aggregated_task_metrics[metric], dict): | ||
mean_values = [v['mean'] for v in task_metrics[metric]] | ||
rsd = self.calculate_rsd(mean_values) | ||
op_metric[metric]['mean_rsd'] = rsd | ||
else: | ||
rsd = self.calculate_rsd(task_metrics[metric]) | ||
op_metric[f"{metric}_rsd"] = rsd | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments will be helpful here -- the first clause is for the standard metrics, the second for derived ones like error rate and duration, etc.
The documentation should indicate the rationale as to why RSD is computed for the mean, instead of the median.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also speak to nate about these changes👍
osbenchmark/aggregator.py
Outdated
if iterations > 1: | ||
weighted_sum = sum(value * iterations for value in values) | ||
total_iterations = iterations * len(values) | ||
weighted_metrics[metric] = weighted_sum / total_iterations | ||
weighted_avg = weighted_sum / total_iterations | ||
else: | ||
weighted_metrics[metric] = sum(values) / len(values) | ||
weighted_avg = sum(values) / len(values) | ||
weighted_metrics[metric] = weighted_avg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the special case for iterations == 1
? Doesn't this reduce to the same computation? And if the iteration number is the same for every element in values, the special-casing might not be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The special case is fixed, but I want to include support for custom iteration values in the future so I think that should stay for now.
Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
osbenchmark/aggregator.py
Outdated
return weighted_metrics | ||
|
||
def calculate_rsd(self, values): | ||
if not values: | ||
raise ValueError("Cannot calculate RSD for an empty list of values") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice for error messages to also include which metric OSB was unable to calculate RSD for. Otherwise, users will have to add logging statements to find out which can be a nuisance.
Can easily do this if we add metric name as a parameter and include that in the Value Error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I added this👍
Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
osbenchmark/aggregator.py
Outdated
@@ -214,9 +214,9 @@ def calculate_weighted_average(self, task_metrics: Dict[str, List[Any]], iterati | |||
|
|||
return weighted_metrics | |||
|
|||
def calculate_rsd(self, values): | |||
def calculate_rsd(self, values, metric_name: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We should also incldue the type hints for values
if we are providing one for metric_name
Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to change the 100_0
to max
in the results summary, either in this or a future check-in.
Should this be done for regular test executions as well?
|
Description
Adds relative standard deviation to aggregated test result metrics. Users can now see the spread of their OSB test results. (RSD is currently only calculated for the mean)
Example:
Issues Resolved
#662
Testing
make test
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.