Each submission config currently contains one accuracy config and a single performance config. When sweeping over concurrency values, this forces a separate accuracy run for every value, which increases total runtime, especially when the accuracy dataset takes as long as or longer than the performance dataset. The proposal is to allow multiple performance configs within a single submission config.
Before:
load_pattern:
  type: "concurrency"
  target_concurrency: 512
After:
load_pattern:
  type: "concurrency"
  target_concurrency: 64,128,256,512
This amortizes the cost of the accuracy run across all concurrency levels and encourages using a single endpoint config for the whole sweep.
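A minimal sketch of how the existing single-value form and the proposed comma-separated form could both be normalized to a list. The function name `parse_target_concurrency` is hypothetical, and the sketch assumes the multi-value form reaches the code as a string; neither is part of the current config schema.

```python
from typing import List, Union


def parse_target_concurrency(value: Union[int, str]) -> List[int]:
    """Normalize a target_concurrency value to a list of positive ints.

    Hypothetical helper: accepts the existing form (512) and the
    proposed comma-separated form ("64,128,256,512").
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError("target_concurrency must be an int or a string")
    if isinstance(value, int):
        levels = [value]
    else:
        # int("") raises ValueError, so "64,,128" is rejected as well.
        levels = [int(part.strip()) for part in str(value).split(",")]
    if any(level <= 0 for level in levels):
        raise ValueError("concurrency values must be positive")
    return levels
```

Normalizing to a list keeps existing single-value configs working without changes.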
Several possible considerations:
- Target concurrency values should be strictly ascending.
- Requests for the next concurrency value are issued only after all responses for the current value have been received (not merely after its requests have been issued).
- The accuracy run uses the highest concurrency value.
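The three considerations above could be sketched roughly as follows. All names here (`validate_levels`, `run_level`, `sweep`, `accuracy_concurrency`, and the `send` callback) are hypothetical and only illustrate the intended ordering; the sketch assumes an asyncio-based client and a fixed number of requests per level.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


def validate_levels(levels: List[int]) -> None:
    """Reject empty lists and any pair that is not strictly ascending."""
    if not levels:
        raise ValueError("at least one concurrency value is required")
    for prev, cur in zip(levels, levels[1:]):
        if cur <= prev:
            raise ValueError(f"not strictly ascending: {prev} then {cur}")


async def run_level(
    send: Callable[[int], Awaitable[None]], concurrency: int, num_requests: int
) -> int:
    """Send num_requests requests with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(i: int) -> None:
        async with sem:
            await send(i)  # completes when the response has been received

    # gather() resolves only after every request has received its response.
    await asyncio.gather(*(one(i) for i in range(num_requests)))
    return num_requests


async def sweep(
    send: Callable[[int], Awaitable[None]], levels: List[int], num_requests: int
) -> Dict[int, int]:
    """Run the levels in order, draining each before starting the next."""
    validate_levels(levels)
    completed: Dict[int, int] = {}
    for level in levels:
        completed[level] = await run_level(send, level, num_requests)
    return completed


def accuracy_concurrency(levels: List[int]) -> int:
    """Pick the concurrency for the single accuracy run: the highest value."""
    validate_levels(levels)
    return levels[-1]
```

Awaiting each level to completion before starting the next keeps requests from two concurrency values from overlapping at the endpoint, so each level's measurements stay separate.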