When running a submission config (accuracy + performance), accuracy samples are issued immediately after the performance samples, so they get batched together with the last few performance samples. This reduces the measured performance of the accuracy+performance config, especially under high concurrency. The config files used for testing and the resulting reports are attached. Note that the missing tokens/s report and the "failed" samples are separate issues.
Proposal: issue accuracy samples only after the results from all performance samples have been received.
offline_llama3_1b_cnn_full.yaml
report_1b_full.txt
offline_llama3_1b_cnn_perf.yaml
report_1b_perf.txt
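To make the proposed ordering concrete, here is a minimal sketch of a barrier between the two phases, written with `asyncio`. It is not the project's actual issuing code: `issue_sample`, `run_phases`, and the sample IDs are hypothetical names invented for illustration, and the endpoint is simulated with a short sleep. The only point it shows is that no accuracy sample is issued until every performance result has come back.

```python
import asyncio


async def issue_sample(sample_id, log):
    """Hypothetical stand-in for sending one request to the endpoint."""
    log.append(("issued", sample_id))
    await asyncio.sleep(0.01)  # simulated inference latency (assumption)
    log.append(("done", sample_id))
    return sample_id


async def run_phases(perf_ids, acc_ids):
    """Issue performance samples, wait for all results, then issue accuracy samples."""
    log = []

    # Performance phase: issue every sample concurrently.
    perf_tasks = [asyncio.create_task(issue_sample(s, log)) for s in perf_ids]
    # Barrier: block until all performance results have been received.
    await asyncio.gather(*perf_tasks)

    # Accuracy phase: starts only after the barrier, so these samples
    # cannot share a batch with trailing performance samples.
    acc_tasks = [asyncio.create_task(issue_sample(s, log)) for s in acc_ids]
    await asyncio.gather(*acc_tasks)
    return log


log = asyncio.run(run_phases(["perf-0", "perf-1", "perf-2"], ["acc-0", "acc-1"]))
for event, sample_id in log:
    print(event, sample_id)
```

With this ordering, the performance measurement window can close at the barrier, before any accuracy traffic reaches the server.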