This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Benchmark : Update benchmark configs for Nightly #126

Merged 6 commits into main on Mar 15, 2024

Conversation

varun-sundar-rabindranath

@varun-sundar-rabindranath varun-sundar-rabindranath commented Mar 14, 2024

SUMMARY:
Update the benchmark configs such that the Nightly runs the following models,

  • 7b Mistral

    • Base : teknium/OpenHermes-2.5-Mistral-7B
    • GPTQ : TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ
    • Marlin : neuralmagic/OpenHermes-2.5-Mistral-7B-marlin
    • Sparse : neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50
    • Sparse 2:4 : neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4
  • Llama 7b fp16

    • NousResearch/Llama-2-7b-chat-hf #fp16
  • Update benchmark_serving num_prompts and qps pairs.

  • Minor update to the benchmark_throughput prefill and decode cases.
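The model matrix above can be summarized in a small mapping like the following sketch. The structure and key names here are illustrative only; the actual benchmark config format in the repo may differ.

```python
# Hypothetical sketch of the nightly model matrix described above.
# Keys and structure are illustrative; the repo's real config format may differ.
NIGHTLY_MODELS = {
    "mistral-7b": {
        "base": "teknium/OpenHermes-2.5-Mistral-7B",
        "gptq": "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ",
        "marlin": "neuralmagic/OpenHermes-2.5-Mistral-7B-marlin",
        "sparse": "neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50",
        "sparse-2:4": "neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4",
    },
    "llama-7b-fp16": {
        "base": "NousResearch/Llama-2-7b-chat-hf",
    },
}

# Total distinct models covered by the nightly run.
total_models = sum(len(variants) for variants in NIGHTLY_MODELS.values())
print(total_models)  # 6
```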

TEST PLAN:
Manual testing

@varun-sundar-rabindranath
Author

varun-sundar-rabindranath commented Mar 15, 2024

The new config takes 6.5 hrs on an A10G x4 machine: 4.5 hrs for the serving case and 2 hrs for the throughput case.
Extrapolating from previous benchmark runs, it will likely take around 10 hrs on an A10G x1 machine.

Also note that this is with "benchmark iterations" set to 1.

We can split the runs into two to speed up the process, but I am not sure about the implications on cost. @robertgshaw2-neuralmagic @mgoin @dhuangnm any suggestions?

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Mar 15, 2024

Do you know why it takes so long for the serving case?

Looks like from the config we have 5 models and 5 scenarios, and each scenario takes 5 minutes --> 5 * 5 * 5 = 125 minutes / 60 ≈ 2 hrs?

I think we should adjust to ~2.5 minutes per scenario.

@varun-sundar-rabindranath
Author

@robertgshaw2-neuralmagic we have 6 models and 5 scenarios, so a total of 6 * 5 * 5 = 150 minutes = 2.5 hrs, which is still considerably smaller than what we are seeing. Some reasons I can think of:

  1. we generate the dataset from scratch for each run. We should cache it.
  2. model loading time
  3. engine warmup time (after creating the engine, we run 1000 prompts at infinity qps)

I can look at this as a follow-up.
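The arithmetic above, and the dataset-caching idea from point 1, can be sketched as follows. The `build_prompt_dataset` helper is a hypothetical stand-in for the actual dataset generator in the benchmark scripts, shown only to illustrate caching the generation step.

```python
import functools

# Back-of-envelope serving time: 6 models x 5 scenarios x ~5 min each.
models, scenarios, minutes_each = 6, 5, 5
estimated_hrs = models * scenarios * minutes_each / 60
print(estimated_hrs)  # 2.5 -- well under the ~4.5 hrs actually observed

# Sketch of caching the generated dataset (point 1) so it is built once
# per process instead of regenerated for every benchmark invocation.
# `build_prompt_dataset` is hypothetical, not the real generator's name.
@functools.lru_cache(maxsize=None)
def build_prompt_dataset(num_prompts: int) -> tuple:
    # The expensive generation runs only on the first call per num_prompts;
    # subsequent calls return the cached tuple.
    return tuple(f"prompt-{i}" for i in range(num_prompts))
```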

@varun-sundar-rabindranath varun-sundar-rabindranath merged commit ac8f242 into main Mar 15, 2024
2 checks passed
@varun-sundar-rabindranath varun-sundar-rabindranath deleted the varun/update-nightly-configs branch March 15, 2024 20:43