Handle no valid eval results for mt_bench #179

Merged on Nov 14, 2024 (1 commit)

@danmcp danmcp commented Nov 14, 2024

If every judgment in an evaluation results in an error, eval fails with a traceback like:

2024-11-14T05:10:31.3739474Z DEBUG 2024-11-14 05:10:31,366 instructlab.eval.mt_bench_judgment:84: {'question_file': '/tmp/tmp.Mb3su0dsTr/.local/share/instructlab/internal/eval_data/mt_bench_branch/rc/question.jsonl', 'judgment_file': '/tmp/tmp.Mb3su0dsTr/.local/share/instructlab/internal/eval_data/mt_bench_branch/rc/model_judgment/granite-7b-lab_single.jsonl', 'answer_file': '/tmp/tmp.Mb3su0dsTr/.local/share/instructlab/internal/eval_data/mt_bench_branch/rc/model_answer/samples_0.jsonl', 'bench_name': 'mt_bench_branch'}
2024-11-14T05:10:31.3741675Z DEBUG 2024-11-14 05:10:31,372 instructlab.eval.mt_bench_judgment:93: #judgments: 20
2024-11-14T05:10:31.3742653Z DEBUG 2024-11-14 05:10:31,372 instructlab.eval.mt_bench_judgment:94: #error free judgments: 0
2024-11-14T05:10:31.3743366Z DEBUG 2024-11-14 05:10:31,372 instructlab.eval.mt_bench_judgment:95: error rate: 1.0
2024-11-14T05:10:31.3744153Z DEBUG 2024-11-14 05:10:31,373 instructlab.model.backends.vllm:453: Sending SIGINT to vLLM server PID 18629
2024-11-14T05:10:31.3745050Z DEBUG 2024-11-14 05:10:31,373 instructlab.model.backends.vllm:457: Waiting for vLLM server to shut down gracefully
2024-11-14T05:10:37.3479458Z DEBUG 2024-11-14 05:10:37,347 instructlab.model.backends.vllm:472: Nothing left to clean up with the vLLM process group
2024-11-14T05:10:37.3480824Z INFO 2024-11-14 05:10:37,347 instructlab.model.backends.vllm:487: Waiting for GPU VRAM reclamation...
2024-11-14T05:10:38.3485932Z DEBUG 2024-11-14 05:10:38,348 instructlab.model.backends.vllm:554: GPU free vram stable (stable count 1, free 23368695808, last free 23368695808)
2024-11-14T05:10:39.3489564Z DEBUG 2024-11-14 05:10:39,348 instructlab.model.backends.vllm:554: GPU free vram stable (stable count 2, free 23368695808, last free 23368695808)
2024-11-14T05:10:40.3492875Z DEBUG 2024-11-14 05:10:40,348 instructlab.model.backends.vllm:554: GPU free vram stable (stable count 3, free 23368695808, last free 23368695808)
2024-11-14T05:10:41.3496107Z DEBUG 2024-11-14 05:10:41,349 instructlab.model.backends.vllm:554: GPU free vram stable (stable count 4, free 23368695808, last free 23368695808)
2024-11-14T05:10:42.3499391Z DEBUG 2024-11-14 05:10:42,349 instructlab.model.backends.vllm:554: GPU free vram stable (stable count 5, free 23368695808, last free 23368695808)
2024-11-14T05:10:43.3502654Z DEBUG 2024-11-14 05:10:43,349 instructlab.model.backends.vllm:554: GPU free vram stable (stable count 6, free 23368695808, last free 23368695808)
2024-11-14T05:10:43.3505628Z DEBUG 2024-11-14 05:10:43,350 instructlab.model.backends.vllm:561: Successful sample recorded, (stable count 6, free 23368695808, last free 23368695808)
2024-11-14T05:10:43.3507351Z Traceback (most recent call last):
2024-11-14T05:10:43.3516827Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/bin/ilab", line 8, in <module>
2024-11-14T05:10:43.3517669Z     sys.exit(ilab())
2024-11-14T05:10:43.3517990Z              ^^^^^^
2024-11-14T05:10:43.3518745Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1157, in __call__
2024-11-14T05:10:43.3519413Z     return self.main(*args, **kwargs)
2024-11-14T05:10:43.3519705Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3520397Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1078, in main
2024-11-14T05:10:43.3521014Z     rv = self.invoke(ctx)
2024-11-14T05:10:43.3521253Z          ^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3521909Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
2024-11-14T05:10:43.3522780Z     return _process_result(sub_ctx.command.invoke(sub_ctx))
2024-11-14T05:10:43.3523175Z                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3523931Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
2024-11-14T05:10:43.3524631Z     return _process_result(sub_ctx.command.invoke(sub_ctx))
2024-11-14T05:10:43.3525185Z                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3525940Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1434, in invoke
2024-11-14T05:10:43.3526622Z     return ctx.invoke(self.callback, **ctx.params)
2024-11-14T05:10:43.3526963Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3527680Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 783, in invoke
2024-11-14T05:10:43.3528336Z     return __callback(*args, **kwargs)
2024-11-14T05:10:43.3528724Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3529461Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
2024-11-14T05:10:43.3530172Z     return f(get_current_context(), *args, **kwargs)
2024-11-14T05:10:43.3530528Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3531316Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/instructlab/clickext.py", line 323, in wrapper
2024-11-14T05:10:43.3532007Z     return f(*args, **kwargs)
2024-11-14T05:10:43.3532445Z            ^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3533229Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/instructlab/model/evaluate.py", line 721, in evaluate
2024-11-14T05:10:43.3534036Z     overall_score, qa_pairs, error_rate = evaluator.judge_answers(
2024-11-14T05:10:43.3534545Z                                           ^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3535457Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/instructlab/eval/mt_bench.py", line 271, in judge_answers
2024-11-14T05:10:43.3536334Z     overall_score, qa_pairs, _, error_rate = mt_bench_judgment.generate_judgment(
2024-11-14T05:10:43.3536806Z                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3537716Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/instructlab/eval/mt_bench_judgment.py", line 314, in generate_judgment
2024-11-14T05:10:43.3538473Z     return make_judgment(
2024-11-14T05:10:43.3538713Z            ^^^^^^^^^^^^^^
2024-11-14T05:10:43.3539504Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/instructlab/eval/mt_bench_judgment.py", line 100, in make_judgment
2024-11-14T05:10:43.3540267Z     overall_score = df_1["score"].iloc[0]
2024-11-14T05:10:43.3540567Z                     ~~~~~~~~~~~~~~~~~~^^^
2024-11-14T05:10:43.3541348Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/pandas/core/indexing.py", line 1191, in __getitem__
2024-11-14T05:10:43.3542115Z     return self._getitem_axis(maybe_callable, axis=axis)
2024-11-14T05:10:43.3542648Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-14T05:10:43.3543465Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/pandas/core/indexing.py", line 1752, in _getitem_axis
2024-11-14T05:10:43.3544172Z     self._validate_integer(key, axis)
2024-11-14T05:10:43.3544997Z   File "/actions-runner/_work/sdg/sdg/instructlab/venv/lib64/python3.11/site-packages/pandas/core/indexing.py", line 1685, in _validate_integer
2024-11-14T05:10:43.3545861Z     raise IndexError("single positional indexer is out-of-bounds")
2024-11-14T05:10:43.3546363Z IndexError: single positional indexer is out-of-bounds

With this change, the same situation results in:

INFO 2024-11-14 20:24:19,282 instructlab.model.backends.vllm:136: Waiting for the vLLM server to start at http://127.0.0.1:56677/v1, this might take a moment... Attempt: 8/120
INFO 2024-11-14 20:24:23,829 instructlab.model.backends.vllm:136: Waiting for the vLLM server to start at http://127.0.0.1:56677/v1, this might take a moment... Attempt: 9/120
INFO 2024-11-14 20:24:26,485 instructlab.model.backends.vllm:143: vLLM engine successfully started at http://127.0.0.1:56677/v1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.91s/it]
INFO 2024-11-14 20:24:40,239 instructlab.model.backends.vllm:487: Waiting for GPU VRAM reclamation...
Evaluation provided no result. See logs for more details.
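
The kind of guard that yields this message can be sketched as follows. This is a minimal, stdlib-only illustration and not the code merged in this PR: `EmptyResultError` and `summarize` are hypothetical names, and treating a score of `-1` as an errored judgment is an assumption. The traceback above comes from `df_1["score"].iloc[0]` being evaluated when no error-free judgments exist; the sketch checks for that case before computing a score.

```python
from statistics import mean


class EmptyResultError(Exception):
    """Hypothetical exception: the evaluation produced no usable judgments."""


def summarize(judgments: list[dict]) -> tuple[float, float]:
    """Return (overall_score, error_rate) for a list of judgment records."""
    # Assumption: a score of -1 marks a judgment that ended in an error.
    valid = [j["score"] for j in judgments if j["score"] != -1]
    if not valid:
        # Fail with a readable message instead of indexing into empty results.
        raise EmptyResultError(
            "Evaluation provided no result. See logs for more details."
        )
    error_rate = (len(judgments) - len(valid)) / len(judgments)
    return mean(valid), error_rate


if __name__ == "__main__":
    try:
        # Mirrors the failing run: 20 judgments, none of them error free.
        summarize([{"score": -1}] * 20)
    except EmptyResultError as exc:
        print(exc)
```

A caller such as a CLI command could catch the exception, print the message, and exit with a non-zero status rather than surfacing a pandas `IndexError`.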

Signed-off-by: Dan McPherson <dmcphers@redhat.com>
@bbrowning
As someone who recently hit this error, I find the new message much clearer to read. Thank you!

@nathan-weinberg nathan-weinberg left a comment

Thanks @danmcp!

@mergify mergify bot added the one-approval label Nov 14, 2024
@mergify mergify bot removed the one-approval label Nov 14, 2024
@mergify mergify bot merged commit 4bde0b3 into instructlab:main Nov 14, 2024
17 checks passed