HellaSwag: speed up by parallelizing log-prob evaluation #5020
PR #5017 significantly improved the performance of HellaSwag evaluation via batching. As a result, the fraction of time spent evaluating token log-probabilities, which runs in single-threaded mode, has become significant.
With this PR, this part of the calculation is parallelized.
For Mistral-7B and `fp16`, time on my system (32-core Ryzen-5975WX + RTX-4080) goes down from 536 seconds after PR #5017 to 423 seconds for the full evaluation dataset (10042 tasks). For reference, evaluation time before #5017 was 1285 seconds.
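For readers who want a picture of the general idea, here is a minimal sketch of parallelizing per-token log-probability evaluation across threads. It is not the code from this PR: the function names `token_log_prob` and `compute_log_probs`, the row-major logits layout, and the atomic work counter are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: log-probability of `token` under a softmax over one
// row of `n_vocab` logits. The maximum is subtracted for numerical stability.
static double token_log_prob(const float * logits, int n_vocab, int token) {
    float max_logit = logits[0];
    for (int i = 1; i < n_vocab; ++i) {
        max_logit = std::max(max_logit, logits[i]);
    }
    double sum_exp = 0.0;
    for (int i = 0; i < n_vocab; ++i) {
        sum_exp += std::exp(double(logits[i] - max_logit));
    }
    return double(logits[token] - max_logit) - std::log(sum_exp);
}

// Hypothetical driver: `logits` holds tokens.size() rows of n_vocab values
// (row-major, an assumed layout); tokens[i] is the target token for row i.
// Rows are handed out to threads through a shared atomic counter.
static std::vector<double> compute_log_probs(const std::vector<float> & logits,
                                             const std::vector<int>   & tokens,
                                             int n_vocab, int n_threads) {
    const size_t n_rows = tokens.size();
    std::vector<double> result(n_rows);
    std::atomic<size_t> next{0};

    auto worker = [&]() {
        while (true) {
            const size_t i = next.fetch_add(1); // claim the next unprocessed row
            if (i >= n_rows) {
                break;
            }
            // Each row writes to its own slot, so no locking is needed here.
            result[i] = token_log_prob(logits.data() + i * n_vocab, n_vocab, tokens[i]);
        }
    };

    n_threads = std::max(1, n_threads);
    std::vector<std::thread> workers;
    for (int t = 1; t < n_threads; ++t) {
        workers.emplace_back(worker);
    }
    worker(); // the calling thread also takes a share of the rows
    for (auto & w : workers) {
        w.join();
    }
    return result;
}
```

Each row is independent, so the only shared state in this sketch is the counter; whether the actual change uses the same scheduling scheme is not stated in this description.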