Eval harness fix #457

sdtblck · 2021-11-07T15:11:50Z

This fixes several problems with running the eval harness on model parallel models.
To summarize:

Logits weren't being gathered automatically, so with a model parallel size of 2, each model parallel rank only had vocab_size / 2 logits, which resulted in indexing errors.
There was no straightforward way to toggle whether the logits were gathered. Now there is: model._set_parallel_output(value).
When ZeRO optimization was enabled in a config, it would break at inference time. This is fixed by overriding the ZeRO config when setting up for inference (https://github.com/EleutherAI/gpt-neox/compare/eval_harness_fix?expand=1#diff-b6ee2c1db780b46787ab3a576020f9a12695f3f5cb38ba8badcd5501960e5d22R409).
There was no way to specify which tasks to run on the command line at runtime; you had to specify them in the YAML config. Now you can run: ./deepy.py evaluate.py -d configs 20B --eval_tasks lambada wikitext hellaswag
Fixes some minor spelling errors (adaptor vs. adapter).
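
To illustrate the first point, here is a minimal pure-Python sketch of why ungathered logits cause indexing errors. It is not the gpt-neox implementation: shard_logits, gather_logits, VOCAB_SIZE and MP_SIZE are hypothetical names chosen for this example, and a real model would gather across ranks with a collective operation rather than a local list concatenation.

```python
# Hypothetical illustration; none of these names come from gpt-neox.
VOCAB_SIZE = 8
MP_SIZE = 2  # model parallel size

# Logits for one token position, as an unsharded model would produce them.
full_logits = [0.1 * i for i in range(VOCAB_SIZE)]


def shard_logits(logits, mp_size):
    """Split the vocab dimension evenly across model parallel ranks."""
    shard = len(logits) // mp_size
    return [logits[r * shard:(r + 1) * shard] for r in range(mp_size)]


def gather_logits(shards):
    """Concatenate per-rank shards back into the full vocab dimension."""
    return [value for shard in shards for value in shard]


shards = shard_logits(full_logits, MP_SIZE)
target_token = 6  # a valid vocab id, but outside rank 0's local shard

# Without gathering, rank 0 only holds VOCAB_SIZE / MP_SIZE logits,
# so looking up the target token's logit fails.
try:
    shards[0][target_token]
    lookup_failed = False
except IndexError:
    lookup_failed = True

# After gathering, every vocab id can be indexed again.
gathered = gather_logits(shards)
target_logit = gathered[target_token]
```

The same lookup that fails against a single shard succeeds once the shards are concatenated, which is the behavior the eval harness needs when it scores target tokens.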

sdtblck added 2 commits November 7, 2021 03:48

fix eval harness when mp > 1

582d101

fix automatic gathering of output + pass in eval tasks with cmd line

6b5b705

sdtblck requested a review from a team as a code owner November 7, 2021 15:11

sdtblck requested review from EricHallahan and ShivanshuPurohit November 7, 2021 15:11

Update gpt2_model.py

c1f1d6a

EricHallahan previously approved these changes Nov 7, 2021

Update eval_adapter.py

96bff83

sdtblck dismissed EricHallahan’s stale review via 96bff83 November 7, 2021 15:46

EricHallahan approved these changes Nov 7, 2021

sdtblck merged commit 74afd1b into main Nov 7, 2021

sdtblck deleted the eval_harness_fix branch November 7, 2021 16:01
