Create an eval-only script for existing ckpts #736

Open · liujch1998 wants to merge 58 commits into main

Conversation

liujch1998 commented Oct 20, 2024

This PR adds scripts/eval.py, which evaluates one or more existing checkpoints while bypassing the training loop.

It seems impossible to backfill evals into the original wandb run, because wandb requires "step" to be monotonically increasing, and rewinding the run would truncate its log, which we don't want. This script therefore logs to a new wandb run.

Starting from a training setup:

  • You can keep using the same yaml config file.
  • Make a copy of the XXX.sh file as XXX-eval.sh, point it at scripts/eval.py, add the flag --wandb.group=XXX so it logs to the same wandb group, and set --load_path to either a single checkpoint or a directory of checkpoints (all of which will be evaluated).
  • Make a copy of the XXX-launch.sh file as XXX-eval-launch.sh, change --task-name to XXX-eval, and change the command so it runs XXX-eval.sh.

See an example in peteish1-eval.sh and peteish1-eval-launch.sh.
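
As a rough sketch (not the PR's actual code) of why a separate run is needed and how grouping keeps it next to the training run: the project name, run name, steps, and metrics below are all hypothetical.

import wandb

# Within a single wandb run, `step` must be monotonically increasing, so
# evals for old checkpoints cannot be appended to the original training
# run. A fresh run in the same group sits beside it in the wandb UI.
run = wandb.init(
    project="olmo",   # hypothetical project name
    group="XXX",      # same group as the training run (--wandb.group=XXX)
    name="XXX-eval",  # hypothetical run name
)
for step, metrics in [(1000, {"eval/ppl": 12.3}), (2000, {"eval/ppl": 11.1})]:
    wandb.log(metrics, step=step)  # toy metrics; steps can start anywhere here
run.finish()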

liujch1998 marked this pull request as ready for review October 24, 2024 18:18
configs/peteish1-weka.yaml — two resolved review threads
olmo/train.py (outdated) — comment on lines 1371 to 1372:
if wandb.run is not None:
wandb.finish(exit_code=exit_code, quiet=True)
# if wandb.run is not None:
# wandb.finish(exit_code=exit_code, quiet=True)
Member: Debug code?

Author (liujch1998): Will fix this.

scripts/eval.py (outdated) — comment on lines 119 to 120:
# train_loader = build_train_dataloader(cfg)
train_loader = None
Member: Is this always going to be None? If so, we don't need it.

scripts/eval.py (outdated):
if 'step' in cfg.load_path.split('/')[-1]:
load_paths = [cfg.load_path]
else:
# This globbing does not work with remote paths.
Member: How is that problem handled then?

Author (liujch1998): Not handled. I will assume the checkpoints are on WEKA.

Member: At least throw an exception then.

scripts/eval.py (outdated):
log.info(f"Number of non-embedding parameters: {olmo_model.num_params(include_embedding=False):,d}")
log.info(f"Peak GPU Memory (MB) before {cfg.distributed_strategy}: {int(peak_gpu_memory() or 0)}")

olmo_model.set_activation_checkpointing(cfg.activation_checkpointing)
Member: If we only ever eval, we don't need this.

Comment on lines +225 to +226:
optim = build_optimizer(cfg, dist_model)
scheduler = build_scheduler(cfg)
Member: We don't need optimizers and schedulers if we're just evaluating.

Member: So you're creating these only so that you can produce a Trainer object?

How hard is it to pull the stuff you need out of the Trainer object, so we don't have to do so many things we don't need? It makes me particularly uncomfortable that you're creating a trainer with a None data loader, which isn't supposed to work. It just happens to work.

Author (liujch1998): The Trainer class has too many precious helper functions, and it's kinda dumb to unroll them all. I do want to keep at least a dummy Trainer object. Let me see if I can create it without the optim/scheduler/etc. stuff.

Member: I think you might find that you don't need most of that stuff when you're doing inference only.
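
For context, here is a minimal sketch of the inference-only path being suggested, written against plain PyTorch rather than OLMo's Trainer; the batch layout and the .logits output attribute are assumptions for illustration.

import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, eval_loader, device="cuda"):
    # No optimizer, scheduler, or train loader is needed for evaluation:
    # just the model in eval mode and one forward pass per batch.
    model.eval()
    total_loss, n_batches = 0.0, 0
    for batch in eval_loader:
        input_ids = batch["input_ids"].to(device)  # assumed batch layout
        logits = model(input_ids).logits           # assumed output attribute
        # Next-token cross-entropy: predict token t+1 from tokens up to t.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        )
        total_loss += loss.item()
        n_batches += 1
    return total_loss / max(n_batches, 1)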

dirkgr commented Nov 1, 2024

Let me know when this is ready for another review?

if cfg.load_path is None:
raise OLMoConfigurationError("To run eval you must provide a load_path")
elif "://" in cfg.load_path:
raise OLMoConfigurationError("Eval does not support remote paths. Please specify a local path or WEKA mounted path.")
Author (liujch1998): Throwing an exception for remote paths here.
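
Putting the quoted snippets together, the resulting load-path resolution might look roughly like this; cfg.load_path and OLMoConfigurationError come from the PR, while the step* glob pattern and the import path are assumptions.

import glob
import os

from olmo.exceptions import OLMoConfigurationError  # import path assumed

def resolve_load_paths(cfg):
    if cfg.load_path is None:
        raise OLMoConfigurationError("To run eval you must provide a load_path")
    if "://" in cfg.load_path:
        # Globbing below only works on local or WEKA-mounted filesystems,
        # so fail fast on remote (e.g. s3://) paths.
        raise OLMoConfigurationError(
            "Eval does not support remote paths. "
            "Please specify a local path or WEKA mounted path."
        )
    if "step" in cfg.load_path.split("/")[-1]:
        # load_path names a single checkpoint, e.g. .../step1000
        return [cfg.load_path]
    # Otherwise treat load_path as a directory containing many checkpoints.
    return sorted(glob.glob(os.path.join(cfg.load_path, "step*")))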
