Simplify Configs #10

Closed · mivanit opened this issue Feb 7, 2023 · 4 comments
@mivanit (Member) commented Feb 7, 2023

tl;dr: model/training/dataset configuration is a mess and needs to be reworked

Currently, we have 3 different configurations:

  • Model configurations, which deal with the architecture of the transformer nets and create configs that inherit from Hugging Face model configs
  • Training configurations, which handle training hyperparams
  • Dataset config, which tells us how the dataset was generated and what size mazes it contains

The problem we've run into so far is that it's easy to forget to adjust the vocab size in the model config, which should somehow be inherited from the dataset config. Moving forward, we also want to account for:

  • non-lattice mazes (arbitrary graphs)
  • higher-dimensional mazes
  • different tokenization methods
    • this includes tokenizing the x,y coord of a cell in the maze separately, rather than having a unique token for each cell
  • other maze properties, such as target/negative-reward regions, labelling maze cells with additional data, etc.
  • fine-tuning models on a dataset different from the original one

My naive instinct is that the structure of the configs should be something like the following:

from dataclasses import dataclass

@dataclass
class ModelConfig:
    train_config: TrainConfig | None
    dataset_config: DatasetConfig | None
    # (...) other properties of the architecture

@dataclass
class BaseModel:
    model_config: ModelConfig
    # somewhere to log arbitrary data about the training process
    # in a semi-human-readable format
    training_log: list | None

Things like vocab size could be implemented as properties that inherit from the DatasetConfig, and thus raise exceptions if the required config is not provided, rather than silently giving the wrong values.
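
A minimal sketch of that pattern (a hypothetical n_tokens field stands in for however the dataset config actually exposes its vocabulary):

from dataclasses import dataclass

@dataclass
class DatasetConfig:
    grid_n: int
    n_tokens: int  # hypothetical field: size of the maze token vocabulary

@dataclass
class ModelConfig:
    dataset_config: DatasetConfig | None = None

    @property
    def vocab_size(self) -> int:
        # fail loudly instead of silently falling back to a stale value
        if self.dataset_config is None:
            raise ValueError("vocab_size requires dataset_config to be set")
        return self.dataset_config.n_tokens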

@mivanit (Member, Author) commented Feb 13, 2023

Working on this issue in the dev-config-refactor branch. Settled on a TopLevelConfig which contains the model, training, and dataset configs. @rusheb will be writing some tests; the next thing we need to do is get the newly migrated configs working with HookedTransformer and get the training to run.
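
For orientation, a rough sketch of that shape (field names are illustrative; the real class lives in the dev-config-refactor branch):

from dataclasses import dataclass

@dataclass
class TopLevelConfig:
    model_cfg: "BaseGPTConfig"        # the existing config classes, referenced by name
    train_cfg: "TrainConfig"
    dataset_cfg: "MazeDatasetConfig"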

@rusheb changed the title from "configuration architecture" to "Simplify Configs" on Feb 16, 2023
@rusheb (Collaborator) commented Feb 16, 2023

I'd like to merge the dev branch as soon as possible to unblock other issues.

I'll use this comment to keep track of tasks which we should do, but which can be deferred until after merging.

We should review and create an issue for each of these upon closing this ticket.

  • TrainConfig and BaseGPTConfig should throw an error when deserializing a config with missing values (a sketch of this is below the list)
  • BaseGPTConfig should not serialize computed values from HookedTransformerConfig
  • Implement serialization and loading for TopLevelConfig
  • Use Pathlib everywhere
  • separate arguments to train_model for model_cfg and training_cfg
  • Handle serialization/loading of ConfigHolder with missing values
  • Make eval_model.py run
  • Make plot_attention.py run
  • Unit test MazeDatasetConfig
  • delete setup_train.py
  • delete gpt_config_args method
  • tidy up TRAIN_SAVE_FILES
  • create onboarding documents
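
On the first item, a minimal sketch of the "throw an error on missing values" behaviour, assuming a plain-dict serialization format (the fields shown are hypothetical; the repo's actual (de)serialization helpers may differ):

from dataclasses import dataclass, fields

@dataclass
class TrainConfig:
    batch_size: int  # hypothetical fields, for illustration only
    lr: float

    @classmethod
    def load(cls, data: dict) -> "TrainConfig":
        # raise instead of silently filling in defaults for missing keys
        missing = [f.name for f in fields(cls) if f.name not in data]
        if missing:
            raise KeyError(f"missing config values: {missing}")
        return cls(**{f.name: data[f.name] for f in fields(cls)})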

@rusheb (Collaborator) commented Feb 18, 2023

I have updated the branch to enable train_model to run with the updated configs design and HookedTransformer.

notes:

  • BaseGPTConfig no longer inherits from HookedTransformerConfig. This allows us to serialize BaseGPTConfig without including the default/computed properties of HookedTransformerConfig
  • I added a function to the ConfigHolder (previously called TopLevelConfig) to instantiate the HookedTransformer by explicitly passing it the required arguments. I'm not sure whether this is sustainable; a sketch of the idea is below the list.
  • I haven't included the data_cfg.gpt_config_kwargs (i.e. pad_token_id, bos_token_id, eos_token_id). Do we need these? We may have to define a tokenizer to pass to HookedTransformer.
  • I added an integration test which generates a dataset in a temp folder and runs train_model. I'm not sure how useful this is; maybe we should delete it. I'm also not sure how to check whether the optimization is working - e.g. should we see the loss going down? Maybe we can add an assertion on this to the test.
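
A rough sketch of the "explicitly pass the required arguments" approach, for reference; the BaseGPTConfig/MazeDatasetConfig field names used here are assumptions rather than the exact ones in the branch:

from dataclasses import dataclass

from transformer_lens import HookedTransformer, HookedTransformerConfig

@dataclass
class ConfigHolder:
    model_cfg: "BaseGPTConfig"
    train_cfg: "TrainConfig"
    dataset_cfg: "MazeDatasetConfig"

    def create_model(self) -> HookedTransformer:
        # build the HookedTransformerConfig explicitly instead of inheriting from it,
        # so our own configs never serialize its default/computed values
        hooked_cfg = HookedTransformerConfig(
            n_layers=self.model_cfg.n_layers,
            d_model=self.model_cfg.d_model,
            d_head=self.model_cfg.d_head,
            n_ctx=self.dataset_cfg.seq_len_max,       # assumed field name
            d_vocab=len(self.dataset_cfg.token_arr),  # assumed field name
            act_fn="gelu",
        )
        return HookedTransformer(cfg=hooked_cfg)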

@rusheb (Collaborator) commented Feb 20, 2023

Addressed by #20

@rusheb closed this as completed on Feb 20, 2023