Simplify Configs #10

Closed · mivanit opened this issue Feb 7, 2023 · 4 comments
@mivanit (Member) commented Feb 7, 2023

tl;dr: model/training/dataset configuration is a mess and needs to be reworked

Currently, we have 3 different configurations:

  • Model configurations, which deal with the architecture of the transformer nets and create configs that inherit from Hugging Face model configs
  • Training configurations, which handle training hyperparams
  • Dataset config, which tells us how the dataset was generated and what size mazes it contains

The problem we've run into so far is that it's easy to forget to adjust the vocab size in the model config, which should somehow be inherited from the dataset config. Moving forward, we also want to account for:

  • non-lattice mazes (arbitrary graphs)
  • higher-dimensional mazes
  • different tokenization methods
    • this includes tokenizing the x,y coord of a cell in the maze separately, rather than having a unique token for each cell
  • other maze properties, such as target/negative-reward regions, labelling maze cells with additional data, etc.
  • fine-tuning models on a dataset different from the original one

My naive instinct is that the structure of the configs should be something like the following:

from dataclasses import dataclass

@dataclass
class ModelConfig:
    train_config: TrainConfig | None
    dataset_config: DatasetConfig | None
    # (...) other properties of the architecture

@dataclass
class BaseModel:
    model_config: ModelConfig
    # somewhere to log arbitrary data about the training process
    # in a semi-human-readable format
    training_log: list | None

Things like vocab size could be implemented as properties that inherit from the DatasetConfig, and thus raise exceptions if the required config is not provided, rather than silently giving the wrong values.
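
A minimal sketch of that pattern (a hypothetical n_tokens field stands in for however the dataset config actually exposes its vocabulary):

from dataclasses import dataclass

@dataclass
class DatasetConfig:
    grid_n: int
    n_tokens: int  # hypothetical field: size of the maze token vocabulary

@dataclass
class ModelConfig:
    dataset_config: DatasetConfig | None = None

    @property
    def vocab_size(self) -> int:
        # fail loudly instead of silently falling back to a stale value
        if self.dataset_config is None:
            raise ValueError("vocab_size requires dataset_config to be set")
        return self.dataset_config.n_tokens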

@mivanit (Member, Author) commented Feb 13, 2023

Working on this issue in the dev-config-refactor branch. Settled on a TopLevelConfig which contains the model, training, and dataset configs. @rusheb will be writing some tests; the next thing we need to do is get the newly migrated configs working with HookedTransformer and get the training to run.
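
For orientation, a rough sketch of that shape (field names are illustrative; the real class lives in the dev-config-refactor branch):

from dataclasses import dataclass

@dataclass
class TopLevelConfig:
    model_cfg: "BaseGPTConfig"        # the existing config classes, referenced by name
    train_cfg: "TrainConfig"
    dataset_cfg: "MazeDatasetConfig"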

@rusheb changed the title from "configuration architecture" to "Simplify Configs" on Feb 16, 2023
@rusheb (Collaborator) commented Feb 16, 2023

I'd like to merge the dev branch as soon as possible to unblock other issues.

I'll use this comment to keep track of tasks which we should do, but which can be deferred until after merging.

We should review and create an issue for each of these upon closing this ticket.

  • TrainConfig and BaseGPTConfig should throw an error when deserializing a config with missing values (a sketch of this is below the list)
  • BaseGPTConfig should not serialize computed values from HookedTransformerConfig
  • Implement serialization and loading for TopLevelConfig
  • Use Pathlib everywhere
  • separate arguments to train_model for model_cfg and training_cfg
  • Handle serialization/loading of ConfigHolder with missing values
  • Make eval_model.py run
  • Make plot_attention.py run
  • Unit test MazeDatasetConfig
  • delete setup_train.py
  • delete gpt_config_args method
  • tidy up TRAIN_SAVE_FILES
  • create onboarding documents
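
On the first item, a minimal sketch of the "throw an error on missing values" behaviour, assuming a plain-dict serialization format (the fields shown are hypothetical; the repo's actual (de)serialization helpers may differ):

from dataclasses import dataclass, fields

@dataclass
class TrainConfig:
    batch_size: int  # hypothetical fields, for illustration only
    lr: float

    @classmethod
    def load(cls, data: dict) -> "TrainConfig":
        # raise instead of silently filling in defaults for missing keys
        missing = [f.name for f in fields(cls) if f.name not in data]
        if missing:
            raise KeyError(f"missing config values: {missing}")
        return cls(**{f.name: data[f.name] for f in fields(cls)})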

@rusheb (Collaborator) commented Feb 18, 2023

I have updated the branch to enable train_model to run with the updated configs design and HookedTransformer.

notes:

  • BaseGPTConfig no longer inherits from HookedTransformerConfig. This allows us to serialize BaseGPTConfig without including the default/computed properties of HookedTransformerConfig
  • I added a function to the ConfigHolder (previously called TopLevelConfig) to instantiate the HookedTransformer by explicitly passing it the required arguments. I'm not sure whether this is sustainable; a sketch of the idea is below the list.
  • I haven't included the data_cfg.gpt_config_kwargs (i.e. pad_token_id, bos_token_id, eos_token_id). Do we need these? We may have to define a tokenizer to pass to HookedTransformer.
  • I added an integration test which generates a dataset in a temp folder and runs train_model. I'm not sure how useful this is; maybe we should delete it. I'm also not sure how to check whether the optimization is working - e.g. should we see the loss going down? Maybe we can add an assertion on this to the test.
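
A rough sketch of the "explicitly pass the required arguments" approach, for reference; the BaseGPTConfig/MazeDatasetConfig field names used here are assumptions rather than the exact ones in the branch:

from dataclasses import dataclass

from transformer_lens import HookedTransformer, HookedTransformerConfig

@dataclass
class ConfigHolder:
    model_cfg: "BaseGPTConfig"
    train_cfg: "TrainConfig"
    dataset_cfg: "MazeDatasetConfig"

    def create_model(self) -> HookedTransformer:
        # build the HookedTransformerConfig explicitly instead of inheriting from it,
        # so our own configs never serialize its default/computed values
        hooked_cfg = HookedTransformerConfig(
            n_layers=self.model_cfg.n_layers,
            d_model=self.model_cfg.d_model,
            d_head=self.model_cfg.d_head,
            n_ctx=self.dataset_cfg.seq_len_max,       # assumed field name
            d_vocab=len(self.dataset_cfg.token_arr),  # assumed field name
            act_fn="gelu",
        )
        return HookedTransformer(cfg=hooked_cfg)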

@rusheb (Collaborator) commented Feb 20, 2023

Addressed by #20

@rusheb closed this as completed on Feb 20, 2023