Merge pull request #421 from datamol-io/configs-v2
Configs v2
DomInvivo authored Aug 1, 2023
2 parents debe62f + 16c2770 commit 57399b3
Showing 11 changed files with 443 additions and 288 deletions.
154 changes: 154 additions & 0 deletions expts/hydra-configs/README.md
@@ -0,0 +1,154 @@
# Configuring Graphium with Hydra
This document provides a point of entry to composing configs in Graphium. Since Graphium is a flexible library with many features, configuration is an important part of working with it. To keep configurations as reusable as possible while providing maximum flexibility, we integrated Graphium with `hydra`. Our config structure is designed to make the following functionality as accessible as possible:

- Switching between **accelerators** (CPU, GPU and IPU)
- **Benchmarking** different models on the same dataset
- **Fine-tuning** a pre-trained model on a new dataset

In what follows, we describe how each of these is achieved and how users can benefit from this design to get the most out of Graphium with as little configuration as possible.

## Accelerators
Graphium supports CPU, GPU and IPU hardware, and switching between these accelerators comes pre-configured. General accelerator-specific configs live under `accelerator/`, whereas experiment-specific differences between the accelerators are specialized under `training/accelerator`.
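
As a purely illustrative sketch (the shipped files under `accelerator/` and `training/accelerator` are the source of truth for the exact keys), an accelerator override typically adjusts hardware-dependent settings such as the Lightning trainer's `accelerator` and `precision` flags:

```yaml
# @package _global_
# Illustrative sketch of a GPU accelerator override; the key names below are
# examples only -- check the shipped configs under accelerator/ for the real structure.
trainer:
  trainer:
    accelerator: gpu
    precision: 16
```

Switching hardware then amounts to selecting a different option from the `accelerator` config group when launching a run, without touching the rest of the configuration.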

## Benchmarking
Benchmarking multiple models on the same datasets and tasks requires us to easily switch between model configurations without redefining major parts of the architecture, task heads, featurization, metrics, predictor, etc. For example, when changing from a GCN to a GIN model, a simple switch of `architecture.gnn.layer_type: 'pyg:gin'` might suffice. Hence, we abstract the `model` configs under `model/` where such model configurations can be specified.
In addition, switching models may affect configs specific to your current experiment, such as the name of the run or the directory to which model checkpoints are written. To enable such overrides, we can utilize `hydra` [specializations](https://hydra.cc/docs/patterns/specializing_config/). For example, for our ToyMix dataset, we specify the layer type under `model/[model_name].yaml`, e.g., for the GCN layer,

```yaml
# @package _global_

architecture:
  gnn:
    layer_type: 'pyg:gcn'
```
and set experiment-related parameters in `training/model/toymix_[model_name].yaml` as a specialization, e.g., for the GIN layer,

```yaml
# @package _global_
constants:
  name: neurips2023_small_data_gin
  ...
trainer:
  model_checkpoint:
    dirpath: models_checkpoints/neurips2023-small-gin/
```
We can now utilize `hydra` to, for example, run a sweep over our models on the ToyMix dataset via

```bash
python main_run_multitask.py -m model=gcn,gin
```
where the ToyMix dataset is pre-configured in `main.yaml`. Read on to find out how to define new datasets and architectures for pre-training and fine-tuning.

## Pre-training / Fine-tuning
From a configuration point of view, fine-tuning requires us to load a pre-trained model and attach new task heads. However, in a highly configurable library such as ours, changing the task heads also requires changes to the logged metrics, the loss functions and the source of the fine-tuning data. To allow a quick switch between pre-training and fine-tuning, we therefore configure models and their corresponding tasks separately by default. More specifically,

- under `architecture/` we store architecture related configurations such as the definition of the GNN/Transformer layers or positional/structural encoders
- under `tasks/` we store configurations specific to one task set, such as the multi-task dataset ToyMix
- under `training/` we store configurations specific to training models which could be different for each combination of `architecture` and `tasks`

Since architecture and tasks are logically separated, it becomes easy to, for example, reuse an existing architecture backbone on a new set of tasks or on a new dataset altogether. Additionally, separating out the training configs allows us to specify different training parameters for, e.g., pre-training and fine-tuning of the same architecture and task set.
We will now detail how you can add new architectures, tasks and training configurations.
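
To illustrate how these pieces fit together, a top-level config such as `main.yaml` composes one option from each config group via Hydra's `defaults` list. The sketch below is illustrative only; the group option names (`toymix`, `gcn`, `cpu`) are placeholders rather than a copy of the shipped file:

```yaml
# Illustrative sketch of a top-level config composing the groups described above.
# Option names are placeholders -- see the shipped main.yaml for the actual defaults.
defaults:
  - architecture: toymix
  - tasks: toymix
  - training: toymix
  - model: gcn
  - accelerator: cpu
  - _self_
```

Any of these choices can then be swapped on the command line (e.g., `tasks=my_tasks`) without touching the rest of the composition.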

### Adding an architecture
The architecture config specifies the neural network components, including encoders, under the config key `architecture`, as well as the featurization, i.e., the positional/structural information to be extracted from the data.
To add a new architecture, create a file `architecture/my_architecture.yaml` with the following information specified:
```yaml
# @package _global_
architecture:
  model_type: FullGraphMultiTaskNetwork # for example
  pre_nn:
    ...
  pre_nn_edges:
    ...
  pe_encoders:
    encoders: # your encoders
      ...
  gnn: # your GNN definition
    ...
  graph_output_nn: # output NNs for different levels such as graph, node, etc.
    graph:
      ...
    node:
      ...
  ...
datamodule:
  module_type: "MultitaskFromSmilesDataModule"
  args: # Make sure to not specify anything task-specific here
    ...
    featurization:
      ...
```
You can then select your new architecture during training, e.g., by running
```bash
python main_run_multitask.py architecture=my_architecture
```
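
Since Hydra composes the final config, individual fields of the selected architecture can also be overridden directly from the command line, e.g., `python main_run_multitask.py architecture=my_architecture architecture.gnn.depth=6` (the overridden field here is only an example).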

### Adding tasks
The task set config specifies:

- the task head neural networks under the config key `architecture.task_heads`;
- if required, any task-specific arguments to the datamodule you use, e.g., `datamodule.args.task_specific_args` when using the `MultitaskFromSmilesDataModule` datamodule;
- the per-task metrics under the config key `metrics.[task]`, where `[task]` matches the tasks specified under `architecture.task_heads`;
- the per-task configs of the `predictor` module;
- the loss functions of the task set under the config key `predictor.loss_fun`.

To add a new task set, create a file `tasks/my_tasks.yaml` with the following information specified:
```yaml
# @package _global_
architecture:
  task_heads:
    task1:
      ...
    task2:
      ...
datamodule: # optional, depends on your concrete datamodule class. Here: "MultitaskFromSmilesDataModule"
  args:
    task_specific_args:
      task1:
        ...
      task2:
        ...
metrics:
  task1:
    ...
  task2:
    ...
predictor:
  metrics_on_progress_bar:
    task1:
    task2:
  loss_fun: ... # your loss functions for the multi-tasking
```
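
For instance, the per-task losses are typically given as a mapping from each task head to a loss function; the task and loss names below are purely illustrative:

```yaml
# Illustrative only -- task names must match those under architecture.task_heads,
# and the loss names must be ones recognized by the predictor.
predictor:
  loss_fun:
    task1: mae
    task2: bce
```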
You can then select your new task set during training, e.g., by running
```bash
python main_run_multitask.py tasks=my_tasks
```

### Adding training configs
The training configs consist of specifications for the `predictor` and `trainer` modules.
To add new training configs, create a file `training/my_training.yaml` with the following information specified:
```yaml
# @package _global_
predictor:
  optim_kwargs:
    lr: 4.e-5
  torch_scheduler_kwargs: # example
    module_type: WarmUpLinearLR
    max_num_epochs: &max_epochs 100
    warmup_epochs: 10
    verbose: False
  scheduler_kwargs:
    ...
trainer:
  ...
  trainer: # example
    precision: 16
    max_epochs: *max_epochs
    min_epochs: 1
    check_val_every_n_epoch: 20
```
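
Analogous to architectures and task sets, a new training config can then be selected at launch time, e.g., via `python main_run_multitask.py training=my_training` (assuming `training` is exposed as a config group, as the directory layout above suggests).
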
108 changes: 108 additions & 0 deletions expts/hydra-configs/architecture/toymix.yaml
@@ -0,0 +1,108 @@
# @package _global_

architecture:
  model_type: FullGraphMultiTaskNetwork
  mup_base_path: null
  pre_nn:
    out_dim: 64
    hidden_dims: 256
    depth: 2
    activation: relu
    last_activation: none
    dropout: 0.18
    normalization: layer_norm
    last_normalization: ${architecture.pre_nn.normalization}
    residual_type: none

  pre_nn_edges: null

  pe_encoders:
    out_dim: 32
    pool: "sum" #"mean" "max"
    last_norm: None #"batch_norm", "layer_norm"
    encoders: #la_pos | rw_pos
      la_pos: # Set as null to avoid a pre-nn network
        encoder_type: "laplacian_pe"
        input_keys: ["laplacian_eigvec", "laplacian_eigval"]
        output_keys: ["feat"]
        hidden_dim: 64
        out_dim: 32
        model_type: 'DeepSet' #'Transformer' or 'DeepSet'
        num_layers: 2
        num_layers_post: 1 # Num. layers to apply after pooling
        dropout: 0.1
        first_normalization: "none" #"batch_norm" or "layer_norm"
      rw_pos:
        encoder_type: "mlp"
        input_keys: ["rw_return_probs"]
        output_keys: ["feat"]
        hidden_dim: 64
        out_dim: 32
        num_layers: 2
        dropout: 0.1
        normalization: "layer_norm" #"batch_norm" or "layer_norm"
        first_normalization: "layer_norm" #"batch_norm" or "layer_norm"

  gnn: # Set as null to avoid a post-nn network
    in_dim: 64 # or otherwise the correct value
    out_dim: &gnn_dim 96
    hidden_dims: *gnn_dim
    depth: 4
    activation: gelu
    last_activation: none
    dropout: 0.1
    normalization: "layer_norm"
    last_normalization: ${architecture.pre_nn.normalization}
    residual_type: simple
    virtual_node: 'none'
    layer_type: 'pyg:gcn' #pyg:gine #'pyg:gps' # pyg:gated-gcn, pyg:gine,pyg:gps
    layer_kwargs: null # Parameters for the model itself. You could define dropout_attn: 0.1

  graph_output_nn:
    graph:
      pooling: [sum]
      out_dim: *gnn_dim
      hidden_dims: *gnn_dim
      depth: 1
      activation: relu
      last_activation: none
      dropout: ${architecture.pre_nn.dropout}
      normalization: ${architecture.pre_nn.normalization}
      last_normalization: "none"
      residual_type: none

datamodule:
  module_type: "MultitaskFromSmilesDataModule"
  args:
    prepare_dict_or_graph: pyg:graph
    featurization_n_jobs: 30
    featurization_progress: True
    featurization_backend: "loky"
    processed_graph_data_path: "../datacache/neurips2023-small/"
    num_workers: 30 # -1 to use all
    persistent_workers: False
    featurization:
      atom_property_list_onehot: [atomic-number, group, period, total-valence]
      atom_property_list_float: [degree, formal-charge, radical-electron, aromatic, in-ring]
      edge_property_list: [bond-type-onehot, stereo, in-ring]
      add_self_loop: False
      explicit_H: False # if H is included
      use_bonds_weights: False
      pos_encoding_as_features:
        pos_types:
          lap_eigvec:
            pos_level: node
            pos_type: laplacian_eigvec
            num_pos: 8
            normalization: "none" # normalization already applied on the eigen vectors
            disconnected_comp: True # if eigen values/vector for disconnected graph are included
          lap_eigval:
            pos_level: node
            pos_type: laplacian_eigval
            num_pos: 8
            normalization: "none" # normalization already applied on the eigen vectors
            disconnected_comp: True # if eigen values/vector for disconnected graph are included
          rw_pos: # use same name as pe_encoder
            pos_level: node
            pos_type: rw_return_probs
            ksteps: 16
