Merge pull request #421 from datamol-io/configs-v2
Configs v2
Showing 11 changed files with 443 additions and 288 deletions.

@@ -0,0 +1,154 @@
# Configuring Graphium with Hydra
This document provides users with a point of entry for composing configs in Graphium. As a flexible library with many features, configuration is an important part of Graphium. To make configurations as reusable as possible while providing maximum flexibility, we integrated Graphium with `hydra`. Our config structure is designed to make the following functionality as accessible as possible:

- Switching between **accelerators** (CPU, GPU and IPU)
- **Benchmarking** different models on the same dataset
- **Fine-tuning** a pre-trained model on a new dataset

In what follows, we describe how each of these pieces of functionality is achieved and how users can benefit from this design to get the most out of Graphium with as little configuration as possible.

## Accelerators
With Graphium supporting CPU, GPU and IPU hardware, easily switching between these accelerators is pre-configured. General accelerator-specific configs are specified under `accelerator/`, whereas experiment-specific differences between the accelerators are specialized under `training/accelerator`.
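
For example, assuming an accelerator config such as `accelerator/gpu.yaml` is defined, switching hardware becomes a single command-line override. This is a sketch; the available accelerator names depend on the configs shipped with your setup:

```bash
# Select the GPU accelerator configs (assumes accelerator/gpu.yaml exists)
python main_run_multitask.py accelerator=gpu
```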

## Benchmarking
Benchmarking multiple models on the same datasets and tasks requires us to easily switch between model configurations without redefining major parts of the architecture, task heads, featurization, metrics, predictor, etc. For example, when changing from a GCN to a GIN model, a simple switch of `architecture.gnn.layer_type: 'pyg:gin'` might suffice. Hence, we abstract the `model` configs under `model/` where such model configurations can be specified.
In addition, switching models may have implications for configs specific to your current experiment, such as the name of the run or the directory to which model checkpoints are written. To enable such overrides, we can utilize `hydra` [specializations](https://hydra.cc/docs/patterns/specializing_config/). For example, for our ToyMix dataset, we specify the layer type under `model/[model_name].yaml`, e.g., for the GCN layer,

```yaml
# @package _global_

architecture:
  gnn:
    layer_type: 'pyg:gcn'
```
and set experiment-related parameters in `training/model/toymix_[model_name].yaml` as a specialization, e.g., for the GIN layer,

```yaml
# @package _global_
constants:
  name: neurips2023_small_data_gin
  ...
trainer:
  model_checkpoint:
    dirpath: models_checkpoints/neurips2023-small-gin/
```
We can now utilize `hydra` to, e.g., run a sweep over our models on the ToyMix dataset via

```bash
python main_run_multitask.py -m model=gcn,gin
```
where the ToyMix dataset is pre-configured in `main.yaml`. Read on to find out how to define new datasets and architectures for pre-training and fine-tuning.
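
For orientation, a top-level `main.yaml` composes these config groups through a Hydra defaults list. The group names below follow the directories described in this document, but the concrete default values (`toymix`, `gcn`, `cpu`) are placeholders and may differ from the actual file:

```yaml
# Sketch of a Hydra defaults list composing the config groups used in this document.
# The chosen defaults are placeholders, not necessarily the real ones.
defaults:
  - architecture: toymix
  - tasks: toymix
  - training: toymix
  - model: gcn
  - accelerator: cpu
  - _self_
```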

## Pre-training / Fine-tuning
From a configuration point-of-view, fine-tuning requires us to load a pre-trained model and attach new task heads. However, in a highly configurable library such as ours, changing the task heads also requires changes to the logged metrics, loss functions and the source of the fine-tuning data. To allow a quick switch between pre-training and fine-tuning, we configure models and the corresponding tasks separately by default. More specifically,

- under `architecture/` we store architecture-related configurations such as the definition of the GNN/Transformer layers or positional/structural encoders
- under `tasks/` we store configurations specific to one task set, such as the multi-task dataset ToyMix
- under `training/` we store configurations specific to training models, which could be different for each combination of `architecture` and `tasks`

Since architecture and tasks are logically separated, it now becomes very easy to, e.g., use an existing architecture backbone on a new set of tasks or a new dataset altogether. Additionally, separating training allows us to specify different training parameters for, e.g., pre-training and fine-tuning of the same architecture and task set.
We will now detail how you can add new architectures, tasks and training configurations.
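
As an illustration, pairing an existing architecture backbone with a different task set is then a single override on the command line. This is a sketch; the config names `toymix` and `my_tasks` are placeholders:

```bash
# Reuse an existing backbone (architecture) with a different task set (names are hypothetical)
python main_run_multitask.py architecture=toymix tasks=my_tasks
```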

### Adding an architecture
The architecture config specifies the neural network components (including encoders) under the config key `architecture`, as well as the featurization, i.e., the positional/structural information that is to be extracted from the data.
To add a new architecture, create a file `architecture/my_architecture.yaml` with the following information specified:
```yaml
# @package _global_
architecture:
  model_type: FullGraphMultiTaskNetwork # for example
  pre_nn:
    ...
  pre_nn_edges:
    ...
  pe_encoders:
    encoders: # your encoders
      ...
  gnn: # your GNN definition
    ...
  graph_output_nn: # output NNs for different levels such as graph, node, etc.
    graph:
      ...
    node:
      ...
    ...

datamodule:
  module_type: "MultitaskFromSmilesDataModule"
  args: # Make sure to not specify anything task-specific here
    ...
    featurization:
      ...
```
You can then select your new architecture during training, e.g., by running
```bash
python main_run_multitask.py architecture=my_architecture
```

### Adding tasks
The task set config specifies: the task head neural nets under the config key `architecture.task_heads`; if required, any task-specific arguments to the datamodule you use, e.g., `datamodule.args.task_specific_args` when using the `MultitaskFromSmilesDataModule` datamodule; the per-task metrics under the config key `metrics.[task]`, where `[task]` matches the tasks specified under `architecture.task_heads`; and the per-task configs of the `predictor` module, as well as the loss functions of the task set under the config key `predictor.loss_fun`.
To add a new task set, create a file `tasks/my_tasks.yaml` with the following information specified:
```yaml
# @package _global_
architecture:
  task_heads:
    task1:
      ...
    task2:
      ...
datamodule: # optional, depends on your concrete datamodule class. Here: "MultitaskFromSmilesDataModule"
  args:
    task_specific_args:
      task1:
        ...
      task2:
        ...
metrics:
  task1:
    ...
  task2:
    ...
predictor:
  metrics_on_progress_bar:
    task1:
    task2:
  loss_fun: ... # your loss functions for the multi-tasking
```
You can then select your new dataset during training, e.g., by running
```bash
python main_run_multitask.py tasks=my_tasks
```
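
To make the name matching concrete, here is a minimal sketch for a single hypothetical task named `homo`. The task name and the specific values (output dimension, metric, loss) are placeholders for illustration, and the exact metric and loss options depend on your setup:

```yaml
# @package _global_
architecture:
  task_heads:
    homo:              # hypothetical task name; must match the keys below
      task_level: graph
      out_dim: 1
      ...
metrics:
  homo:                # same key as in architecture.task_heads
    - name: mae
      metric: mae
      ...
predictor:
  loss_fun:
    homo: mae          # per-task loss function
```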

### Adding training configs
The training configs consist of specifications for the `predictor` and `trainer` modules.
To add new training configs, create a file `training/my_training.yaml` with the following information specified:
```yaml
# @package _global_
predictor:
  optim_kwargs:
    lr: 4.e-5
  torch_scheduler_kwargs: # example
    module_type: WarmUpLinearLR
    max_num_epochs: &max_epochs 100
    warmup_epochs: 10
    verbose: False
  scheduler_kwargs:
    ...
trainer:
  ...
  trainer: # example
    precision: 16
    max_epochs: *max_epochs
    min_epochs: 1
    check_val_every_n_epoch: 20
```
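
With all three pieces in place, a run combining the newly added configs can be launched in one go. The file names below are the hypothetical ones used throughout this section:

```bash
# Compose the new architecture, task set and training configs
# (my_architecture / my_tasks / my_training are the hypothetical names used above)
python main_run_multitask.py architecture=my_architecture tasks=my_tasks training=my_training
```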

@@ -0,0 +1,108 @@
# @package _global_

architecture:
  model_type: FullGraphMultiTaskNetwork
  mup_base_path: null
  pre_nn:
    out_dim: 64
    hidden_dims: 256
    depth: 2
    activation: relu
    last_activation: none
    dropout: 0.18
    normalization: layer_norm
    last_normalization: ${architecture.pre_nn.normalization}
    residual_type: none

  pre_nn_edges: null

  pe_encoders:
    out_dim: 32
    pool: "sum" # other options: "mean", "max"
    last_norm: None # "batch_norm", "layer_norm"
    encoders: # la_pos | rw_pos
      la_pos: # Set as null to avoid a pre-nn network
        encoder_type: "laplacian_pe"
        input_keys: ["laplacian_eigvec", "laplacian_eigval"]
        output_keys: ["feat"]
        hidden_dim: 64
        out_dim: 32
        model_type: 'DeepSet' # 'Transformer' or 'DeepSet'
        num_layers: 2
        num_layers_post: 1 # Num. layers to apply after pooling
        dropout: 0.1
        first_normalization: "none" # "batch_norm" or "layer_norm"
      rw_pos:
        encoder_type: "mlp"
        input_keys: ["rw_return_probs"]
        output_keys: ["feat"]
        hidden_dim: 64
        out_dim: 32
        num_layers: 2
        dropout: 0.1
        normalization: "layer_norm" # "batch_norm" or "layer_norm"
        first_normalization: "layer_norm" # "batch_norm" or "layer_norm"

  gnn: # Set as null to avoid a post-nn network
    in_dim: 64 # or otherwise the correct value
    out_dim: &gnn_dim 96
    hidden_dims: *gnn_dim
    depth: 4
    activation: gelu
    last_activation: none
    dropout: 0.1
    normalization: "layer_norm"
    last_normalization: ${architecture.pre_nn.normalization}
    residual_type: simple
    virtual_node: 'none'
    layer_type: 'pyg:gcn' # other options: 'pyg:gine', 'pyg:gps', 'pyg:gated-gcn'
    layer_kwargs: null # Parameters for the model itself. You could define dropout_attn: 0.1

  graph_output_nn:
    graph:
      pooling: [sum]
      out_dim: *gnn_dim
      hidden_dims: *gnn_dim
      depth: 1
      activation: relu
      last_activation: none
      dropout: ${architecture.pre_nn.dropout}
      normalization: ${architecture.pre_nn.normalization}
      last_normalization: "none"
      residual_type: none

datamodule:
  module_type: "MultitaskFromSmilesDataModule"
  args:
    prepare_dict_or_graph: pyg:graph
    featurization_n_jobs: 30
    featurization_progress: True
    featurization_backend: "loky"
    processed_graph_data_path: "../datacache/neurips2023-small/"
    num_workers: 30 # -1 to use all
    persistent_workers: False
    featurization:
      atom_property_list_onehot: [atomic-number, group, period, total-valence]
      atom_property_list_float: [degree, formal-charge, radical-electron, aromatic, in-ring]
      edge_property_list: [bond-type-onehot, stereo, in-ring]
      add_self_loop: False
      explicit_H: False # if H is included
      use_bonds_weights: False
      pos_encoding_as_features:
        pos_types:
          lap_eigvec:
            pos_level: node
            pos_type: laplacian_eigvec
            num_pos: 8
            normalization: "none" # normalization already applied on the eigenvectors
            disconnected_comp: True # whether eigenvalues/eigenvectors for disconnected graphs are included
          lap_eigval:
            pos_level: node
            pos_type: laplacian_eigval
            num_pos: 8
            normalization: "none" # normalization already applied on the eigenvectors
            disconnected_comp: True # whether eigenvalues/eigenvectors for disconnected graphs are included
          rw_pos: # use same name as pe_encoder
            pos_level: node
            pos_type: rw_return_probs
            ksteps: 16