Commit e6118a2

Initial draft for the v2 configs
1 parent b0d4fd5 commit e6118a2

10 files changed: +297 -145 lines

expts/hydra-configs/README.md

Lines changed: 154 additions & 0 deletions

# Configuring Graphium with Hydra

This document provides users with a point of entry to composing configs in Graphium. As a flexible library with many features, configuration is an important part of Graphium. To make configurations as reusable as possible while providing maximum flexibility, we integrated Graphium with `hydra`. Our config structure is designed to make the following functionality as accessible as possible:

- Switching between **accelerators** (CPU, GPU and IPU)
- **Benchmarking** different models on the same dataset
- **Fine-tuning** a pre-trained model on a new dataset

In what follows, we describe how each of these is achieved and how users can benefit from this design to get the most out of Graphium with as little configuration as possible.

## Accelerators
Graphium supports CPU, GPU and IPU hardware, and switching between these accelerators comes pre-configured. General accelerator-specific configs are specified under `accelerator/`, whereas experiment-specific differences between the accelerators are specialized under `training/accelerator`.
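
As a minimal sketch, switching hardware is then a single command-line override of the `accelerator` config group. This assumes a config such as `accelerator/gpu.yaml` exists alongside the default `accelerator/ipu.yaml`:

```bash
# Override the accelerator config group selected in main.yaml
python main_run_multitask.py accelerator=gpu
```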

## Benchmarking
Benchmarking multiple models on the same datasets and tasks requires us to easily switch between model configurations without redefining major parts of the architecture, task heads, featurization, metrics, predictor, etc. For example, when changing from a GCN to a GIN model, a simple switch of `architecture.gnn.layer_type: 'pyg:gin'` might suffice. Hence, we abstract the `model` configs under `model/` where such model configurations can be specified.
In addition, switching models may have implications for configs specific to your current experiment, such as the name of the run or the directory to which model checkpoints are written. To enable such overrides, we can utilize `hydra` [specializations](https://hydra.cc/docs/patterns/specializing_config/). For example, for our ToyMix dataset, we specify the layer type under `model/[model_name].yaml`, e.g., for the GCN layer,

```yaml
# @package _global_

architecture:
  gnn:
    layer_type: 'pyg:gcn'
```

and set experiment-related parameters in `training/model/toymix_[model_name].yaml` as a specialization, e.g., for the GIN layer,

```yaml
# @package _global_

constants:
  name: neurips2023_small_data_gin
  ...

trainer:
  model_checkpoint:
    dirpath: models_checkpoints/neurips2023-small-gin/
```

We can now use `hydra` to, e.g., run a sweep over our models on the ToyMix dataset via

```bash
python main_run_multitask.py -m model=gcn,gin
```
where the ToyMix dataset is pre-configured in `main.yaml`. Read on to find out how to define new datasets and architectures for pre-training and fine-tuning.
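
For reference, a sketch of the `defaults` list in `main.yaml` as introduced by this commit (see the full diff further below; indentation is illustrative):

```yaml
defaults:

  # Accelerators
  - accelerator: ipu

  # Pre-training/fine-tuning
  - architecture: toymix
  - tasks: toymix
  - training: toymix

  # Benchmarking
  - model: gcn

  # Specializations
  - training/accelerator: ${training}_${accelerator}
  - training/model: ${training}_${model}
```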

## Pre-training / Fine-tuning
From a configuration point-of-view, fine-tuning requires us to load a pre-trained model and attach new task heads. However, in a highly configurable library such as ours, changing the task heads also requires changes to the logged metrics, loss functions and the source of the fine-tuning data. To allow a quick switch between pre-training and fine-tuning, we configure models and the corresponding tasks separately by default. More specifically,

- under `architecture/` we store architecture-related configurations such as the definition of the GNN/Transformer layers or positional/structural encoders
- under `tasks/` we store configurations specific to one task set, such as the multi-task dataset ToyMix
- under `training/` we store configurations specific to training models, which may differ for each combination of `architecture` and `tasks`

Since architecture and tasks are logically separated, it now becomes very easy to, e.g., use an existing architecture backbone on a new set of tasks or a new dataset altogether. Additionally, separating training allows us to specify different training parameters for, e.g., pre-training and fine-tuning of the same architecture and task set.
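
As a sketch of what this separation enables, the pre-configured ToyMix backbone can be reused with a custom task set and training setup purely via config-group overrides (the files `tasks/my_tasks.yaml` and `training/my_training.yaml` are hypothetical here and created as described in the following sections):

```bash
# Keep the existing architecture, swap in new tasks and training settings
python main_run_multitask.py architecture=toymix tasks=my_tasks training=my_training
```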

We will now detail how you can add new architectures, tasks and training configurations.

### Adding an architecture
The architecture config consists of the specifications of the neural network components (including encoders) under the config key `architecture`, as well as the featurization, i.e., the positional/structural information that is to be extracted from the data.
To add a new architecture, create a file `architecture/my_architecture.yaml` with the following information specified:
```yaml
# @package _global_

architecture:
  model_type: FullGraphMultiTaskNetwork # for example
  pre_nn:
    ...

  pre_nn_edges:
    ...

  pe_encoders:
    encoders: # your encoders
      ...

  gnn: # your GNN definition
    ...

  graph_output_nn: # output NNs for different levels such as graph, node, etc.
    graph:
      ...
    node:
      ...
    ...

datamodule:
  module_type: "MultitaskFromSmilesDataModule"
  args: # Make sure to not specify anything task-specific here
    ...
    featurization:
      ...
```
You can then select your new architecture during training, e.g., by running
```bash
python main_run_multitask.py architecture=my_architecture
```

### Adding tasks
The task set config consists of the following:

- the task head neural nets, under the config key `architecture.task_heads`;
- if required, any task-specific arguments to the datamodule you use, e.g., `datamodule.args.task_specific_args` when using the `MultitaskFromSmilesDataModule` datamodule;
- the per-task metrics, under the config key `metrics.[task]`, where `[task]` matches the tasks specified under `architecture.task_heads`;
- the per-task configs of the `predictor` module, as well as the loss functions of the task set, under the config key `predictor.loss_fun`.

To add a new task set, create a file `tasks/my_tasks.yaml` with the following information specified:
```yaml
# @package _global_

architecture:
  task_heads:
    task1:
      ...
    task2:
      ...

datamodule: # optional, depends on your concrete datamodule class. Here: "MultitaskFromSmilesDataModule"
  args:
    task_specific_args:
      task1:
        ...
      task2:
        ...

metrics:
  task1:
    ...
  task2:
    ...

predictor:
  metrics_on_progress_bar:
    task1:
    task2:
  loss_fun: ... # your loss functions for the multi-tasking
```
You can then select your new task set during training, e.g., by running
```bash
python main_run_multitask.py tasks=my_tasks
```

### Adding training configs
The training configs consist of specifications for the `predictor` and `trainer` modules.
To add new training configs, create a file `training/my_training.yaml` with the following information specified:
```yaml
# @package _global_

predictor:
  optim_kwargs:
    lr: 4.e-5
  torch_scheduler_kwargs: # example
    module_type: WarmUpLinearLR
    max_num_epochs: &max_epochs 100
    warmup_epochs: 10
    verbose: False
  scheduler_kwargs:
    ...

trainer:
  ...
  trainer: # example
    precision: 16
    max_epochs: *max_epochs
    min_epochs: 1
    check_val_every_n_epoch: 20
```
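
Analogous to architectures and tasks, the new training config can then be selected at runtime; a sketch, assuming the file above was saved as `training/my_training.yaml`:

```bash
python main_run_multitask.py training=my_training
```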

expts/hydra-configs/dataset/toymix.yaml renamed to expts/hydra-configs/architecture/toymix.yaml

Lines changed: 7 additions & 126 deletions

@@ -9,9 +9,9 @@ architecture:
     depth: 2
     activation: relu
     last_activation: none
-    dropout: &dropout 0.18
-    normalization: &normalization layer_norm
-    last_normalization: *normalization
+    dropout: 0.18
+    normalization: layer_norm
+    last_normalization: ${architecture.pre_nn.normalization}
     residual_type: none

   pre_nn_edges: null
@@ -52,7 +52,7 @@ architecture:
     last_activation: none
     dropout: 0.1
     normalization: "layer_norm"
-    last_normalization: *normalization
+    last_normalization: ${architecture.pre_nn.normalization}
     residual_type: simple
     virtual_node: 'none'
     layer_type: 'pyg:gcn' #pyg:gine #'pyg:gps' # pyg:gated-gcn, pyg:gine,pyg:gps
@@ -66,130 +66,11 @@ architecture:
       depth: 1
       activation: relu
       last_activation: none
-      dropout: *dropout
-      normalization: *normalization
+      dropout: ${architecture.pre_nn.dropout}
+      normalization: ${architecture.pre_nn.normalization}
       last_normalization: "none"
       residual_type: none

-  task_heads:
-    qm9:
-      task_level: graph
-      out_dim: 19
-      hidden_dims: 128
-      depth: 2
-      activation: relu
-      last_activation: none
-      dropout: *dropout
-      normalization: *normalization
-      last_normalization: "none"
-      residual_type: none
-    tox21:
-      task_level: graph
-      out_dim: 12
-      hidden_dims: 64
-      depth: 2
-      activation: relu
-      last_activation: sigmoid
-      dropout: *dropout
-      normalization: *normalization
-      last_normalization: "none"
-      residual_type: none
-    zinc:
-      task_level: graph
-      out_dim: 3
-      hidden_dims: 32
-      depth: 2
-      activation: relu
-      last_activation: none
-      dropout: *dropout
-      normalization: *normalization
-      last_normalization: "none"
-      residual_type: none
-
-predictor:
-  metrics_on_progress_bar:
-    qm9: ["mae"]
-    tox21: ["auroc"]
-    zinc: ["mae"]
-  loss_fun:
-    qm9: mae_ipu
-    tox21: bce_ipu
-    zinc: mae_ipu
-  random_seed: ${constants.seed}
-  optim_kwargs:
-    lr: 4.e-5 # warmup can be scheduled using torch_scheduler_kwargs
-    # weight_decay: 1.e-7
-  torch_scheduler_kwargs:
-    module_type: WarmUpLinearLR
-    max_num_epochs: &max_epochs 100
-    warmup_epochs: 10
-    verbose: False
-  scheduler_kwargs:
-  target_nan_mask: null
-  multitask_handling: flatten # flatten, mean-per-label
-
-metrics:
-  qm9: &qm9_metrics
-    - name: mae
-      metric: mae_ipu
-      target_nan_mask: null
-      multitask_handling: flatten
-      threshold_kwargs: null
-    - name: pearsonr
-      metric: pearsonr_ipu
-      threshold_kwargs: null
-      target_nan_mask: null
-      multitask_handling: mean-per-label
-    - name: r2_score
-      metric: r2_score_ipu
-      target_nan_mask: null
-      multitask_handling: mean-per-label
-      threshold_kwargs: null
-  tox21:
-    - name: auroc
-      metric: auroc_ipu
-      task: binary
-      multitask_handling: mean-per-label
-      threshold_kwargs: null
-    - name: avpr
-      metric: average_precision_ipu
-      task: binary
-      multitask_handling: mean-per-label
-      threshold_kwargs: null
-    - name: f1 > 0.5
-      metric: f1
-      multitask_handling: mean-per-label
-      target_to_int: True
-      num_classes: 2
-      average: micro
-      threshold_kwargs: &threshold_05
-        operator: greater
-        threshold: 0.5
-        th_on_preds: True
-        th_on_target: True
-    - name: precision > 0.5
-      metric: precision
-      multitask_handling: mean-per-label
-      average: micro
-      threshold_kwargs: *threshold_05
-  zinc: *qm9_metrics
-
-trainer:
-  seed: ${constants.seed}
-  logger:
-    save_dir: logs/neurips2023-small/
-    name: ${constants.name}
-    project: ${constants.name}
-  model_checkpoint:
-    dirpath: models_checkpoints/neurips2023-small-gcn/
-    filename: ${constants.name}
-    save_last: True
-  trainer:
-    precision: 16
-    max_epochs: *max_epochs
-    min_epochs: 1
-    check_val_every_n_epoch: 20
-
 datamodule:
   module_type: "MultitaskFromSmilesDataModule"
   args:
@@ -266,4 +147,4 @@ datamodule:
         rw_pos: # use same name as pe_encoder
           pos_level: node
           pos_type: rw_return_probs
-          ksteps: 16
+          ksteps: 16

expts/hydra-configs/main.yaml

Lines changed: 11 additions & 3 deletions

@@ -1,8 +1,16 @@
 defaults:
+
+  # Accelerators
   - accelerator: ipu
-  - dataset: toymix
+
+  # Pre-training/fine-tuning
+  - architecture: toymix
+  - tasks: toymix
+  - training: toymix
+
+  # Benchmarking
   - model: gcn

   # Specializations
-  - experiment: ${dataset}_${model}
-  - dataset/accelerator: ${dataset}_${accelerator}
+  - training/accelerator: ${training}_${accelerator}
+  - training/model: ${training}_${model}
