Commit e6118a2

Initial draft for the v2 configs
1 parent b0d4fd5 commit e6118a2

10 files changed: +297 -145 lines

expts/hydra-configs/README.md

Lines changed: 154 additions & 0 deletions

# Configuring Graphium with Hydra

This document provides users with a point of entry to composing configs in Graphium. As a flexible library with many features, configuration is an important part of Graphium. To make configurations as reusable as possible while providing maximum flexibility, we integrated Graphium with `hydra`. Our config structure is designed to make the following functionality as accessible as possible:

- Switching between **accelerators** (CPU, GPU and IPU)
- **Benchmarking** different models on the same dataset
- **Fine-tuning** a pre-trained model on a new dataset

In what follows, we describe how each of these is achieved and how users can benefit from this design to get the most out of Graphium with as little configuration as possible.

## Accelerators
Graphium supports CPU, GPU and IPU hardware, and switching between these accelerators comes pre-configured. General accelerator-specific configs are specified under `accelerator/`, whereas experiment-specific differences between the accelerators are specialized under `training/accelerator`.
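
As a minimal sketch, switching hardware is then a single command-line override of the `accelerator` config group. This assumes a config such as `accelerator/gpu.yaml` exists alongside the default `accelerator/ipu.yaml`:

```bash
# Override the accelerator config group selected in main.yaml
python main_run_multitask.py accelerator=gpu
```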

## Benchmarking
Benchmarking multiple models on the same datasets and tasks requires us to easily switch between model configurations without redefining major parts of the architecture, task heads, featurization, metrics, predictor, etc. For example, when changing from a GCN to a GIN model, a simple switch of `architecture.gnn.layer_type: 'pyg:gin'` might suffice. Hence, we abstract the `model` configs under `model/` where such model configurations can be specified.
In addition, switching models may have implications for configs specific to your current experiment, such as the name of the run or the directory to which model checkpoints are written. To enable such overrides, we can utilize `hydra` [specializations](https://hydra.cc/docs/patterns/specializing_config/). For example, for our ToyMix dataset, we specify the layer type under `model/[model_name].yaml`, e.g., for the GCN layer,

```yaml
# @package _global_

architecture:
  gnn:
    layer_type: 'pyg:gcn'
```

and set experiment-related parameters in `training/model/toymix_[model_name].yaml` as a specialization, e.g., for the GIN layer,

```yaml
# @package _global_

constants:
  name: neurips2023_small_data_gin
  ...

trainer:
  model_checkpoint:
    dirpath: models_checkpoints/neurips2023-small-gin/
```

We can now use `hydra` to, e.g., run a sweep over our models on the ToyMix dataset via

```bash
python main_run_multitask.py -m model=gcn,gin
```
where the ToyMix dataset is pre-configured in `main.yaml`. Read on to find out how to define new datasets and architectures for pre-training and fine-tuning.
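
For reference, a sketch of the `defaults` list in `main.yaml` as introduced by this commit (see the full diff further below; indentation is illustrative):

```yaml
defaults:

  # Accelerators
  - accelerator: ipu

  # Pre-training/fine-tuning
  - architecture: toymix
  - tasks: toymix
  - training: toymix

  # Benchmarking
  - model: gcn

  # Specializations
  - training/accelerator: ${training}_${accelerator}
  - training/model: ${training}_${model}
```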

## Pre-training / Fine-tuning
From a configuration point-of-view, fine-tuning requires us to load a pre-trained model and attach new task heads. However, in a highly configurable library such as ours, changing the task heads also requires changes to the logged metrics, loss functions and the source of the fine-tuning data. To allow a quick switch between pre-training and fine-tuning, we configure models and the corresponding tasks separately by default. More specifically,

- under `architecture/` we store architecture-related configurations such as the definition of the GNN/Transformer layers or positional/structural encoders
- under `tasks/` we store configurations specific to one task set, such as the multi-task dataset ToyMix
- under `training/` we store configurations specific to training models, which may differ for each combination of `architecture` and `tasks`

Since architecture and tasks are logically separated, it now becomes very easy to, e.g., use an existing architecture backbone on a new set of tasks or a new dataset altogether. Additionally, separating training allows us to specify different training parameters for, e.g., pre-training and fine-tuning of the same architecture and task set.
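
As a sketch of what this separation enables, the pre-configured ToyMix backbone can be reused with a custom task set and training setup purely via config-group overrides (the files `tasks/my_tasks.yaml` and `training/my_training.yaml` are hypothetical here and created as described in the following sections):

```bash
# Keep the existing architecture, swap in new tasks and training settings
python main_run_multitask.py architecture=toymix tasks=my_tasks training=my_training
```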

We will now detail how you can add new architectures, tasks and training configurations.

### Adding an architecture
The architecture config consists of the specifications of the neural network components (including encoders) under the config key `architecture`, as well as the featurization, i.e., the positional/structural information that is to be extracted from the data.
To add a new architecture, create a file `architecture/my_architecture.yaml` with the following information specified:
```yaml
# @package _global_

architecture:
  model_type: FullGraphMultiTaskNetwork # for example
  pre_nn:
    ...

  pre_nn_edges:
    ...

  pe_encoders:
    encoders: # your encoders
      ...

  gnn: # your GNN definition
    ...

  graph_output_nn: # output NNs for different levels such as graph, node, etc.
    graph:
      ...
    node:
      ...
    ...

datamodule:
  module_type: "MultitaskFromSmilesDataModule"
  args: # Make sure to not specify anything task-specific here
    ...
    featurization:
      ...
```
You can then select your new architecture during training, e.g., by running
```bash
python main_run_multitask.py architecture=my_architecture
```

### Adding tasks
The task set config consists of the following:

- the task head neural nets, under the config key `architecture.task_heads`;
- if required, any task-specific arguments to the datamodule you use, e.g., `datamodule.args.task_specific_args` when using the `MultitaskFromSmilesDataModule` datamodule;
- the per-task metrics, under the config key `metrics.[task]`, where `[task]` matches the tasks specified under `architecture.task_heads`;
- the per-task configs of the `predictor` module, as well as the loss functions of the task set, under the config key `predictor.loss_fun`.

To add a new task set, create a file `tasks/my_tasks.yaml` with the following information specified:
```yaml
# @package _global_

architecture:
  task_heads:
    task1:
      ...
    task2:
      ...

datamodule: # optional, depends on your concrete datamodule class. Here: "MultitaskFromSmilesDataModule"
  args:
    task_specific_args:
      task1:
        ...
      task2:
        ...

metrics:
  task1:
    ...
  task2:
    ...

predictor:
  metrics_on_progress_bar:
    task1:
    task2:
  loss_fun: ... # your loss functions for the multi-tasking
```
You can then select your new task set during training, e.g., by running
```bash
python main_run_multitask.py tasks=my_tasks
```

### Adding training configs
The training configs consist of specifications for the `predictor` and `trainer` modules.
To add new training configs, create a file `training/my_training.yaml` with the following information specified:
```yaml
# @package _global_

predictor:
  optim_kwargs:
    lr: 4.e-5
  torch_scheduler_kwargs: # example
    module_type: WarmUpLinearLR
    max_num_epochs: &max_epochs 100
    warmup_epochs: 10
    verbose: False
  scheduler_kwargs:
    ...

trainer:
  ...
  trainer: # example
    precision: 16
    max_epochs: *max_epochs
    min_epochs: 1
    check_val_every_n_epoch: 20
```
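
Analogous to architectures and tasks, the new training config can then be selected at runtime; a sketch, assuming the file above was saved as `training/my_training.yaml`:

```bash
python main_run_multitask.py training=my_training
```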

expts/hydra-configs/dataset/toymix.yaml renamed to expts/hydra-configs/architecture/toymix.yaml

Lines changed: 7 additions & 126 deletions

@@ -9,9 +9,9 @@ architecture:
     depth: 2
     activation: relu
     last_activation: none
-    dropout: &dropout 0.18
-    normalization: &normalization layer_norm
-    last_normalization: *normalization
+    dropout: 0.18
+    normalization: layer_norm
+    last_normalization: ${architecture.pre_nn.normalization}
     residual_type: none

   pre_nn_edges: null
@@ -52,7 +52,7 @@ architecture:
     last_activation: none
     dropout: 0.1
     normalization: "layer_norm"
-    last_normalization: *normalization
+    last_normalization: ${architecture.pre_nn.normalization}
     residual_type: simple
     virtual_node: 'none'
     layer_type: 'pyg:gcn' #pyg:gine #'pyg:gps' # pyg:gated-gcn, pyg:gine,pyg:gps
@@ -66,130 +66,11 @@ architecture:
       depth: 1
       activation: relu
       last_activation: none
-      dropout: *dropout
-      normalization: *normalization
+      dropout: ${architecture.pre_nn.dropout}
+      normalization: ${architecture.pre_nn.normalization}
       last_normalization: "none"
       residual_type: none

-  task_heads:
-    qm9:
-      task_level: graph
-      out_dim: 19
-      hidden_dims: 128
-      depth: 2
-      activation: relu
-      last_activation: none
-      dropout: *dropout
-      normalization: *normalization
-      last_normalization: "none"
-      residual_type: none
-    tox21:
-      task_level: graph
-      out_dim: 12
-      hidden_dims: 64
-      depth: 2
-      activation: relu
-      last_activation: sigmoid
-      dropout: *dropout
-      normalization: *normalization
-      last_normalization: "none"
-      residual_type: none
-    zinc:
-      task_level: graph
-      out_dim: 3
-      hidden_dims: 32
-      depth: 2
-      activation: relu
-      last_activation: none
-      dropout: *dropout
-      normalization: *normalization
-      last_normalization: "none"
-      residual_type: none
-
-predictor:
-  metrics_on_progress_bar:
-    qm9: ["mae"]
-    tox21: ["auroc"]
-    zinc: ["mae"]
-  loss_fun:
-    qm9: mae_ipu
-    tox21: bce_ipu
-    zinc: mae_ipu
-  random_seed: ${constants.seed}
-  optim_kwargs:
-    lr: 4.e-5 # warmup can be scheduled using torch_scheduler_kwargs
-    # weight_decay: 1.e-7
-  torch_scheduler_kwargs:
-    module_type: WarmUpLinearLR
-    max_num_epochs: &max_epochs 100
-    warmup_epochs: 10
-    verbose: False
-  scheduler_kwargs:
-  target_nan_mask: null
-  multitask_handling: flatten # flatten, mean-per-label
-
-metrics:
-  qm9: &qm9_metrics
-    - name: mae
-      metric: mae_ipu
-      target_nan_mask: null
-      multitask_handling: flatten
-      threshold_kwargs: null
-    - name: pearsonr
-      metric: pearsonr_ipu
-      threshold_kwargs: null
-      target_nan_mask: null
-      multitask_handling: mean-per-label
-    - name: r2_score
-      metric: r2_score_ipu
-      target_nan_mask: null
-      multitask_handling: mean-per-label
-      threshold_kwargs: null
-  tox21:
-    - name: auroc
-      metric: auroc_ipu
-      task: binary
-      multitask_handling: mean-per-label
-      threshold_kwargs: null
-    - name: avpr
-      metric: average_precision_ipu
-      task: binary
-      multitask_handling: mean-per-label
-      threshold_kwargs: null
-    - name: f1 > 0.5
-      metric: f1
-      multitask_handling: mean-per-label
-      target_to_int: True
-      num_classes: 2
-      average: micro
-      threshold_kwargs: &threshold_05
-        operator: greater
-        threshold: 0.5
-        th_on_preds: True
-        th_on_target: True
-    - name: precision > 0.5
-      metric: precision
-      multitask_handling: mean-per-label
-      average: micro
-      threshold_kwargs: *threshold_05
-  zinc: *qm9_metrics
-
-trainer:
-  seed: ${constants.seed}
-  logger:
-    save_dir: logs/neurips2023-small/
-    name: ${constants.name}
-    project: ${constants.name}
-  model_checkpoint:
-    dirpath: models_checkpoints/neurips2023-small-gcn/
-    filename: ${constants.name}
-    save_last: True
-  trainer:
-    precision: 16
-    max_epochs: *max_epochs
-    min_epochs: 1
-    check_val_every_n_epoch: 20
-
 datamodule:
   module_type: "MultitaskFromSmilesDataModule"
   args:
@@ -266,4 +147,4 @@ datamodule:
         rw_pos: # use same name as pe_encoder
           pos_level: node
           pos_type: rw_return_probs
-          ksteps: 16
+          ksteps: 16

expts/hydra-configs/main.yaml

Lines changed: 11 additions & 3 deletions

@@ -1,8 +1,16 @@
 defaults:
+
+  # Accelerators
   - accelerator: ipu
-  - dataset: toymix
+
+  # Pre-training/fine-tuning
+  - architecture: toymix
+  - tasks: toymix
+  - training: toymix
+
+  # Benchmarking
   - model: gcn

   # Specializations
-  - experiment: ${dataset}_${model}
-  - dataset/accelerator: ${dataset}_${accelerator}
+  - training/accelerator: ${training}_${accelerator}
+  - training/model: ${training}_${model}
