Merging configs #362

Merged: 64 commits (Jun 23, 2023)
Changes from 1 commit
64 commits
71df904
[skip ci] Update README.md
hadim Jun 7, 2023
447e6fd
[skip ci] Update README.md
hadim Jun 7, 2023
6753918
add validation file and change some configs
Jun 7, 2023
e1dbb86
ckpt option changes
Jun 7, 2023
14c50ef
pass n_brakets through ipu losses
Jun 7, 2023
8bd783d
Small PR for removing the Hydrogen molecules from a dataset - regardl…
s-maddrellmander Jun 7, 2023
9cd1cd9
Merge branch 'main' of github.com:datamol-io/graphium into removing_h…
s-maddrellmander Jun 7, 2023
3ff7b77
Merge branch 'main' of github.com:datamol-io/graphium into removing_h…
s-maddrellmander Jun 8, 2023
22a1708
Merge branch 'removing_hydrogens' into zhiyil/some_changes
s-maddrellmander Jun 8, 2023
0dabb88
add baseline configs
Jun 8, 2023
87e47cd
Merge branch 'zhiyil/some_changes' of github.com:datamol-io/graphium …
s-maddrellmander Jun 8, 2023
76fb562
config minor modification
Jun 8, 2023
62da9c4
Merge branch 'zhiyil/some_changes' of github.com:datamol-io/graphium …
s-maddrellmander Jun 8, 2023
c2f5e7b
changes on loss_fun
Jun 8, 2023
0dadd23
Added the save_on_train_end to all neurips configs
s-maddrellmander Jun 8, 2023
6cbb921
Merge branch 'zhiyil/some_changes' of github.com:datamol-io/graphium …
s-maddrellmander Jun 8, 2023
1c49e7b
Some debugging lines to help see where issues are arising
s-maddrellmander Jun 9, 2023
d00dd3b
Adding the seed as an extra directory for checkpoints
s-maddrellmander Jun 9, 2023
4876b71
fix for normalization
Jun 9, 2023
9ebd705
remove lru cache
Jun 9, 2023
f1f242b
Only save the last checkpoint
s-maddrellmander Jun 9, 2023
960b95f
add debug small config, minor change on ckpt
Jun 9, 2023
cb1f815
add small dataset debug config
Jun 9, 2023
a66795a
Merge branch 'zhiyil/some_changes' of github.com:datamol-io/graphium …
s-maddrellmander Jun 9, 2023
0c0ec2a
Revert "add debug small config, minor change on ckpt"
Jun 9, 2023
0cf9964
Merge branch 'zhiyil/some_changes' of https://github.com/datamol-io/g…
Jun 9, 2023
bbebf46
some changes to metrics + loss for test?
s-maddrellmander Jun 9, 2023
2b613c2
Merge branch 'zhiyil/some_changes' of https://github.com/datamol-io/g…
Jun 10, 2023
210e434
add label cast for l1000 vcap
Jun 10, 2023
835baf0
config changes
Jun 11, 2023
1f8bbca
config changes
Jun 11, 2023
9d28cd0
config changes
Jun 11, 2023
0846316
increase max nodes and edges
Jun 11, 2023
f10b98b
change to 40 epochs for large mix
Jun 11, 2023
12abad4
reduce multi element tensors
joao-alex-cunha Jun 12, 2023
8558e45
small dataset modification
Jun 12, 2023
97ba620
metrics modification and add metric_on_training_set to large configs
Jun 12, 2023
657bfba
Merge branch 'zhiyil/some_changes' of https://github.com/datamol-io/g…
Jun 12, 2023
572e817
changes on default lr
Jun 12, 2023
ca702ef
Sam's config changes
s-maddrellmander Jun 12, 2023
5705725
Merge branch 'zhiyil/some_changes' of github.com:datamol-io/graphium …
s-maddrellmander Jun 12, 2023
5df2e05
add function to do normalization on val and test
Jun 12, 2023
885dff2
small fix to normalize_val_test
Jun 12, 2023
5761bb2
add gine single task configs
Jun 12, 2023
f152d13
Changes to run validation to parse metrics via the command line as fu…
s-maddrellmander Jun 12, 2023
126320f
Merge branch 'zhiyil/some_changes' of github.com:datamol-io/graphium …
s-maddrellmander Jun 12, 2023
20f0c86
avoid preds and targets transfer back to the ipu in training
joao-alex-cunha Jun 13, 2023
23b216c
remove sys exit
joao-alex-cunha Jun 13, 2023
1d272bf
Revert "remove sys exit"
joao-alex-cunha Jun 13, 2023
b2ba8a0
Revert "avoid preds and targets transfer back to the ipu in training"
joao-alex-cunha Jun 13, 2023
d506c25
no metrics during training
joao-alex-cunha Jun 13, 2023
182194f
Revert "no metrics during training"
s-maddrellmander Jun 13, 2023
dc55c9a
Baseline config changes to run properly with metrics etc.
s-maddrellmander Jun 13, 2023
5ccd36d
Revert "Revert "no metrics during training""
Jun 13, 2023
429343e
add pcq config and test
Jun 13, 2023
909dfae
modify device iterations
Jun 13, 2023
e9a7beb
save config changes
Jun 13, 2023
41a5d49
some config changes
Jun 13, 2023
dcf8431
add some configs
Jun 14, 2023
422bd98
remove hacky changes and arrange script
Jun 16, 2023
dc95e80
remove more hacky changes
Jun 16, 2023
5c0502b
change metrics
Jun 16, 2023
acd31d1
merging configs with `config_override`
DomInvivo Jun 19, 2023
6d9b8b0
make minor changes
Jun 23, 2023
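The PR title and the commit "merging configs with `config_override`" point at the central idea: per-experiment configs are produced by recursively overriding a shared base config rather than duplicating it. A minimal sketch of that kind of merge, assuming plain nested dictionaries (the function name and exact semantics are illustrative, not graphium's actual API):

    from copy import deepcopy
    from typing import Any, Dict

    def merge_configs(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
        """Recursively merge `override` into `base`, returning a new dict.

        Nested dicts are merged key by key; any other value in `override`
        (scalars, lists, paths) simply replaces the value from `base`.
        """
        merged = deepcopy(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_configs(merged[key], value)
            else:
                merged[key] = deepcopy(value)
        return merged

    # Override only the nested trainer precision; everything else is inherited.
    base = {"trainer": {"trainer": {"precision": 16, "accumulate_grad_batches": 8}}}
    override = {"trainer": {"trainer": {"precision": 32}}}
    assert merge_configs(base, override)["trainer"]["trainer"] == {
        "precision": 32,
        "accumulate_grad_batches": 8,
    }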
some config changes
zhiyil-graphcore committed Jun 13, 2023
commit 41a5d493b86962a5049efc87c7871faf9a06988d
11 changes: 9 additions & 2 deletions expts/neurips2023_configs/config_large_gcn.yaml
@@ -25,7 +25,7 @@ accelerator:
loss_scaling: 1024
trainer:
trainer:
- precision: 16
+ precision: 32
accumulate_grad_batches: 8

ipu_config:
@@ -97,6 +97,7 @@ datamodule:
task_level: graph
splits_path: graphium/data/neurips2023/large-dataset/pcqm4m_g25_n4_random_splits.pt # Download with `wget https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/pcqm4m_g25_n4_random_splits.pt`
label_normalization:
+ normalize_val_test: True
method: "normal"

pcqm4m_n4:
@@ -111,14 +112,15 @@ datamodule:
splits_path: graphium/data/neurips2023/large-dataset/pcqm4m_g25_n4_random_splits.pt # Download with `wget https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/pcqm4m_g25_n4_random_splits.pt`
seed: *seed
label_normalization:
+ normalize_val_test: True
method: "normal"

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: "../neurips2023-large/"
processed_graph_data_path: "/net/group/software-apps/datacache/neurips2023-large/"
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
# 'possible_number_radical_e', 'possible_is_aromatic', 'possible_is_in_ring',
@@ -379,6 +381,11 @@ metrics:
threshold_kwargs: null
target_nan_mask: null
multitask_handling: mean-per-label
+ - name: r2
+ metric: r2_score_ipu
+ threshold_kwargs: null
+ target_nan_mask: null
+ multitask_handling: mean-per-label
pcqm4m_n4: *pcqm_metrics

trainer:
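This config adds an `r2` entry next to the existing metrics (the GIN config below receives the same addition). A rough illustration of what a per-label R² with `multitask_handling: mean-per-label` amounts to, written in plain torch rather than graphium's `r2_score_ipu`; the NaN handling here is an assumption for the example, not necessarily what `target_nan_mask: null` does:

    import torch

    def r2_per_label(preds: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        """R^2 computed independently for each label column, skipping NaN targets.

        preds, targets: tensors of shape [n_samples, n_labels]; returns one score per label.
        """
        scores = []
        for col in range(targets.shape[1]):
            mask = ~torch.isnan(targets[:, col])
            t, p = targets[mask, col], preds[mask, col]
            ss_res = torch.sum((t - p) ** 2)
            ss_tot = torch.sum((t - t.mean()) ** 2)
            scores.append(1.0 - ss_res / ss_tot)
        return torch.stack(scores)

    preds = torch.randn(100, 4)
    targets = preds + 0.1 * torch.randn(100, 4)
    targets[::7, 0] = float("nan")              # simulate missing labels
    print(r2_per_label(preds, targets).mean())  # "mean-per-label" aggregation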
15 changes: 11 additions & 4 deletions expts/neurips2023_configs/config_large_gin.yaml
@@ -11,11 +11,11 @@ accelerator:
args:
ipu_dataloader_training_opts:
mode: async
- max_num_nodes_per_graph: 45 # train max nodes: 20, max_edges: 54
- max_num_edges_per_graph: 95
+ max_num_nodes_per_graph: 60 # train max nodes: 20, max_edges: 54
+ max_num_edges_per_graph: 100
ipu_dataloader_inference_opts:
mode: async
- max_num_nodes_per_graph: 50 # valid max nodes: 51, max_edges: 118
+ max_num_nodes_per_graph: 60 # valid max nodes: 51, max_edges: 118
max_num_edges_per_graph: 100
# Data handling-related
batch_size_training: 10
@@ -97,6 +97,7 @@ datamodule:
task_level: graph
splits_path: graphium/data/neurips2023/large-dataset/pcqm4m_g25_n4_random_splits.pt # Download with `wget https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/pcqm4m_g25_n4_random_splits.pt`
label_normalization:
+ normalize_val_test: True
method: "normal"

pcqm4m_n4:
@@ -111,14 +112,15 @@ datamodule:
splits_path: graphium/data/neurips2023/large-dataset/pcqm4m_g25_n4_random_splits.pt # Download with `wget https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/pcqm4m_g25_n4_random_splits.pt`
seed: *seed
label_normalization:
+ normalize_val_test: True
method: "normal"

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: "../neurips2023-large/"
processed_graph_data_path: "/net/group/software-apps/datacache/neurips2023-large/"
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
# 'possible_number_radical_e', 'possible_is_aromatic', 'possible_is_in_ring',
@@ -373,6 +375,11 @@ metrics:
threshold_kwargs: null
target_nan_mask: null
multitask_handling: mean-per-label
+ - name: r2
+ metric: r2_score_ipu
+ threshold_kwargs: null
+ target_nan_mask: null
+ multitask_handling: mean-per-label
pcqm4m_n4: *pcqm_metrics

trainer:
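The GIN config raises the per-graph caps of the fixed-shape IPU dataloaders (training from 45 to 60 nodes and 95 to 100 edges, inference from 50 to 60 nodes). Such caps exist because IPU compilation needs static tensor shapes: each graph is padded up to the configured maxima, so anything larger simply cannot be batched. A toy sketch of that constraint (the names are made up; poptorch and graphium handle this internally):

    from dataclasses import dataclass

    @dataclass
    class PaddingOpts:
        max_num_nodes_per_graph: int
        max_num_edges_per_graph: int

    def fits_fixed_shape(num_nodes: int, num_edges: int, opts: PaddingOpts) -> bool:
        """A graph can only be padded up to the caps, never truncated down to them."""
        return (
            num_nodes <= opts.max_num_nodes_per_graph
            and num_edges <= opts.max_num_edges_per_graph
        )

    old_caps = PaddingOpts(max_num_nodes_per_graph=45, max_num_edges_per_graph=95)
    new_caps = PaddingOpts(max_num_nodes_per_graph=60, max_num_edges_per_graph=100)
    print(fits_fixed_shape(50, 98, old_caps))  # False: such a graph would be rejected
    print(fits_fixed_shape(50, 98, new_caps))  # True: padded to 60 nodes / 100 edges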
Changes to another config file (filename not shown):
@@ -15,8 +15,8 @@ accelerator:
max_num_edges_per_graph: 100
ipu_dataloader_inference_opts:
mode: async
- max_num_nodes_per_graph: 60 # valid max nodes: 51, max_edges: 118
- max_num_edges_per_graph: 100
+ max_num_nodes_per_graph: 90 # valid max nodes: 51, max_edges: 118
+ max_num_edges_per_graph: 300
# Data handling-related
batch_size_training: 10
batch_size_inference: 2
Changes to another config file (filename not shown):
@@ -19,7 +19,7 @@ accelerator:
max_num_edges_per_graph: 100
# Data handling-related
batch_size_training: 10
- batch_size_inference: 2
+ batch_size_inference: 10
predictor:
optim_kwargs:
loss_scaling: 1024
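This file raises `batch_size_inference` from 2 to 10. On IPU-style execution the number of samples consumed per host step is the product of several settings, so the micro-batch size alone does not tell the whole story. A back-of-the-envelope helper, with the caveat that the exact factors graphium multiplies may differ:

    def samples_per_host_step(
        micro_batch_size: int,
        device_iterations: int = 1,
        replication_factor: int = 1,
        gradient_accumulation: int = 1,
    ) -> int:
        """Rough number of samples pulled from the dataloader per host-side step."""
        return (
            micro_batch_size
            * device_iterations
            * replication_factor
            * gradient_accumulation
        )

    # e.g. batch_size_training: 10 with accumulate_grad_batches: 8 from the GCN config above
    print(samples_per_host_step(10, gradient_accumulation=8))  # 80 samples before a weight update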
30 changes: 15 additions & 15 deletions graphium/trainer/predictor.py
@@ -347,8 +347,8 @@ def _general_step(self, batch: Dict[str, Tensor], step_name: str, to_cpu: bool)
# if normalize_val_test is true, no denormalization is applied, all losses and metrics are normalized version
preds[task] = task_specific_norm.denormalize(preds[task])
targets_dict[task] = task_specific_norm.denormalize(targets_dict[task])
- preds[task] = preds[task].clone().detach().to(device=device)
- targets_dict[task] = targets_dict[task].clone().detach().to(device=device)
+ preds[task] = preds[task].detach().to(device=device)
+ targets_dict[task] = targets_dict[task].detach().to(device=device)
if weights is not None:
weights = weights.detach().to(device=device)
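The only change in this hunk is dropping the `.clone()`: `detach()` is enough to cut the autograd graph before the device move and avoids an extra copy. For context, the comment above describes the `normalize_val_test` flag added to the YAML configs: when it is true, losses and metrics stay in normalized space; otherwise predictions and targets are mapped back to the original label scale first. A stand-alone sketch of that contract (this is not graphium's actual normalization class):

    import torch

    class LabelNormalizer:
        """The "normal" method: shift and scale by statistics from the training split."""

        def __init__(self, mean: torch.Tensor, std: torch.Tensor):
            self.mean, self.std = mean, std

        def normalize(self, x: torch.Tensor) -> torch.Tensor:
            return (x - self.mean) / self.std

        def denormalize(self, x: torch.Tensor) -> torch.Tensor:
            return x * self.std + self.mean

    def postprocess(preds, targets, norm: LabelNormalizer, normalize_val_test: bool):
        # If normalize_val_test is True, nothing is undone here and downstream
        # losses/metrics are reported in normalized units; otherwise map back to
        # the original label scale before computing them.
        if not normalize_val_test:
            preds = norm.denormalize(preds)
            targets = norm.denormalize(targets)
        return preds.detach(), targets.detach()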

@@ -447,17 +447,17 @@ def on_train_batch_end(self, outputs, batch: Any, batch_idx: int) -> None:
tput = num_graphs / train_batch_time

# this code is likely repeated for validation and testing, this should be moved to a function
- # self.task_epoch_summary.update_predictor_state(
- # step_name="train",
- # targets=outputs["targets"],
- # predictions=outputs["preds"],
- # loss=outputs["loss"], # This is the weighted loss for now, but change to task-specific loss
- # task_losses=outputs["task_losses"],
- # n_epochs=self.current_epoch,
- # )
- # metrics_logs = self.task_epoch_summary.get_metrics_logs() # Dict[task, metric_logs]
- # metrics_logs["_global"]["grad_norm"] = self.get_gradient_norm()
- # outputs.update(metrics_logs) # Dict[task, metric_logs]. Concatenate them?
+ self.task_epoch_summary.update_predictor_state(
+ step_name="train",
+ targets=outputs["targets"],
+ predictions=outputs["preds"],
+ loss=outputs["loss"], # This is the weighted loss for now, but change to task-specific loss
+ task_losses=outputs["task_losses"],
+ n_epochs=self.current_epoch,
+ )
+ metrics_logs = self.task_epoch_summary.get_metrics_logs() # Dict[task, metric_logs]
+ metrics_logs["_global"]["grad_norm"] = self.get_gradient_norm()
+ outputs.update(metrics_logs) # Dict[task, metric_logs]. Concatenate them?

concatenated_metrics_logs = {} # self.task_epoch_summary.concatenate_metrics_logs(metrics_logs)
concatenated_metrics_logs["loss"] = outputs["loss"]
@@ -487,8 +487,8 @@ def training_step(self, batch: Dict[str, Tensor], to_cpu: bool = True) -> Dict[s
# step_dict = self._general_step(batch=batch, step_name="train", to_cpu=True)
step_dict = self._general_step(batch=batch, step_name="train", to_cpu=to_cpu)

step_dict.pop("preds")
step_dict.pop("targets")
# step_dict.pop("preds")
# step_dict.pop("targets")

return step_dict # Returning the metrics_logs with the loss

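Together with the earlier switch from `clone().detach()` to `detach()`, keeping `preds` and `targets` in the step dictionary is what allows `on_train_batch_end` to compute training metrics; the cost is holding those tensors until the hook runs. As a reminder of the difference between the two detaching idioms touched here (plain PyTorch, independent of the IPU specifics):

    import torch

    x = torch.randn(3, requires_grad=True)

    d = x.detach()          # shares storage with x; excluded from autograd
    c = x.clone().detach()  # independent copy; also excluded from autograd

    print(d.data_ptr() == x.data_ptr())      # True: no extra memory allocated
    print(c.data_ptr() == x.data_ptr())      # False: a fresh buffer
    print(d.requires_grad, c.requires_grad)  # False False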