Finetuning pipeline #414

WenkelF · 2023-07-21T20:48:07Z

@DomInvivo as discussed a first draft for the Finetuning pipeline.

Two possible pipelines:
expts/main_run_finetuning_v1.py (probably to be removed):

Start with pretrained model
Remove redundant task heads
Drop/add layers of model part to be finetuned
Drawback: Difficult updating of hyperparameters

expts/main_run_finetuning_v2.py:

Initialize new model with final finetuning architecture
Overwrite parameters that are shared with pretrained model

Main TODOs:

Use hydra to easy finetuning config
Create unit test with dummy pretrained model

codecov · 2023-07-23T12:50:12Z

Codecov Report

Merging #414 (6625e0c) into main (9515ad6) will decrease coverage by 2.14%.
The diff coverage is 20.51%.

@@            Coverage Diff             @@
##             main     #414      +/-   ##
==========================================
- Coverage   66.87%   64.74%   -2.14%     
==========================================
  Files          82       89       +7     
  Lines        7838     8211     +373     
==========================================
+ Hits         5242     5316      +74     
- Misses       2596     2895     +299

Flag	Coverage Δ
unittests	`64.74% <20.51%> (-2.14%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
ipu	`49.14% <ø> (ø)`

DomInvivo

Here's my review. It mostly aligns with what we discussed last time

expts/finetune_configs/config_toy_finetuning_v1.yaml

expts/finetune_configs/config_toy_finetuning_v2.yaml

graphium/finetuning/finetuning.py

DomInvivo · 2023-07-23T13:30:41Z

graphium/finetuning/utils.py

+from copy import deepcopy
+
+
+def modify_cfg_for_finetuning(cfg):


This could be a function within FeedForwardNN and FullGraphNetwork

I am not sure, we can remove the function and replace inside the networks. As discussed in that #411, it might be possible to have a similar function within load_architecture in graphium/data/_loader.py but depends on the PR.

graphium/nn/architectures/global_architectures.py

Explore finetuning

cwognum

Hi @WenkelF, sorry for the delay here!

Thank you for this first implementation!

I left comments whenever something stood out to me, but I'm aware that this PR is still WIP. Sorry if I pointed out some things that you were already planning to change.

It would be super helpful if you could document the main fine-tuning "flow". Furthermore, I would suggest to simplify the process by adding support for one feature at a time, instead of having half-implemented features. This will make understanding, debugging, testing and maintaining the code base a lot easier.

Happy to help next week!

expts/main_run_finetuning.py

graphium/nn/architectures/global_architectures.py

graphium/finetuning/utils.py

graphium/nn/architectures/global_architectures.py

…FullGraphFinetuningNetwork

…tput_nn

…ut_nn and gnn; addressing comments

DomInvivo

Don't forget that the objective is to release a working, incomplete version first. Then refining it with more complex fine-tuning possibilities.

DomInvivo · 2023-07-29T19:05:21Z

expts/hydra-configs/dataset/lipophilicity.yaml

+    qm9:
+      task_level: graph
+      out_dim: 19
+      hidden_dims: 128
+      depth: 2
+      activation: relu
+      last_activation: none
+      dropout: *dropout
+      normalization: *normalization
+      last_normalization: "none"
+      residual_type: none
+    tox21:
+      task_level: graph
+      out_dim: 12
+      hidden_dims: 64
+      depth: 2
+      activation: relu
+      last_activation: sigmoid
+      dropout: *dropout
+      normalization: *normalization
+      last_normalization: "none"
+      residual_type: none
+    zinc:
+      task_level: graph
+      out_dim: 3
+      hidden_dims: 32
+      depth: 2
+      activation: relu
+      last_activation: none


This doesn't make sense. We should not have any architectural choice from the original pre-trained model in here. Only things that would change.

That way, we can take different pre-trained models that have different hparams/seed and fine-tune them all with the same file.

I agree. The configurations are still structured in a way where we have access to both the full config of the pretrained model and the pretraining-related config. And the modify_cfg_for_finetuning function consolidates information to one config.

This will be fixed once we incorporate the new hydra config from #421. We will still need modify_cfg_for_finetuning as of now. Therefore, it could be good waiting for the final version.

expts/main_run_finetuning.py

DomInvivo · 2023-07-29T19:14:02Z

graphium/data/datamodule.py

+            try:
+                if "epoch_sampling_fraction" in args[task].keys():
+                    args[task].pop("epoch_sampling_fraction")
+            except:
+                pass


I don't even understand the point of having a try and an if there. Dict.pop works even if the key is not available.

But we need to make sure that args[task] is not used elsewhere, even outside the current function since dict are passed as pointers. We only want to remove epoch_sampling for the hash key. So I would suggest the following.

Suggested change

try:

if "epoch_sampling_fraction" in args[task].keys():

args[task].pop("epoch_sampling_fraction")

except:

pass

args[task] = deepcopy(args[task])

args[task].pop("epoch_sampling_fraction")

Indeed, this is not an ideal fix. I wanted to investigate the issue a bit more.

We cannot use args[task].pop("epoch_sampling_fraction") because args[task] may be of class DatasetProcessingParams instead of Dict. In particular, ADMETBenchmarkDataModule makes use of DatasetProcessingParams.

The error originates from changes here 4b82ba3, where the line args[task].pop("epoch_sampling_fraction") was added. It did not cause errors back then because we were using a Dict in all configs (although I see a comment # To be replaced by a new class "DatasetParams" everywhere it appears).

Will create issue and think about a fix.

graphium/data/datamodule.py

DomInvivo · 2023-07-29T19:16:12Z

graphium/finetuning/finetuning.py

+
+
+class GraphFinetuning(BaseFinetuning):
+    def __init__(self, cfg, train_bn: bool = False):


Change to explicitly pass parameters.

Suggested change

def __init__(self, cfg, train_bn: bool = False):

def __init__(self, fine-tuning, architecture, module_from_pretrained, ....................., train_bn: bool = False):

Addressed in 76e2ba6

DomInvivo · 2023-07-29T19:20:29Z

graphium/finetuning/finetuning.py

+            modules = pl_module.model.task_heads.graph_output_nn
+        elif module == "task_heads":
+            modules = pl_module.model.task_heads.task_heads
+


else: raise "Wrong module"

Addressed in 76e2ba6

DomInvivo · 2023-07-29T19:22:47Z

graphium/finetuning/finetuning.py

+        if module == "pe_encoders":
+            modules = pl_module.model.encoder_manager
+        elif module == "pre_nn":
+            modules = pl_module.model.pre_nn
+        elif module == "pre_nn_edges":
+            modules = pl_module.model.pre_nn_edges
+        elif module == "gnn":
+            modules = pl_module.model.gnn
+        elif module == "graph_output_nn":
+            modules = pl_module.model.task_heads.graph_output_nn
+        elif module == "task_heads":
+            modules = pl_module.model.task_heads.task_heads


I would define all these in a dictionary _module_map = {pe_encoders: pl_module.model.encoder_manager, ...} directly in the __init__. That way, with inheritance, someone could modified the entries without copy-pasting all the logic.

_module_map can replace the module_list you already have.

But instead of a regular dict, using an OrderedDict would also allow you to say something like: "freeze everything before task_heads" in a very simple way.

DomInvivo · 2023-07-29T19:27:07Z

graphium/finetuning/finetuning_architecture.py

It's a bad idea to have a FullGraphFinetuningNetwork that basically copy-pastes most of the functionality of FullGraphNetwork.

Either use inheritance, or implement the fine-tuning logic directly within FullGraphNetwork

graphium/nn/architectures/global_architectures.py

WenkelF · 2023-07-29T20:12:59Z

@DomInvivo thanks for your comments. Here is also a quick overview of the updates

Updates:

Moved finetuning-related architecture out of FullGraphMultitaskNetwork to FullGraphFinetuningNetwork under graphium/finetuning/finetuning_architecture
- More flexibility to build finetuning head on top of pretrained model
- For now, finetuning is specific to FullGraphMultitaskNetwork (as pretrained model), but we can finetune from gnn, graph_output_nn and task_heads now
- Additionally, we can add a flexible FinetuningHead, inheriting from nn.Module and MupMixin. For example, it can be a FeedForwardNN, FeedForwardPyg, TaskHeads, or a custom network that can be added to FINETUNING_HEAD_DICT
- FinetuningHead needs a bit more work to correctly support all possible scenarios (modules/levels to finetune from); if not used, we automatically fall back to the finetuning logic, where we can only drop & add depth to the module we finetune from
Finished functions for overwriting shared weights and partially freezing modules while training and extended them from only task_heads to also gnn and graph_output_nn
Added preliminary unit test the discussed “minimal” finetuning pipeline (finetune from one of the task heads)
- We currently test depths, in_dim, ou_dim of changed modules are correct and overwritten correctly
- Test does not cover training/freezing yet
- Planning to add tests for finetuning from other modules as well
Configuration has been integrated with hydra but needs more work to be easier to use (@luis-mueller is helping with that)

Remarks:

Hidden_dims can only be used as int together with depth at the moment
Currently, everything has only been tested on cpu/gpu

WenkelF · 2023-08-04T02:27:07Z

@DomInvivo this pull introduces the following:

Finetuning pipeline
Updates to hydra configuration (by @cwognum )

The finetuning pipeline is maintained separately from existing architectures under graphium/finetuning/finetuning_architecture. The FullGraphFinetuningNetwork comes with two submodules:

PretrainedModel
FinetuningHead (optional)

PretrainedModel can load pretrained models (e.g., FullGraphMultitaskNetwork) and allows basic finetuning operations (in particular from the task heads), including dropping or extending layers of the module to finetune from.

FinetuningHead is optional and allows to define fully customizable networks that are applied on top of PretrainedModel. If not specified, we fall back to basic finetuning explained above.

Training is handled by the GraphFinetuning callback in graphium/finetuning/finetuning.

All methods in graphium/finetuning are implemented such that they are not specific to a pretrained model or finetuning head. This is achieved by requiring the pretrained model to come with a module_map (see, e.g., FullGraphMultitaskNetwork) that facilitates setting pretrained weights and freezing the correct layers.

The new unit test tests/test_finetuning tests the pipeline end-to-end for a specific case. Together with the corresponding config, it might be a good starting point to understand the pipeling and what information is needed in the config.

The updates to hydra allow to easily switch between benchmarking and finetuning. Major changes are documented here WenkelF#10

DomInvivo

Mostly looks good, great work on this major PR!
A few changes to make, and some comments.

expts/hydra-configs/finetuning/finetuning.yaml

env.yml

expts/hydra-configs/finetuning/admet.yaml

expts/hydra-configs/tasks/admet.yaml

graphium/cli/finetune.py

DomInvivo · 2023-08-05T03:00:25Z

graphium/trainer/predictor.py

+        if "task_heads_kwargs" in model_kwargs.keys():
+            task_heads_kwargs = model_kwargs["task_heads_kwargs"]
+        elif "pretrained_model_kwargs" in model_kwargs.keys():
+            # This covers finetuning cases where we finetune from the task_heads
+            task_heads_kwargs = model_kwargs["pretrained_model_kwargs"]["task_heads_kwargs"]
+        else:
+            raise ValueError("incorrect model_kwargs")


I don't think this should be here. I think that, if you are using a pre-trained model, you should pass directly model_kwargs["pretrained_model_kwargs"] into model_kwargs.

Removed in a3d4715 as explained below

DomInvivo · 2023-08-05T03:01:56Z

graphium/trainer/predictor.py

+                task_level=task_heads_kwargs[key]["task_level"],
+                task=key
+                # task_level=model_kwargs["task_heads_kwargs"][key]["task_level"], task=key


In general, the PredictorModule should be agnostic to the model passed. By having the self._get_task_key here, it forces a certain architecture in the config which is not very flexible.

I see that this logic was introduced already in the code prior to this PR. If it requires too many changes, let's open a new issue.

You are right, thanks for pointing out.

We could achieve this by getting the task-specific information (which is only the task level as far as I know) from the datamodule.

Here is how this could be done:
a3d4715

What do you think?

graphium/trainer/predictor.py

DomInvivo · 2023-08-05T03:04:43Z

graphium/trainer/predictor.py

+                task_level=task_heads_kwargs[key]["task_level"],
+                task=key


Again, don't like how the config structure is imposed. Perhaps task_level should simply be passed to the PredictorModule to keep flexibility

Good idea, this is implemented in a3d4715

DomInvivo · 2023-08-05T03:09:37Z

tests/dummy-pretrained-model-gpu.ckpt

Why duplicating the model for CPU and GPU? Models should be agnostic to the training hardware, and to the fine-tuning hardware. You can train on CPU and fine-tune on GPU or IPU

Yes, I agree. I only changed from gpu to cpu because github cannot do unit tests on gpu. Should I remove the gpu model?

Yes, you can remove the gpu model.

DomInvivo · 2023-08-05T03:34:01Z

@zhiyil-graphcore @s-maddrellmander We'll need your help here to fix the tests for IPU. And ideally, have a test that loads a CPU-trained model onto IPU for finetuning.

… when finetuning

Additional documentation pass and addressing comments from the PR review

WenkelF · 2023-08-08T16:13:47Z

@cwognum the bug in the finetuning training is fixed here febdf2d

I missed a deepcopy operation when defining the datahashes for the TDC datasets. We include the first 5 rows of the df when generating the hash and the bug reduced the datasets to those 5 rows (molecules) as well.

Make sure to remove the TDC datasets from datacache. You can use datamodule._path_to_load_from_file("train) to get "train_[data_hash]" and then remove all data hashes (e.g., "train_[data_hash]", "test_[data_hash]", etc. that end with the same sequence.

WenkelF · 2023-08-09T00:34:06Z

@DomInvivo I added some final improvements 24354ee

Made modify_cfg_for_finetuning function fully agnostic to pretrained model
Dropped dummy-pretrained-model-gpu and added map_location as an argument to load_pretrained_models function
Added the option to keep modules after finetuning module

(3.) makes it much easier to finetune from modules other than the task_heads without manually re-defining the downstream network. When finetuning from task_heads, it is not needed.

Merging the IPU tests so the ipu CLI test is in the correct environment for major updates to workflow in #414

s-maddrellmander · 2023-08-09T11:10:00Z

@WenkelF - try merging from the main branch, I've made a change to the IPU test CLI that should take into account the changes made in this PR. If that doesn't work let me know.

WenkelF and others added 3 commits July 21, 2023 15:30

Adding finetuning pipeline (v1)

de1c1d7

Adding finetuning pipeline (v2)

47d2936

Merge branch 'main' into explore_finetuning

4fe9e63

DomInvivo reviewed Jul 23, 2023

View reviewed changes

DomInvivo requested a review from michalkoziarski July 23, 2023 18:56

DomInvivo assigned engmubarak48 Jul 23, 2023

DomInvivo requested a review from engmubarak48 July 23, 2023 18:56

DomInvivo assigned michalkoziarski, engmubarak48 and WenkelF and unassigned engmubarak48 Jul 23, 2023

DomInvivo removed request for michalkoziarski and engmubarak48 July 23, 2023 18:56

WenkelF and others added 5 commits July 26, 2023 05:03

Merge branch 'datamol-io:main' into explore_finetuning

6fbafad

Removing redundant files

5dbcf2a

Removing redundant files

89f765b

Integrating finetuning with hydra

815ae34

Merge pull request #1 from WenkelF/explore_finetuning

fa53287

Explore finetuning

cwognum reviewed Jul 28, 2023

View reviewed changes

WenkelF and others added 7 commits July 28, 2023 12:22

Implementing separate class FullFinetuningNetwork

bc8b8c8

Moving parameter overwriting logic from FullGraphMultitaskNetwork to …

4b9e6ec

…FullGraphFinetuningNetwork

Extending modify_cfg_for_finetuning function also to gnn and graph_ou…

243d9c8

…tput_nn

Fixing (un)freezing logic and extending to finetuning from graph_outp…

d9f9d47

…ut_nn and gnn; addressing comments

Adding preliminary unit test for finetuning

65811b8

Adding preliminary unit test for finetuning

064c610

Reformatting with black

c6836c0

DomInvivo reviewed Jul 29, 2023

View reviewed changes

luis-mueller mentioned this pull request Jul 30, 2023

Configs v2 #421

Merged

5 tasks

WenkelF linked an issue Aug 4, 2023 that may be closed by this pull request

Removing epoch_sampling_fraction does not support DatasetProcessingParams #422

Closed

cwognum removed a link to an issue Aug 4, 2023

Removing epoch_sampling_fraction does not support DatasetProcessingParams #422

Closed

This was linked to issues Aug 4, 2023

Optimizing main_run_multitask #425

Closed

Removing epoch_sampling_fraction does not support DatasetProcessingParams #422

Closed

WenkelF added 2 commits August 4, 2023 13:15

Minor changes and updating doc strings

6463e48

Fixing doc string

da0d058

DomInvivo mentioned this pull request Aug 5, 2023

Replace assert statements by if / raise #428

Open

DomInvivo requested changes Aug 5, 2023

View reviewed changes

WenkelF and others added 8 commits August 7, 2023 11:18

Making predictor model-unspecific

a3d4715

Reformatting with black

529362a

Saving featurization to predictor checkpoint and load from pretrained…

98db36c

… when finetuning

Documentation pass and address PR review

cbf2ad7

Merge pull request #12 from WenkelF/finetuning_docs

38748f9

Additional documentation pass and addressing comments from the PR review

Updated deprecation warning

3e393d8

Minor change to __main__.py

119a862

Fixing bug in finetuing training

febdf2d

WenkelF added 2 commits August 8, 2023 20:23

Finishing touches

24354ee

Reformatting

b97289a

s-maddrellmander added 2 commits August 9, 2023 10:45

samuelm - Testing change RE CLI for IPU

014b6a3

Revert

7d2eb5c

s-maddrellmander added a commit that referenced this pull request Aug 9, 2023

Update test_ipu.yml

9515ad6

Merging the IPU tests so the ipu CLI test is in the correct environment for major updates to workflow in #414

Merge remote-tracking branch 'origin/main' into explore_finetuning

6625e0c

DomInvivo approved these changes Aug 9, 2023

View reviewed changes

DomInvivo merged commit 617356d into datamol-io:main Aug 9, 2023

DomInvivo mentioned this pull request Aug 23, 2023

Multi-level fine-tuning and "fingerprinting" #347

Closed

		from copy import deepcopy


		def modify_cfg_for_finetuning(cfg):



		class GraphFinetuning(BaseFinetuning):
		def __init__(self, cfg, train_bn: bool = False):

	def __init__(self, cfg, train_bn: bool = False):
	def __init__(self, fine-tuning, architecture, module_from_pretrained, ....................., train_bn: bool = False):

Finetuning pipeline #414

Finetuning pipeline #414

Uh oh!

Conversation

WenkelF commented Jul 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Jul 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

DomInvivo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WenkelF Aug 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cwognum left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DomInvivo left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WenkelF Jul 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

WenkelF commented Jul 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WenkelF commented Aug 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DomInvivo left a comment

Choose a reason for hiding this comment

Uh oh!

WenkelF commented Jul 21, 2023 •

edited

Loading

codecov bot commented Jul 23, 2023 •

edited

Loading

WenkelF Aug 1, 2023 •

edited

Loading

WenkelF Jul 31, 2023 •

edited

Loading

WenkelF commented Jul 29, 2023 •

edited

Loading

WenkelF commented Aug 4, 2023 •

edited

Loading

WenkelF commented Aug 9, 2023 •

edited

Loading