
Graphium 3.0 #519

Draft · wants to merge 189 commits into main

Conversation

DomInvivo (Collaborator) commented Jul 15, 2024

Changelogs

Moving to Graphium 3.0! This is a large PR that regroups many other PRs; for specific changes, please consult the original PRs.


See the discussion related to this PR.

To-dos before merging

Before merging Graphium 3.0, we need to do the following tests.

Validating the C++

Assigned to @WenkelF and @AnujaSomthankar, with @ndickson-nvidia to help fix issues if any

The C++ changes were brought in by PR #510, which included extensive unit tests and validation on the ToyMix dataset. What's left to be done is validating that we can reproduce the experimental results of pre-training a MolGPS model.

  • Train a 10M model on LargeMix and validate that the pre-training and finetuning performance match Graphium 2.x
  • Validate that training is faster (training is not faster on our cluster, but it is expected to be faster where disk reading limits speed)
  • Validate that we can run inference on a new dataset without caching (since caching is only for labels)
  • Train a 1B model and make sure that we match the metrics
  • Validate that the finetuning performance is consistent
  • Make sure the documentation in the README.md contains all the info for installing the C++ libraries
  • Clearly document the inputs / outputs of every C++ function @ndickson-nvidia (see PR Reorder atoms in label data (Fixes 502) + documenting C++ #521)

Validating the torchmetrics

Assigned to @WenkelF and @AnujaSomthankar, with @DomInvivo to help fix issues if any

  • First test the torchmetrics PR Torchmetrics usage improvements with classes instead of functionals #517 independently
  • Train a 10M model on LargeMix and validate that the metrics are the same as Graphium 2.0, that the speed is the same, and that RAM usage is lower (during validation and testing)
  • Validate that the mean_pred, mean_target, grad_norm, train_loss, train_loss_{task}, and other metrics all get logged properly to Wandb
  • Then merge with the C++ changes on the graphium 3.0 branch, and test again
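The RAM argument for class-based torchmetrics is that a `Metric`-style object accumulates running state per batch, instead of holding every prediction in memory until epoch end as the functional pattern does. A minimal pure-Python sketch of that accumulation pattern (illustrative only, not Graphium's actual metric code):

```python
# Hypothetical sketch of why class-based metrics lower RAM during
# validation/testing: only running sums are kept, not all predictions.

class RunningMSE:
    """Accumulates squared error incrementally, using O(1) memory."""

    def __init__(self):
        self.sum_sq_err = 0.0
        self.count = 0

    def update(self, preds, targets):
        # Called once per batch; nothing is retained except two scalars.
        for p, t in zip(preds, targets):
            self.sum_sq_err += (p - t) ** 2
            self.count += 1

    def compute(self):
        # Called once at epoch end.
        return self.sum_sq_err / self.count


metric = RunningMSE()
for batch in ([(1.0, 0.0), (2.0, 2.0)], [(3.0, 5.0)]):
    preds, targets = zip(*batch)
    metric.update(preds, targets)

print(metric.compute())  # (1 + 0 + 4) / 3 ≈ 1.667
```

`torchmetrics.Metric` subclasses follow the same `update()`/`compute()` contract, with the added benefit of handling synchronization across distributed workers.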

Improving the CUDA support

Assigned to @WenkelF and @AnujaSomthankar

Fixing the node ordering issue

Assigned to @ndickson-nvidia , see PR #521

Support for Mixed precision

Supporting mixed precision should be easy with Lightning. However, some of the tasks are very sparse and require float32. What we suggest is a custom mixed-precision mode that doesn't affect the task heads, only the body of the GNN network.

  • [ ] Implement custom mixed precision.
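One possible shape for this custom mixed precision, sketched with plain PyTorch autocast: run the body under autocast, then explicitly disable autocast and cast back to float32 before the task heads. The module names and sizes below are illustrative assumptions, not Graphium's actual classes:

```python
import torch
import torch.nn as nn

class MixedPrecisionModel(nn.Module):
    """Sketch: reduced precision for the body, float32 for task heads."""

    def __init__(self, hidden=16, n_tasks=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(8, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

    def forward(self, x):
        # Body runs under autocast (reduced precision) on CUDA devices.
        with torch.autocast(device_type=x.device.type, enabled=x.is_cuda):
            h = self.body(x)
        # Task heads always run in float32, regardless of autocast,
        # so sparse tasks keep full precision.
        with torch.autocast(device_type=x.device.type, enabled=False):
            h = h.float()
            return [head(h) for head in self.heads]
```

With Lightning, the same effect could likely be achieved via a custom precision plugin, but the per-module autocast scoping above is the core idea.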

Removing the IPU support #525

Assigned to @DomInvivo

Since Graphcore is no longer maintaining IPU support in Lightning, it is best to remove it from Graphium 3.0. It will stay compatible with 2.0 and can be brought back later if necessary. (We got approval from Graphcore for this.)

  • Remove custom IPU functions
  • Remove Lightning wrappers for IPUs
  • Remove actions and unit-tests for IPUs

Command line

Assigned to @WenkelF

  • Some command line improvements for training and finetuning
  • Improving documentation

Packaging

Assigned to @Andrewq11

  • Make sure the documentation, both the README and the docs, is aligned with the latest changes
  • Make sure that the package can be installed via conda, and that C++ dependencies resolve automatically
  • [ ] Make sure that the package can be installed via pip (blocked: the pip package doesn't include the headers, so we can't release there)
  • Make sure that we install the minimal set of GCC compilers needed for the code to work
  • Make sure that we don't need to install graphium and graphium_cpp as 2 different packages
  • Build the documentation for the C++ part of the code so that it appears in the docs (ChatGPT says we can with the doxygen package)
  • [ ] Support numpy >= 2.0 (blocked: need to wait for PyG to support numpy >= 2.0)

For now, we are constrained to rdkit == 2024.03.4 due to missing headers in more recent releases, and we can only support conda because the pip package is missing headers.
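For the Doxygen route mentioned in the packaging list, a minimal config could look like the fragment below. The paths and project name are illustrative assumptions about the repository layout, not the actual setup:

```
# Hypothetical minimal Doxyfile for the C++ sources; paths are illustrative.
PROJECT_NAME     = graphium_cpp
INPUT            = graphium/graphium_cpp
RECURSIVE        = YES
GENERATE_HTML    = YES
GENERATE_XML     = YES   # XML output can be consumed by other doc tooling
GENERATE_LATEX   = NO
```

Running `doxygen Doxyfile` would then produce HTML/XML that can be linked into or converted for the existing docs site.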

Polaris

  • [ ] Add data download from Polaris (not for the initial release)

Linting

  • Run black linting on the code. Wait until the last minute to avoid cluttering the PR.

2024-11-14 update of what's missing

  • Header comments for the C++ files @ndickson-nvidia
  • Looking at the rdkit pinning issue to see if we can support more versions, and open an issue -> pinning removed!

ndickson-nvidia and others added 30 commits April 12, 2024 20:36
…ented in C++ for featurization and preprocessing optimizations, along with a few other optimizations, significantly reducing memory usage, disk usage, and processing time for large datasets.
…kDataset is still used in test_dataset.py, but won't after it's changed in a later commit.
…ng from the Python featurization yet), removed option to featurize to GraphData class instead of PyG Data class, added deprecation warnings to datamodule.py for parameters that are now unused, some cleanup in MultitaskFromSmilesDataModule::__init__, changed tensor index variables to properties, added preprocessing_n_jobs (not yet used), etc.
… symmetric diagonalization, avoiding the need to handle complex eigenvectors and eigenvalues
… match behaviour from get_simple_mol_conformer Python code, but adding Hs, as recommended for conformer generation.
…catenate_strings function, though it's not used yet.
…changed create_all_features to create all tensors even if there are nans, so that the number of atoms can still be determined from the shape of the atom features tensor. Changed parse_mol to default to not reordering atoms, to match test order.
…e.py, keeping the notes in the function comments
@Andrewq11 mentioned this pull request on Nov 6, 2024
Andrewq11 (Collaborator)

PR for packaging tasks here: #531
