
Graphium 3.0 #519

Draft · wants to merge 189 commits into main

Conversation

DomInvivo (Collaborator) commented Jul 15, 2024

Changelogs

Moving to Graphium 3.0! This is a large PR that regroups many other PRs; for specific changes, please consult the original PRs.


See the discussion related to this PR.

To-dos before merging

Before merging Graphium 3.0, we need to do the following tests.

Validating the C++

Assigned to @WenkelF and @AnujaSomthankar, with @ndickson-nvidia to help fix issues if any

The C++ changes were brought in by PR #510, which included extensive unit tests and validation on the ToyMix dataset. What's left to be done is validating that we can reproduce the experimental results of pre-training a MolGPS model.

  • Train a 10M model on LargeMix and validate that the pre-training and finetuning performance match Graphium 2.x
  • Validate that training is faster (training is not faster on our cluster, but it is expected to be faster where disk reading limits speed)
  • Validate that we can run inference on a new dataset without caching (since caching is only for labels)
  • Train a 1B model and make sure that we match the metrics
  • Validate that the finetuning performance is consistent
  • Make sure the documentation in the README.md contains all the info for installing the C++ libraries
  • Clearly document the inputs / outputs of every C++ function @ndickson-nvidia (see PR Reorder atoms in label data (Fixes 502) + documenting C++ #521)

Validating the torchmetrics

Assigned to @WenkelF and @AnujaSomthankar, with @DomInvivo to help fix issues if any

  • First test the torchmetrics PR Torchmetrics usage improvements with classes instead of functionals #517 independently
  • Train a 10M model on LargeMix and validate that the metrics are the same as Graphium 2.0, that the speed is the same, and that RAM usage is lower (during validation and testing)
  • Validate that the mean_pred, mean_target, grad_norm, train_loss, train_loss_{task}, and other metrics all get logged properly to Wandb
  • Then merge with the C++ changes on the graphium 3.0 branch, and test again
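The RAM argument for class-based torchmetrics is that a `Metric`-style object accumulates running state per batch, instead of holding every prediction in memory until epoch end as the functional pattern does. A minimal pure-Python sketch of that accumulation pattern (illustrative only, not Graphium's actual metric code):

```python
# Hypothetical sketch of why class-based metrics lower RAM during
# validation/testing: only running sums are kept, not all predictions.

class RunningMSE:
    """Accumulates squared error incrementally, using O(1) memory."""

    def __init__(self):
        self.sum_sq_err = 0.0
        self.count = 0

    def update(self, preds, targets):
        # Called once per batch; nothing is retained except two scalars.
        for p, t in zip(preds, targets):
            self.sum_sq_err += (p - t) ** 2
            self.count += 1

    def compute(self):
        # Called once at epoch end.
        return self.sum_sq_err / self.count


metric = RunningMSE()
for batch in ([(1.0, 0.0), (2.0, 2.0)], [(3.0, 5.0)]):
    preds, targets = zip(*batch)
    metric.update(preds, targets)

print(metric.compute())  # (1 + 0 + 4) / 3 ≈ 1.667
```

`torchmetrics.Metric` subclasses follow the same `update()`/`compute()` contract, with the added benefit of handling synchronization across distributed workers.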

Improving the CUDA support

Assigned to @WenkelF and @AnujaSomthankar

Fixing the node ordering issue

Assigned to @ndickson-nvidia , see PR #521

Support for Mixed precision

Supporting mixed precision should be easy with Lightning. However, some of the tasks are very sparse and require float32. What we suggest is a custom mixed-precision mode that doesn't affect the task heads, only the body of the GNN network.

  • [ ] Implement custom mixed precision.
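One possible shape for this custom mixed precision, sketched with plain PyTorch autocast: run the body under autocast, then explicitly disable autocast and cast back to float32 before the task heads. The module names and sizes below are illustrative assumptions, not Graphium's actual classes:

```python
import torch
import torch.nn as nn

class MixedPrecisionModel(nn.Module):
    """Sketch: reduced precision for the body, float32 for task heads."""

    def __init__(self, hidden=16, n_tasks=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(8, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

    def forward(self, x):
        # Body runs under autocast (reduced precision) on CUDA devices.
        with torch.autocast(device_type=x.device.type, enabled=x.is_cuda):
            h = self.body(x)
        # Task heads always run in float32, regardless of autocast,
        # so sparse tasks keep full precision.
        with torch.autocast(device_type=x.device.type, enabled=False):
            h = h.float()
            return [head(h) for head in self.heads]
```

With Lightning, the same effect could likely be achieved via a custom precision plugin, but the per-module autocast scoping above is the core idea.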

Removing the IPU support #525

Assigned to @DomInvivo

Since Graphcore is no longer maintaining IPU support in Lightning, it is best to remove it from Graphium 3.0. It will stay compatible with 2.0 and can be brought back later if necessary. (We got approval from Graphcore for this.)

  • Remove custom IPU functions
  • Remove Lightning wrappers for IPUs
  • Remove actions and unit-tests for IPUs

Command line

Assigned to @WenkelF

  • Some command line improvements for training and finetuning
  • Improving documentation

Packaging

Assigned to @Andrewq11

  • Make sure the documentation, both the README and the docs, is aligned with the latest changes
  • Make sure that the package can be installed via conda, and that C++ dependencies resolve automatically
  • [ ] Make sure that the package can be installed via pip (blocked: the pip package doesn't include the headers, so we can't release there)
  • Make sure that we install the minimal set of GCC compilers needed for the code to work
  • Make sure that we don't need to install graphium and graphium_cpp as 2 different packages
  • Build the documentation for the C++ part of the code so that it appears in the docs (ChatGPT says we can with the doxygen package)
  • [ ] Support numpy >= 2.0 (blocked: need to wait for PyG to support numpy >= 2.0)

For now, we are constrained to rdkit == 2024.03.4 due to missing headers in more recent releases, and we can only support conda because the pip package is missing headers.
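For the Doxygen route mentioned in the packaging list, a minimal config could look like the fragment below. The paths and project name are illustrative assumptions about the repository layout, not the actual setup:

```
# Hypothetical minimal Doxyfile for the C++ sources; paths are illustrative.
PROJECT_NAME     = graphium_cpp
INPUT            = graphium/graphium_cpp
RECURSIVE        = YES
GENERATE_HTML    = YES
GENERATE_XML     = YES   # XML output can be consumed by other doc tooling
GENERATE_LATEX   = NO
```

Running `doxygen Doxyfile` would then produce HTML/XML that can be linked into or converted for the existing docs site.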

Polaris

  • [ ] Add data download from Polaris (not for the initial release)

Linting

  • Run black linting on the code. Wait until the last minute to avoid cluttering the PR.

2024-11-14 update of what's missing

  • Header comments for the C++ files @ndickson-nvidia
  • Looking at the rdkit pinning issue to see if we can support more versions, and open an issue -> pinning removed!

ndickson-nvidia and others added 30 commits April 12, 2024 20:36
…ented in C++ for featurization and preprocessing optimizations, along with a few other optimizations, significantly reducing memory usage, disk usage, and processing time for large datasets.
…kDataset is still used in test_dataset.py, but won't after it's changed in a later commit.
…ng from the Python featurization yet), removed option to featurize to GraphData class instead of PyG Data class, added deprecation warnings to datamodule.py for parameters that are now unused, some cleanup in MultitaskFromSmilesDataModule::__init__, changed tensor index variables to properties, added preprocessing_n_jobs (not yet used), etc.
… symmetric diagonalization, avoiding the need to handle complex eigenvectors and eigenvalues
… match behaviour from get_simple_mol_conformer Python code, but adding Hs, as recommended for conformer generation.
…catenate_strings function, though it's not used yet.
…changed create_all_features to create all tensors even if there are nans, so that the number of atoms can still be determined from the shape of the atom features tensor. Changed parse_mol to default to not reordering atoms, to match test order.
…e.py, keeping the notes in the function comments
@Andrewq11 mentioned this pull request on Nov 6, 2024
Andrewq11 (Collaborator)

PR for packaging tasks here: #531
