Added and integrated C++ graphium_cpp library, a Python module implem… #510

Merged
merged 54 commits on Jul 9, 2024
Changes from all commits
Commits (54)
5ffe261
Added and integrated C++ graphium_cpp library, a Python module implem…
ndickson-nvidia Apr 13, 2024
8286383
Small changes to support not needing label data during data loading
ndickson-nvidia Apr 17, 2024
dca9b2b
Removed FakeDataset, FakeDataModule, and SingleTaskDataset. SingleTa…
ndickson-nvidia Apr 17, 2024
8304210
Removed option to featurize using Python, (but didn't delete everythi…
ndickson-nvidia Apr 17, 2024
4ee35d4
Removed newly deprecated options from yaml files
ndickson-nvidia Apr 18, 2024
cf23e37
Added support for limiting the number of threads used by prepare_and_…
ndickson-nvidia Apr 18, 2024
5db0e2a
Fixed compiler warning about signed vs. unsigned comparison
ndickson-nvidia Apr 18, 2024
c75a452
Fixed Python syntax issues
ndickson-nvidia Apr 18, 2024
4aa1f85
Changed asymmetric inverse normalization type to be implemented using…
ndickson-nvidia Apr 18, 2024
c53451a
Fixed compile errors
ndickson-nvidia Apr 18, 2024
268e245
Some simplification in collate.py
ndickson-nvidia Apr 19, 2024
e032e8e
Deleting most of the Python featurization code
ndickson-nvidia Apr 19, 2024
bdefe89
Implemented conformer generation in get_conformer_features, trying to…
ndickson-nvidia Apr 23, 2024
5298444
Deleted deprecated properties.py
ndickson-nvidia Apr 23, 2024
c38aa06
Handle case of no label data in prepare_and_save_data. Also added con…
ndickson-nvidia Apr 25, 2024
86abf21
Changed prepare_data to support having no label data
ndickson-nvidia Apr 25, 2024
80276da
Updated license passed to setup call in setup.py
ndickson-nvidia May 2, 2024
9492e62
Changes to get test_dataset.py and test_multitask_datamodule.py passing
ndickson-nvidia May 6, 2024
d94097c
Removed load_type option from test_training.py, because it's no longe…
ndickson-nvidia May 6, 2024
11e6935
Updated comment in setup.py about how to build graphium_cpp package
ndickson-nvidia May 14, 2024
ff93c2d
Rewrote test_featurizer.py. Fixed bug in mask_nans C++ function, and …
ndickson-nvidia May 14, 2024
a892068
Removed deprecation warnings and deprecated parameters from datamodul…
ndickson-nvidia May 23, 2024
38a5510
Recommended tweaks to extract_labels in multilevel_utils.py
ndickson-nvidia May 23, 2024
f7771b3
Fixed "else if"->"elif"
ndickson-nvidia May 23, 2024
4256839
Rewrote test_pe_nodepair.py to use graphium_cpp
ndickson-nvidia May 24, 2024
91c37a3
Rewrote test_pe_rw.py to use graphium_cpp. Comment update in test_pe_…
ndickson-nvidia May 24, 2024
f347a0d
Rewrote test_pe_spectral.py to use graphium_cpp
ndickson-nvidia May 24, 2024
26b5531
Removed tests/test_positional_encodings.py, because it's a duplicate …
ndickson-nvidia May 24, 2024
1ded38b
Fixed handling of disconnected components vs. single component for la…
ndickson-nvidia May 28, 2024
314d636
Fixed compile warnings in one_hot.cpp
ndickson-nvidia May 28, 2024
e49b4da
Rewrote test_positional_encoders.py, though it's still failing the te…
ndickson-nvidia May 28, 2024
f001464
Removed commented out lines from setup.py
ndickson-nvidia Jun 4, 2024
2782fbc
Ran linting on Python files
ndickson-nvidia Jun 4, 2024
77d27b5
Hopefully explicitly installing graphium_cpp fixes the automated test…
ndickson-nvidia Jun 5, 2024
cb1df19
Test fix
ndickson-nvidia Jun 5, 2024
f3f6a0d
Another test fix
ndickson-nvidia Jun 5, 2024
c5c0085
Another test fix
ndickson-nvidia Jun 5, 2024
6dd827f
Make sure RDKit can find Boost headers
ndickson-nvidia Jun 5, 2024
59c84a2
Reimplemented test_pos_transfer_funcs.py to test all supported conver…
ndickson-nvidia Jun 12, 2024
7bc8ade
Linting fixes
ndickson-nvidia Jun 12, 2024
6903243
Fixed collections.abs.Callable to typing.Callable for type hint
ndickson-nvidia Jun 12, 2024
9f38afb
Removed file_opener and its test
ndickson-nvidia Jun 17, 2024
5ab9ca9
Fixed the issue with boolean masking, introduced by `F._canonical_mas…
DomInvivo Jul 9, 2024
9c7504f
Fixed the float vs double issue in laplacian pos encoding
DomInvivo Jul 9, 2024
f8358f3
Added comment
DomInvivo Jul 9, 2024
692decc
Fixed the ipu tests by making sure that `IPUStrategy` is not imported…
DomInvivo Jul 9, 2024
8891e66
Update test.yml to only test python 3.10
DomInvivo Jul 9, 2024
c2d3c87
Removed positional encodings from the docs
DomInvivo Jul 9, 2024
d3d19d7
Merge remote-tracking branch 'origin/dom_unittest' into dom_unittest
DomInvivo Jul 9, 2024
0a1696f
Upgraded python versions in the tests
DomInvivo Jul 9, 2024
50265df
Removed reference to old files now in C++
DomInvivo Jul 9, 2024
58fc2aa
Downgraded python version
DomInvivo Jul 9, 2024
5852467
Fixed other docs broken references
DomInvivo Jul 9, 2024
ea9a775
Merge pull request #1 from ndickson-nvidia/dom_unittest
ndickson-nvidia Jul 9, 2024
5 changes: 4 additions & 1 deletion .github/workflows/test.yml
@@ -16,7 +16,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10"]
python-version: ["3.09", "3.10", "3.11"]
pytorch-version: ["2.0"]

runs-on: "ubuntu-latest"
@@ -49,6 +49,9 @@ jobs:
- name: Install library
run: python -m pip install --no-deps -e . # `-e` required for correct `coverage` run.

- name: Install C++ library
run: cd graphium/graphium_cpp && git clone https://github.com/pybind/pybind11.git && export PYTHONPATH=$PYTHONPATH:./pybind11 && python -m pip install . && cd ../..

- name: Run tests
run: pytest -m 'not ipu'

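The new "Install C++ library" step clones pybind11 alongside the graphium/graphium_cpp sources and pip-installs the extension before the tests run. A minimal post-install smoke test, given here only as an illustrative sketch (the script name and the idea of listing exports are assumptions, not part of this PR), relies solely on the built module being importable as graphium_cpp:

# smoke_test_graphium_cpp.py -- hypothetical check that the pybind11 extension built correctly
import importlib

# Import the compiled extension; this fails if the "pip install ." step above did not build it.
graphium_cpp = importlib.import_module("graphium_cpp")

# Report where the shared library was installed and which names it exposes.
print("Loaded from:", graphium_cpp.__file__)
print("Exports:", [name for name in dir(graphium_cpp) if not name.startswith("_")])
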
1 change: 1 addition & 0 deletions LICENSE
@@ -189,6 +189,7 @@
Copyright 2023 Valence Labs
Copyright 2023 Recursion Pharmaceuticals
Copyright 2023 Graphcore Limited
Copyright 2024 NVIDIA CORPORATION & AFFILIATES

Various Academic groups have also contributed to this software under
the given license. These include, but are not limited, to the following
29 changes: 0 additions & 29 deletions docs/api/graphium.features.md
@@ -5,37 +5,8 @@ Feature extraction and manipulation
=== "Contents"

* [Featurizer](#featurizer)
* [Positional Encoding](#positional-encoding)
* [Properties](#properties)
* [Spectral PE](#spectral-pe)
* [Random Walk PE](#random-walk-pe)
* [NMP](#nmp)

## Featurizer
------------
::: graphium.features.featurizer


## Positional Encoding
------------
::: graphium.features.positional_encoding


## Properties
------------
::: graphium.features.properties


## Spectral PE
------------
::: graphium.features.spectral


## Random Walk PE
------------
::: graphium.features.rw


## NMP
------------
::: graphium.features.nmp
4 changes: 0 additions & 4 deletions docs/api/graphium.utils.md
@@ -46,10 +46,6 @@ module for utility functions
::: graphium.utils.mup


## Read File
----------------
::: graphium.utils.read_file

## Safe Run
----------------
::: graphium.utils.safe_run
3 changes: 2 additions & 1 deletion env.yml
@@ -28,7 +28,7 @@ dependencies:
- gcsfs >=2021.6

# ML packages
- cuda-version # works also with CPU-only system.
- cuda-version == 11.2 # works also with CPU-only system.
- pytorch >=1.12
- lightning >=2.0
- torchmetrics >=0.7.0,<0.11
@@ -43,6 +43,7 @@ dependencies:
# chemistry
- rdkit
- datamol >=0.10
- boost # needed by rdkit

# Optional deps
- sympy
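env.yml now pins cuda-version to 11.2 and adds boost, since RDKit's C++ headers (and the Boost headers they pull in) are needed to compile graphium_cpp. A quick environment sanity check, given purely as an illustrative sketch (the script and its checks are not part of the PR), could look like:

# check_env.py -- rough sanity check of the conda environment described in env.yml
import torch
from rdkit import Chem

# The cuda-version pin only matters for GPU builds of PyTorch;
# torch.version.cuda is None on CPU-only installs, which env.yml also supports.
print("PyTorch:", torch.__version__, "CUDA:", torch.version.cuda)

# RDKit must be importable and functional, since graphium_cpp compiles and links against it.
print("RDKit parses benzene:", Chem.MolFromSmiles("c1ccccc1") is not None)
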
6 changes: 0 additions & 6 deletions expts/configs/config_gps_10M_pcqm4m.yaml
@@ -59,7 +59,6 @@ accelerator:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"
homolumo:
@@ -76,10 +75,6 @@ datamodule:
split_test: 0.1

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
# 'possible_number_radical_e', 'possible_is_aromatic', 'possible_is_in_ring',
@@ -115,7 +110,6 @@ datamodule:
num_workers: 0 # -1 to use all
persistent_workers: False # if use persistent worker at the start of each epoch.
# Using persistent_workers false might make the start of each epoch very long.
featurization_backend: "loky"


architecture:
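The featurization knobs removed above (prepare_dict_or_graph, featurization_n_jobs, featurization_progress, featurization_backend) disappear from every config touched by this PR, because featurization now runs inside graphium_cpp rather than in Python worker processes; the same pattern repeats in the config files below. As an illustrative sketch only (the dict keys are copied from the YAML above; the values and the surrounding code are assumptions, not the PR's code), the trimmed datamodule arguments would be built roughly like this:

# Illustrative reconstruction of the slimmed-down datamodule args after this PR.
datamodule_args = {
    "task_specific_args": {
        "homolumo": {
            # Dataset files, label columns, and split fractions stay as in the YAML.
            "split_val": 0.1,
            "split_test": 0.1,
        },
    },
    # The featurization block itself survives; only the removed Python-side options are gone.
    "featurization": {},
    "num_workers": 0,
    "persistent_workers": False,
}
# These args would be passed to the MultitaskFromSmilesDataModule named in the config;
# constructor details beyond the keys shown are not reproduced from this diff.
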
6 changes: 0 additions & 6 deletions expts/configs/config_gps_10M_pcqm4m_mod.yaml
@@ -8,7 +8,6 @@ constants:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"
homolumo:
@@ -25,10 +24,6 @@ datamodule:
split_test: 0.1

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
# 'possible_number_radical_e', 'possible_is_aromatic', 'possible_is_in_ring',
@@ -84,7 +79,6 @@ datamodule:
num_workers: 0 # -1 to use all
persistent_workers: False # if use persistent worker at the start of each epoch.
# Using persistent_workers false might make the start of each epoch very long.
featurization_backend: "loky"

# ipu_dataloader_training_opts:
# mode: async
7 changes: 0 additions & 7 deletions expts/configs/config_mpnn_10M_b3lyp.yaml
@@ -60,7 +60,6 @@ accelerator:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"
betagap:
@@ -88,12 +87,7 @@ datamodule:
split_test: 0.1

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: "../datacache/b3lyp/"
dataloading_from: ram
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
# 'possible_number_radical_e', 'possible_is_aromatic', 'possible_is_in_ring',
@@ -127,7 +121,6 @@ datamodule:
num_workers: 0 # -1 to use all
persistent_workers: False # if use persistent worker at the start of each epoch.
# Using persistent_workers false might make the start of each epoch very long.
featurization_backend: "loky"


architecture:
7 changes: 0 additions & 7 deletions expts/configs/config_mpnn_pcqm4m.yaml
@@ -8,7 +8,6 @@ constants:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"
homolumo:
@@ -26,12 +25,7 @@ datamodule:
split_names: ["train", "valid", "test-dev"]

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 20
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: "graphium/data/PCQM4Mv2/"
dataloading_from: ram
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
# 'possible_number_radical_e', 'possible_is_aromatic', 'possible_is_in_ring',
@@ -61,7 +55,6 @@ datamodule:
num_workers: 40 # -1 to use all
persistent_workers: False # if use persistent worker at the start of each epoch.
# Using persistent_workers false might make the start of each epoch very long.
featurization_backend: "loky"

# ipu_dataloader_training_opts:
# mode: async
5 changes: 0 additions & 5 deletions expts/hydra-configs/architecture/largemix.yaml
@@ -83,12 +83,7 @@ architecture:
datamodule:
module_type: "MultitaskFromSmilesDataModule"
args:
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 20
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: ${constants.datacache_path}
dataloading_from: "disk"
num_workers: 20 # -1 to use all
persistent_workers: True
featurization:
5 changes: 0 additions & 5 deletions expts/hydra-configs/architecture/pcqm4m.yaml
@@ -81,13 +81,8 @@ architecture:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: ${constants.datacache_path}
num_workers: 40 # -1 to use all
persistent_workers: False # if use persistent worker at the start of each epoch.
5 changes: 0 additions & 5 deletions expts/hydra-configs/architecture/toymix.yaml
@@ -74,12 +74,7 @@ architecture:
datamodule:
module_type: "MultitaskFromSmilesDataModule"
args:
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: ${constants.datacache_path}
dataloading_from: ram
num_workers: 30 # -1 to use all
persistent_workers: False
featurization:
1 change: 0 additions & 1 deletion expts/hydra-configs/finetuning/admet_baseline.yaml
@@ -20,7 +20,6 @@ constants:
datamodule:
args:
batch_size_training: 32
dataloading_from: ram
persistent_workers: true
num_workers: 4

@@ -25,7 +25,6 @@ metrics:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"
homolumo:
@@ -4,7 +4,6 @@ datamodule:
args:
batch_size_training: 200
batch_size_inference: 200
featurization_n_jobs: 20
num_workers: 20

predictor:
@@ -7,7 +7,6 @@ datamodule:
args:
batch_size_training: 2048
batch_size_inference: 2048
featurization_n_jobs: 6
num_workers: 6

predictor:
1 change: 0 additions & 1 deletion expts/hydra-configs/training/accelerator/toymix_cpu.yaml
@@ -4,7 +4,6 @@ datamodule:
args:
batch_size_training: 200
batch_size_inference: 200
featurization_n_jobs: 4
num_workers: 4

predictor:
1 change: 0 additions & 1 deletion expts/hydra-configs/training/accelerator/toymix_gpu.yaml
@@ -7,7 +7,6 @@ datamodule:
args:
batch_size_training: 200
batch_size_inference: 200
featurization_n_jobs: 4
num_workers: 4

predictor:
6 changes: 0 additions & 6 deletions expts/neurips2023_configs/base_config/large.yaml
@@ -62,7 +62,6 @@ accelerator:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"
l1000_vcap:
@@ -133,11 +132,6 @@ datamodule:
epoch_sampling_fraction: 1.0

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
dataloading_from: disk
processed_graph_data_path: ${constants.datacache_path}
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
6 changes: 0 additions & 6 deletions expts/neurips2023_configs/base_config/large_pcba.yaml
@@ -62,7 +62,6 @@ accelerator:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"

@@ -132,11 +131,6 @@ datamodule:
#epoch_sampling_fraction: 1.0

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
dataloading_from: disk
processed_graph_data_path: ${constants.datacache_path}
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
6 changes: 0 additions & 6 deletions expts/neurips2023_configs/base_config/large_pcqm_g25.yaml
@@ -62,7 +62,6 @@ accelerator:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"

@@ -132,11 +131,6 @@ datamodule:
# epoch_sampling_fraction: 1.0

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
dataloading_from: disk
processed_graph_data_path: ${constants.datacache_path}
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
6 changes: 0 additions & 6 deletions expts/neurips2023_configs/base_config/large_pcqm_n4.yaml
@@ -62,7 +62,6 @@ accelerator:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"

@@ -132,11 +131,6 @@ datamodule:
epoch_sampling_fraction: 1.0

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
dataloading_from: disk
processed_graph_data_path: ${constants.datacache_path}
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),
5 changes: 0 additions & 5 deletions expts/neurips2023_configs/base_config/small.yaml
@@ -51,7 +51,6 @@ accelerator:

datamodule:
module_type: "MultitaskFromSmilesDataModule"
# module_type: "FakeDataModule" # Option to use generated data
args: # Matches that in the test_multitask_datamodule.py case.
task_specific_args: # To be replaced by a new class "DatasetParams"
qm9:
@@ -97,10 +96,6 @@ datamodule:
method: "normal"

# Featurization
prepare_dict_or_graph: pyg:graph
featurization_n_jobs: 30
featurization_progress: True
featurization_backend: "loky"
processed_graph_data_path: "../datacache/neurips2023-small/"
featurization:
# OGB: ['atomic_num', 'degree', 'possible_formal_charge', 'possible_numH' (total-valence),