Merge branch 'enhancement/tutorial_serialization' into 'dev'
Enhancement/tutorial serialization

See merge request cdd/QSPRpred!190
HellevdM committed Aug 22, 2024
2 parents e3bab4e + 6271630 commit 4022298
Showing 4 changed files with 1,048 additions and 37 deletions.
38 changes: 3 additions & 35 deletions CHANGELOG.md
@@ -1,49 +1,17 @@
 # Change Log

-From v3.0.2 to v3.1.0
+From v3.1.1 to v3.1.2

 ## Fixes

-- Fixed a bug in `QSPRDataset` where property transformations were not applied.
-- Fixed a bug where an attached standardizer would be refit when calling
-`QSPRModel.predictMols` with `use_applicability_domain=True`.
-- Fixed random seed not set in `FoldsFromDataSplit.iterFolds` for `ClusterSplit`.

 ## Changes

-- renamed `PandasDataTable.transform` to `PandasDataTable.transformProperties`
-- moved `imputeProperties`, `dropEmptyProperties` and `hasProperty` from `MoleculeTable`
-to `PandasDataTable`.
-- removed `getProperties`, `addProperty`, `removeProperty`, now use `PandasDataTable`
-methods directly.
-- Since the way descriptors are saved has changed, this release is incompatible with
-previous data sets and models. However, these can be easily converted to the new
-format by adding
-a prefix with descriptor set name to the old descriptor tables. Feel free to contact
-us if you require assistance with this.
-- Due to some changes in `rdkit-2023.9.6`, the `add_rdkit`
-option for molecule tables temporarily might not work.
-This also affects the current ChemProp integration, which was not adapted to 2.0.0 yet.
-In order to prevent these issues, QSPRpred now forces rdkit version `rdkit-2023.9.5`,
-but we will be working on resolving these.

 ## New Features

-- Descriptors are now saved with prefixes to indicate the descriptor sets. This reduces
-the chance of name collisions when using multiple descriptor sets.
-- Added new methods to `MoleculeTable` and `QSARDataset` for more fine-grained control
-of clearing, dropping and restoring of descriptor sets calculated for the dataset.
-- `dropDescriptorSets` will drop descriptors associated with the given descriptor
-sets.
-- `dropDescriptors` will drop individual descriptors associated with the given
-descriptor sets and properties.
-- All drop actions are restorable with `restoreDescriptorSets` unless explicitly
-cleared from the data set with the `clear` parameter of `dropDescriptorSets`.
-- Added a proper API for parallelization backend selection and configuration (see
-documentation of `ParallelGenerator` and `JITParallelGenerator` for more information).
-- Clusters can now be added to a `MoleculeTable` with `addClusters` and retrieved with
-`getClusters`, similar to scaffolds.
+- added a tutorial on model and data serialization

 ## Removed Features

+- removed support for PyBoost since the project was abandoned by the original developers and is [no longer maintained](https://github.com/sb-ai-lab/Py-Boost/graphs/contributors)

6 changes: 4 additions & 2 deletions tutorials/README.md
@@ -44,9 +44,11 @@ the [documentation pages](https://cddleiden.github.io/QSPRpred/docs/).
 - [Logging](basics/modelling/logging.ipynb): How to set-up logging.
 - [Model Assessment](basics/modelling/model_assessment.ipynb): How to assess the
 performance of a model.
-- Benchmarking
-- [Benchmarking](basics/benchmarking/benchmarking.ipynb): How to benchmark
+- Other
+- [Benchmarking](basics/other/benchmarking.ipynb): How to benchmark
 QSPRpred.
+- [Serialization](basics/other/serialization.ipynb): How to save and
+load datasets and models.
 - **Advanced**
 - Data
 - [Parallelization](advanced/data/parallelization.ipynb): How to parallelize
File renamed without changes.