Save models and RandomForestRegressor #349

Merged
merged 51 commits into from
Aug 9, 2021

Conversation

gcasadesus
Contributor

@gcasadesus gcasadesus commented Jul 19, 2021

Description

  1. Adds functions for saving and loading models in JSON and CBOR formats.

    • save_model: Serializes an object and saves it to disk, using the json and cbor2 libraries. It can save models, ds-arrays, and dictionaries of objects.
    • load_model: Deserializes a saved object file into memory using a custom decoder.
  2. Implements RandomForestRegressor by adapting the existing RandomForestClassifier. The main changes are:

    • leaf nodes now store the mean value of their samples instead of the class frequencies and mode, and
    • the splitting criterion is now Mean Squared Error (a variance reduction metric) instead of Gini impurity (a misclassification metric).
  3. Adds a commons module that holds code shared by multiple tasks, to improve maintainability. Following the existing naming conventions, a new rf submodule contains the Random Forest and Decision Tree classes. The Random Forest Classifier and Regressor classes are still exposed in the classification and regression modules, respectively, to preserve backward compatibility.
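
For reviewers less familiar with the difference between the two criteria in item 2, here is a minimal, standard-library sketch. It is not the dislib implementation: the function names (`gini_impurity`, `mse`, `best_split`, `leaf_value_classification`, `leaf_value_regression`) are hypothetical and it only handles a single feature. It just illustrates that the split search stays the same while the impurity measure and the leaf value change.

```python
from collections import Counter
from statistics import fmean


def gini_impurity(labels):
    """Gini impurity of a list of class labels (classification criterion)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def mse(values):
    """Mean squared error around the mean, i.e. the variance (regression criterion)."""
    mean = fmean(values)
    return fmean((v - mean) ** 2 for v in values)


def best_split(xs, ys, impurity):
    """Return (threshold, weighted_impurity) of the best binary split on one feature."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted average of the children's impurity; lower is better.
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best


def leaf_value_classification(ys):
    """Classifier leaf: the most frequent class (mode) of the samples."""
    return Counter(ys).most_common(1)[0][0]


def leaf_value_regression(ys):
    """Regressor leaf: the mean target value of the samples."""
    return fmean(ys)
```

Passing `mse` instead of `gini_impurity` as the `impurity` argument is the only change needed to reuse the same split search for regression targets in this sketch.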

Fixes #306

Type of change

  • New algorithm or support class.
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

To ensure correct functionality, the saved and loaded models were tested with the test cases already implemented in the library.
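
The save-then-load check described above can be sketched as follows. This is an illustration only, not the code in `dislib/utils/saving.py`: `MeanRegressor`, `save_sketch`, `load_sketch`, and `round_trip` are hypothetical names, and only the JSON path is shown, using the standard `json` module's `default` and `object_hook` hooks as the custom encoder and decoder.

```python
import json
import os
import tempfile


class MeanRegressor:
    """Hypothetical stand-in for a fitted model; not a dislib class."""

    def __init__(self):
        self.mean_ = None

    def fit(self, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, n):
        return [self.mean_] * n


# Classes the decoder is allowed to rebuild, keyed by the tag written on save.
_REGISTRY = {"MeanRegressor": MeanRegressor}


def _encode(obj):
    """json `default` hook: tag known model objects with their class name."""
    if isinstance(obj, MeanRegressor):
        return {"__class__": type(obj).__name__, "attrs": vars(obj)}
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def _decode(dct):
    """json `object_hook`: rebuild tagged objects, pass plain dicts through."""
    cls = _REGISTRY.get(dct.get("__class__"))
    if cls is None:
        return dct
    obj = cls()
    obj.__dict__.update(dct["attrs"])
    return obj


def save_sketch(obj, path):
    with open(path, "w") as f:
        json.dump(obj, f, default=_encode)


def load_sketch(path):
    with open(path) as f:
        return json.load(f, object_hook=_decode)


def round_trip(obj):
    """Save `obj` to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        save_sketch(obj, path)
        return load_sketch(path)
```

A test in this style fits a model, calls `round_trip` on it, and then asserts that the restored model gives the same predictions as the original.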

  • I have added new test files.
  • I have added new test cases.
  • I have tested it manually in a local environment.
  • I have tested it manually in a supercomputer.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas.
  • I have documented the public methods of any public class according to the style guide.
  • I have made corresponding changes to the documentation
  • New and existing unit tests pass locally with my changes
  • I have rebased my branch before trying to merge.

fjconejero
fjconejero previously approved these changes Jul 19, 2021

codecov bot commented Jul 26, 2021

Codecov Report

Merging #349 (4a44cd3) into master (66e98b4) will increase coverage by 1.81%.
The diff coverage is 98.36%.

@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   95.95%   97.77%   +1.81%     
==========================================
  Files          37       38       +1     
  Lines        3759     4001     +242     
==========================================
+ Hits         3607     3912     +305     
+ Misses        152       89      -63     
Impacted Files                        Coverage Δ
dislib/classification/csvm/base.py    95.49% <ø> (ø)
dislib/commons/rf/forest.py           97.71% <97.71%> (ø)
dislib/commons/rf/decision_tree.py    99.01% <97.82%> (ø)
dislib/utils/saving.py                98.61% <98.61%> (ø)
dislib/commons/rf/data.py             98.76% <98.76%> (ø)
dislib/classification/__init__.py     100.00% <100.00%> (ø)
dislib/commons/rf/test_split.py       100.00% <100.00%> (ø)
dislib/regression/__init__.py         100.00% <100.00%> (ø)
dislib/utils/__init__.py              100.00% <100.00%> (ø)
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 66e98b4...4a44cd3.

@gcasadesus gcasadesus changed the title Save models Save models and RF Regressor Jul 27, 2021
@gcasadesus gcasadesus changed the title Save models and RF Regressor Save models and RandomForestRegressor Jul 27, 2021
@gcasadesus gcasadesus marked this pull request as ready for review July 29, 2021 07:21
fjconejero
fjconejero previously approved these changes Aug 2, 2021
@gcasadesus gcasadesus closed this Aug 9, 2021
@gcasadesus gcasadesus reopened this Aug 9, 2021
@fjconejero fjconejero merged commit 8b34f9c into bsc-wdc:master Aug 9, 2021
@gcasadesus gcasadesus deleted the save-models branch August 9, 2021 16:00
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Save models to disk
2 participants