@roshankern

This PR is ready for review!

This is the first of many PRs to restructure this repo to use CP/merged features in addition to DP features.

Currently, the download module combines two mitocheck datasets (2006 and 2015). After some consideration, Greg and I decided that only the newer dataset is needed (at least for now). The biggest change in this PR is therefore to download only the 2015 dataset, instead of downloading both the 2006 and 2015 datasets and merging them.

The second, much smaller change renames the downloaded data from training_data to labeled_data. This makes it clear that the downloaded data is not itself the training dataset (the downloaded data gets split into training, testing, and maybe holdout sets in a future revision of this repo).
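To illustrate why the labeled_data name fits, here is a minimal sketch of how labeled data could later be split into training and testing sets. It is not the repo's split module: the function name `split_labeled_data`, the `phenotype` and `cell_id` keys, and the 20% test fraction are all assumptions made up for this example, and it uses only the Python standard library.

```python
import random
from collections import defaultdict


def split_labeled_data(rows, label_key="phenotype", test_fraction=0.2, seed=0):
    """Split labeled rows into training and testing lists, stratified by label.

    Hypothetical helper for illustration only; `label_key` is an assumed name.
    """
    rng = random.Random(seed)

    # Group rows by label so each label is split in the same proportion.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    training, testing = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        testing.extend(group[:n_test])
        training.extend(group[n_test:])
    return training, testing


# Toy stand-in for labeled_data; real rows would also carry feature columns.
labeled_data = [
    {"cell_id": i, "phenotype": "A" if i % 2 == 0 else "B"} for i in range(20)
]
training_data, testing_data = split_labeled_data(labeled_data)
```

The point of the sketch is only that training_data is derived from labeled_data, so the two names describe different things. A holdout set could be carved out the same way by splitting a second time.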

@roshankern roshankern requested a review from axiomcura March 9, 2023 22:51

@axiomcura axiomcura left a comment

LGTM, nice and short!

roshankern and others added 2 commits March 9, 2023 18:13
Co-authored-by: Erik Serrano <31600622+axiomcura@users.noreply.github.com>
@roshankern roshankern merged commit f289bcb into WayScience:cp-feature-refactor Mar 9, 2023
@roshankern roshankern deleted the refactor-download-module branch March 9, 2023 23:16
roshankern added a commit that referenced this pull request Jul 6, 2023
* Refactor Download Module (#18)

* refactor module

* remove training data file

* Update 0.download_data/scripts/nbconverted/download_data.py

Co-authored-by: Erik Serrano <31600622+axiomcura@users.noreply.github.com>

* eric suggestions

---------

Co-authored-by: Erik Serrano <31600622+axiomcura@users.noreply.github.com>

* Refactor Split Data Module (#19)

* refactor module

* greg suggestions

* Train module refactor (#20)

* refactor format module

* use stratify function

* rerun train module

* black formatting

* docs, nbconvert

* nbconvert

* rerun pipeline, rename model

* fix typo

* Update 2.train_model/README.md

Co-authored-by: Gregory Way <gregory.way@gmail.com>

* Update 2.train_model/README.md

Co-authored-by: Gregory Way <gregory.way@gmail.com>

* Update 2.train_model/README.md

Co-authored-by: Gregory Way <gregory.way@gmail.com>

* notebook run

---------

Co-authored-by: Gregory Way <gregory.way@gmail.com>

* Refactor evaluate module (#21)

* refactor class pr curves

* refactor confusion matrix

* refactor F1 scores

* refactor model predictions

* documentation

* dave suggestions

* erik suggestions, reconvert

* Refactor interpret module (#22)

* refactor interpret notebook

* docs, reconvert script

* greg suggestions

* Get Leave One Image Out Probabilities (#23)

* add LOIO notebook

* LOIO notebook

* update notebook

* download and split data with cell UUIDs

* move LOIO

* finish LOIO

* black formatting

* rerun notebook

* rerun notebook, dave suggestions

* greg comment

* Train single class models (#25)

* move multiclass models

* rename files, fix sh

* single class models notebook

* run notebook

* binarize labels

* train single class models

* reconvert notebooks

* update readme

* rename sh file

* remove models

* eric readme suggestions

* rerun notebook, eric suggestions

* Add Single Class Model PR Curves (#26)

* get SCM PR curves

* shuffled baseline

* retrain single class models with correct kernel

* rerun pr curves notebook

* remove nones

* rerun multiclass model

* rerun notebook

* move file

* docs, black formatting

* format notebook

* Update 3.evaluate_model/README.md

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* dave suggestions

* reconvert notebook

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* Add SCM confusion matrices and F1 scores (#27)

* get SCM PR curves

* shuffled baseline

* retrain single class models with correct kernel

* rerun pr curves notebook

* remove nones

* rerun multiclass model

* rerun notebook

* move file

* create SCM confusion matrix

* rerun notebook

* add changes from last PR

* rerun notebook

* add SCM F1, update SCM confusion matrices

* documentation

* rerun notebook

* Update utils/evaluate_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* Update utils/evaluate_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* Update 3.evaluate_model/scripts/nbconverted/F1_scores.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* dave suggestions

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* Get SCM Predictions and LOIO Probabilities (#29)

* get SCM LOIO probas

* reconvert notebook

* get model predictions

* rerun LOIO

* reconvert notebook

* save and reconvert notebook

* eric suggestions

* Add SCM Interpretations (#30)

* add scm coefficients

* rerun interpret multi-class model

* compare model coefficients

* nbconvert

* readme

* make all correlations negative

* rerun training

* rerun evaluate

* rerun interpret

* docs

* newline

* rerun LOIO

* Remove unused cp features (#31)

* rerun download/split modules

* rerun multiclass models

* rerun single class model

* rerun evaluate module

* get LOIO probas

* rerun interpret module

* rerun download data

* Adding CP features to ggplot visualization (#24)

* set colors for model types

* visualize precision recall with CP and DP+CP

* add F1 score barchart visualization

* minor tweak of f1 score print

* ignore mac files

* merge main and rerun viz

* change color scheme for increased contrast

* add f1 score of the top model, and rerun with updated colors

* nrow = 3 in facet

* change name of weighted f1 score

* update single cell images module (#32)

* Refactor validate module (#33)

* update validate module

* refactor validation

* get correlations

* convert notebook

* update readme

* formatting, documentation

* reset index

* add view notebook

* docs, black formatting

* ccc credit

* show all correlations

* add notebook

* remove preview notebook

* convert notebook

* add differences heatmaps

* preview correlation differences

* add docs

* black formatting

---------

Co-authored-by: Erik Serrano <31600622+axiomcura@users.noreply.github.com>
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Co-authored-by: Dave Bunten <ekgto445@gmail.com>