Conversation
It got split and replaced by the following commands * performance-table * evaluate performance * evaluate latent-features * evaluate outliers
This used to be a DVC stage, now replaced by preprocess-dataset
ddeda1a to
a1b1dc2
Compare
They got superseded by the Morphology{Dataset,DataLoader}
classes.
|
@FrancescoCasalegno I'm happy to discuss should there be anything obscure in the changes I made. |
| model: xgb | ||
| features-rd: image-tmd-rd | ||
| features-proj: image-tmd-proj | ||
| feature_extractors: |
There was a problem hiding this comment.
Seems fine to remove. Also, notice that not even before this PR (see on branch main) these items were ever used in the dvc.yaml.
| @@ -1,163 +0,0 @@ | |||
| # Copyright © 2022-2022 Blue Brain Project/EPFL | |||
There was a problem hiding this comment.
We are removing all unit tests for the submodules of morphoclass.feature_extractors that were removed in this PR, i.e. interneurons, pyramidal_cells, feature_extraction.
However, we don't have unit tests for the only submodule left, i.e. morphoclass.features.non_graph.
It was already a problem before this PR, but just leaving this comment here so we can address it in the future.
| @@ -1,44 +1,3 @@ | |||
| datasets: | |||
There was a problem hiding this comment.
Seems fine to remove. Also, notice that not even before this PR (see on branch main) these items were ever used in the dvc.yaml.
| intermediary_directory: reports/data_preparation/pyramidal_cells | ||
| output_dataset_directory: data/prepared/pyramidal_cells | ||
| output_wildcard: '''data/prepared/pyramidal_cells/*/*/*.h5''' | ||
| explain_models: |
There was a problem hiding this comment.
Seems fine to remove, explain-models DVC stage was now removed by xai DVC stage.
| outs: | ||
| - ${item.results_file} | ||
| # - ${item.output_wildcard} | ||
| explain-models: |
There was a problem hiding this comment.
Removed because xai replaced explain-models.
| - ${item.checkpoint_path} | ||
| outs: | ||
| - ${item.checkpoint_path_outs} | ||
| performance-report: |
There was a problem hiding this comment.
Removed because the functionalities are now implemented in the following CLI sub-commands of morphoclass:
performance-tableevaluate performance— this is also available in the DVC stagesevaluate-***evaluate latent-featuresevaluate outliers
| id: pc_tmd | ||
| input_csv: data/prepared/pyramidal_cells/dataset_features_labels.csv | ||
| output_wildcard: '''data/prepared/pyramidal_cells/*/*/*_tmd.pickle''' | ||
| performance-report: |
There was a problem hiding this comment.
Not used any more, because the DVC stage that used this (performance-report) item was removed.
| else: | ||
| return None | ||
|
|
||
| def to_lowerdim_dataset(self, embedding_type, **feat_extr_kwargs): |
There was a problem hiding this comment.
The only place where this function was used was in the submodules
morphoclass.feature_extractors.pyramidal_cellsmorphoclass.feature_extractors.interneurons
Since these two submodules are removed in this PR, we can safely remove this function.
FrancescoCasalegno
left a comment
There was a problem hiding this comment.
LGTM @Stannislav!
This PR is quite large, and maybe in the future we could try to create several smaller PRs to address individual issues, otherwise the reviewing process is a bit complex (11 refactoring changes + 63 files changed = 😱).
I tried to document everything in the PR description, and I went through each file to make sure not to miss any issue.
Hopefully we didn't break anything!
Yes, I agree, this was a big one. Partially the reason was that things depended on each other, so they had to be done together as to not leave |
Fixes #21 as part of the clean up.
Description
1.
morphoclass.serializationserializationmodule - unused2.
morphoclass.feature_extractors>>morphoclass.featuresmorphoclass.feature_extractorsasmorphoclass.features. Also, most of its submodules are removed.feature_extractors.feature_extractionRemoved. It is superseded by the code in the new
morphoclass extract-featurescommand.feature_extractors.{interneurons,pyramidal_cells}Removed. They were used to extract features in the old
trainCLI command. Since then we changed the way we extract features - now they're pre-extracted using theextract-featuresCLI command and the corresponding functionality has been ported from the deleted modules toconsole.cmd_extract_features. This rendered two modules in question obsolete and it was safe to delete them. In the future one should probably move the feature extraction code fromconsole.cmd_extract_featuresto a more appropriate module.feature_extractors.non_graphRenamed as
features.non_graph.4.
data_preparation/data-preparationdata-preparationused to be a DVC stage that got replaced bypreprocess-datasetdvc.yamlin the past), so:data_preparation) fromparams.yamlanddvc.lock.gitignorefordata_preparation/folderdatasetsis removed fromparams.yaml. It was used by the stagedata-preparationand it was already not used anywhere in thedvc.yamlbefore this PR.5.
dvc/models/dvc/modelsis where checkpoints used to be in the past, now they're undertraining/checkpoints, so:params.yamlanddvc.lock.gitignorefor folders insidedvc/models/6.
explain_models/explain-modelsexplain-modelsused to be a DVC stage that got replaced byxai(i.e. even before this PR it does not exist anymore in thesrc/) so:params.yaml,dvc.yaml, anddvc.lock.7.
morphoclass.augmentationmorphoclass.augmentationwas already deprecated in favor of the newmorphoclass.transforms.morphoclass.augmentation.morphoclass.augmentation.finding_point_on_branch()is renamed and moved tomorphoclass.utils.find_point_on_branch().8.
dvc/README.mddvc/README.md: it contained information about how to run DVC stages, but most of those stages are gone and the information is not useful any more.9. Generation of
params.yamland.yamltraining config filesdvc/generate_params.py- was used in the past to generate training configurations inparams.yaml; after refactoring we now use separate config files for training configurationdvc/training/make_configs.py- was used to auto-generate the training config files undertraining/configs; it was intended to be temporary; the configs have evolved and this script is outdated; given there aren't that many configs it's best to edit them by hand10.
morphoclass performance-reportCLI commandperformance-reportCLI command - it got split and replaced by the following commandsperformance-tableevaluate performanceevaluate latent-featuresevaluate outliersdvc.yaml,dvc.lock,params.yaml+ remove the output directoryreports/performance-reportfrom the.gitignore.11.
MorphologyEmbeddingDatasetandMorphologyEmbeddingDataLoaderMorphologyEmbedding{Dataset,DataLoader}classes - they got superseded by theMorphology{Dataset,DataLoader}classes which now can handle all kind of data: morphologies, images, diagrams.MorphologyDatasetwas only for morphologies andMorphologyEmbeddingDatasetfor diagrams and images. Since these two classes were for the most part duplicates of each other it made sense to merge them into one.morphoclass.data.morphology_embedding_data_loadermorphoclass.data.morphology_embedding_datasetmorphology_embedding_*submodules were used nowhere (seegit grep -rn "morphology_embedding_"), except for:morphology_dataset.py— in methodMorphologyDataset.to_lowerdim_dataset(), which was only used in the now removedfeature_extractors.{interneurons,pyramidal_cells}.feature_extractors.{interneurons,pyramidal_cells}.py— which have been removedChecklist
(if it is not the case, please create an issue first).
(if needed)
setup.cfg,requirements.txt, andconstraints.txtupdated with new dependencies.(if needed)
(if a function is added or modified)