Skip to content

Remove legacy code#46

Merged
FrancescoCasalegno merged 17 commits intomainfrom
remove-legacy-code
May 12, 2022
Merged

Remove legacy code#46
FrancescoCasalegno merged 17 commits intomainfrom
remove-legacy-code

Conversation

@Stannislav
Copy link
Contributor

@Stannislav Stannislav commented Apr 29, 2022

Fixes #21 as part of the clean up.

Description

1. morphoclass.serialization

  • Remove the serialization module - unused

2. morphoclass.feature_extractors >> morphoclass.features

  • Rename morphoclass.feature_extractors as morphoclass.features. Also, most of its submodules are removed.
    • feature_extractors.feature_extraction
      Removed. It is superseded by the code in the new morphoclass extract-features command.

    • feature_extractors.{interneurons,pyramidal_cells}
      Removed. They were used to extract features in the old train CLI command. Since then we changed the way we extract features - now they're pre-extracted using the extract-features CLI command and the corresponding functionality has been ported from the deleted modules to console.cmd_extract_features. This rendered two modules in question obsolete and it was safe to delete them. In the future one should probably move the feature extraction code from console.cmd_extract_features to a more appropriate module.

    • feature_extractors.non_graph
      Renamed as features.non_graph.

4. data_preparation / data-preparation

  • data-preparation used to be a DVC stage that got replaced by preprocess-dataset
  • This stage does not exist any more (it was already removed from the dvc.yaml in the past), so:
    • remove references to that stage (and to its outputs written in data_preparation) from params.yaml and dvc.lock
    • remove .gitignore for data_preparation/ folder
  • The whole item datasets is removed from params.yaml. It was used by the stage data-preparation and it was already not used anywhere in the dvc.yaml before this PR.

5. dvc/models/

  • dvc/models is where checkpoints used to be in the past, now they're under training/checkpoints, so:
    • remove references to this directory from params.yaml and dvc.lock
    • remove .gitignore for folders inside dvc/models/

6. explain_models / explain-models

  • explain-models used to be a DVC stage that got replaced by xai (i.e. even before this PR it does not exist anymore in the src/) so:
    • remove references to this directory from params.yaml, dvc.yaml, and dvc.lock.

7. morphoclass.augmentation

  • The module morphoclass.augmentation was already deprecated in favor of the new morphoclass.transforms.
  • Remove morphoclass.augmentation.
  • The function morphoclass.augmentation.finding_point_on_branch() is renamed and moved to morphoclass.utils.find_point_on_branch().

8. dvc/README.md

  • Removed dvc/README.md: it contained information about how to run DVC stages, but most of those stages are gone and the information is not useful any more.

9. Generation of params.yaml and .yaml training config files

  • Remove dvc/generate_params.py - was used in the past to generate training configurations in params.yaml; after refactoring we now use separate config files for training configuration
  • Remove dvc/training/make_configs.py - was used to auto-generate the training config files under training/configs; it was intended to be temporary; the configs have evolved and this script is outdated; given there aren't that many configs it's best to edit them by hand

10. morphoclass performance-report CLI command

  • Remove the performance-report CLI command - it got split and replaced by the following commands
    • performance-table
    • evaluate performance
    • evaluate latent-features
    • evaluate outliers
  • Remove corresponding DVC stages from dvc.yaml, dvc.lock, params.yaml + remove the output directory reports/performance-report from the .gitignore.

11. MorphologyEmbeddingDataset and MorphologyEmbeddingDataLoader

  • Remove MorphologyEmbedding{Dataset,DataLoader} classes - they got superseded by the Morphology{Dataset,DataLoader} classes which now can handle all kind of data: morphologies, images, diagrams.
  • Previosly MorphologyDataset was only for morphologies and MorphologyEmbeddingDataset for diagrams and images. Since these two classes were for the most part duplicates of each other it made sense to merge them into one.
  • The two following submodules (where the aforementioned classes were implemented) are now removed:
    • morphoclass.data.morphology_embedding_data_loader
    • morphoclass.data.morphology_embedding_dataset
  • This removal is safe, because anyway morphology_embedding_* submodules were used nowhere (see git grep -rn "morphology_embedding_"), except for:
    • morphology_dataset.py — in method MorphologyDataset.to_lowerdim_dataset(), which was only used in the now removed feature_extractors.{interneurons,pyramidal_cells}.
    • feature_extractors.{interneurons,pyramidal_cells}.py — which have been removed

Checklist

  • This PR refers to an issue from the issue tracker.
    (if it is not the case, please create an issue first).
  • Unit tests added.
    (if needed)
  • setup.cfg, requirements.txt, and constraints.txt updated with new dependencies.
    (if needed)
  • Type annotations added.
    (if a function is added or modified)
  • All CI tests pass.

@Stannislav Stannislav self-assigned this Apr 29, 2022
Stanislav Schmidt and others added 4 commits May 5, 2022 11:52
It got split and replaced by the following commands
* performance-table
* evaluate performance
* evaluate latent-features
* evaluate outliers
This used to be a DVC stage, now replaced by preprocess-dataset
@Stannislav Stannislav force-pushed the remove-legacy-code branch from ddeda1a to a1b1dc2 Compare May 5, 2022 10:36
@Stannislav Stannislav marked this pull request as ready for review May 5, 2022 16:26
@Stannislav
Copy link
Contributor Author

@FrancescoCasalegno I'm happy to discuss should there be anything obscure in the changes I made.

model: xgb
features-rd: image-tmd-rd
features-proj: image-tmd-proj
feature_extractors:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to remove. Also, notice that not even before this PR (see on branch main) these items were ever used in the dvc.yaml.

@@ -1,163 +0,0 @@
# Copyright © 2022-2022 Blue Brain Project/EPFL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are removing all unit tests for the submodules of morphoclass.feature_extractors that were removed in this PR, i.e. interneurons, pyramidal_cells, feature_extraction.

However, we don't have unit tests for the only submodule left, i.e. morphoclass.features.non_graph.
It was already a problem before this PR, but just leaving this comment here so we can address it in the future.

@@ -1,44 +1,3 @@
datasets:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to remove. Also, notice that not even before this PR (see on branch main) these items were ever used in the dvc.yaml.

intermediary_directory: reports/data_preparation/pyramidal_cells
output_dataset_directory: data/prepared/pyramidal_cells
output_wildcard: '''data/prepared/pyramidal_cells/*/*/*.h5'''
explain_models:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to remove, explain-models DVC stage was now removed by xai DVC stage.

outs:
- ${item.results_file}
# - ${item.output_wildcard}
explain-models:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed because xai replaced explain-models.

- ${item.checkpoint_path}
outs:
- ${item.checkpoint_path_outs}
performance-report:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed because the functionalities are now implemented in the following CLI sub-commands of morphoclass:

  • performance-table
  • evaluate performance — this is also available in the DVC stages evaluate-***
  • evaluate latent-features
  • evaluate outliers

id: pc_tmd
input_csv: data/prepared/pyramidal_cells/dataset_features_labels.csv
output_wildcard: '''data/prepared/pyramidal_cells/*/*/*_tmd.pickle'''
performance-report:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used any more, because the DVC stage that used this (performance-report) item was removed.

else:
return None

def to_lowerdim_dataset(self, embedding_type, **feat_extr_kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place where this function was used was in the submodules

  • morphoclass.feature_extractors.pyramidal_cells
  • morphoclass.feature_extractors.interneurons

Since these two submodules are removed in this PR, we can safely remove this function.

Copy link
Contributor

@FrancescoCasalegno FrancescoCasalegno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @Stannislav!

This PR is quite large, and maybe in the future we could try to create several smaller PRs to address individual issues, otherwise the reviewing process is a bit complex (11 refactoring changes + 63 files changed = 😱).

I tried to document everything in the PR description, and I went through each file to make sure not to miss any issue.

Hopefully we didn't break anything!

@FrancescoCasalegno FrancescoCasalegno merged commit e579410 into main May 12, 2022
@FrancescoCasalegno FrancescoCasalegno deleted the remove-legacy-code branch May 12, 2022 12:09
@Stannislav
Copy link
Contributor Author

LGTM @Stannislav!

This PR is quite large, and maybe in the future we could try to create several smaller PRs to address individual issues, otherwise the reviewing process is a bit complex (11 refactoring changes + 63 files changed = 😱).

I tried to document everything in the PR description, and I went through each file to make sure not to miss any issue.

Hopefully we didn't break anything!

Yes, I agree, this was a big one. Partially the reason was that things depended on each other, so they had to be done together as to not leave main in a broken state. Good thing is most of the changes are just deletions of modules :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove obsolete entrypoint commands

2 participants