Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint: detect duplicated observable IDs #446

Merged
merged 2 commits into from
Jul 1, 2020
Merged

Conversation

lcontento
Copy link
Contributor

The linter function check_observable_df does not detect duplicated observable IDs. MWE is

import petab
import pandas as pd

observable_df = pd.DataFrame(data={
    petab.OBSERVABLE_ID: ['obs1', 'obs1'],
    petab.OBSERVABLE_FORMULA: ['x1', 'x2'],
    petab.NOISE_FORMULA: ['sigma1', 'sigma2']
}).set_index(petab.OBSERVABLE_ID)

petab.lint.check_observable_df(observable_df)

Incidentally, this causes a problem in check_measurement_df too.

@codecov
Copy link

codecov bot commented Jun 18, 2020

Codecov Report

Merging #446 into develop will increase coverage by 0.10%.
The diff coverage is 92.85%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #446      +/-   ##
===========================================
+ Coverage    77.88%   77.99%   +0.10%     
===========================================
  Files           22       22              
  Lines         2211     2222      +11     
  Branches       529      531       +2     
===========================================
+ Hits          1722     1733      +11     
  Misses         363      363              
  Partials       126      126              
Impacted Files Coverage Δ
petab/lint.py 77.77% <92.85%> (+0.82%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 843edd3...05dd5bf. Read the comment docs.

petab/lint.py Outdated
@@ -621,6 +625,21 @@ def assert_noise_distributions_valid(observable_df: pd.DataFrame) -> None:
f"table: {distr}.")


def assert_observable_id_is_unique(observable_df: pd.DataFrame) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def assert_observable_id_is_unique(observable_df: pd.DataFrame) -> None:
def assert_unique_observable_ids(observable_df: pd.DataFrame) -> None:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name is indeed better. But since I essentially copied assert_parameter_id_is_unique, maybe if I change the name for the new function I should change it for the old too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. yes, makes sense to change the old one too then.

if len(observable_df.index) != len(set(observable_df.index)):
raise AssertionError(
f"{OBSERVABLE_ID} column in observable table is not unique.")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be good to also return the duplicate ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean to put the duplicated list inside the AssertionError message? It is certainly more informative. Again, if I do it here I should do it for assert_parameter_id_is_unique too I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not crucial, but I guess that would be helpful.

@lcontento lcontento merged commit 9cbf54e into develop Jul 1, 2020
@lcontento lcontento deleted the fix_lint_duplicated_obs branch July 1, 2020 08:37
@yannikschaelte yannikschaelte mentioned this pull request Jul 23, 2020
dweindl added a commit that referenced this pull request Sep 21, 2020
Release 0.1.9

Library:

* Allow URL as filenames for YAML files and SBML models (Closes #187) (#459)
* Allow model time in observable formulas (#445)
* Make float parsing from CSV round-trip (#444)
* Validator: Error message for missing IDs, with line numbers. (#467)
* Validator: Detect duplicated observable IDs (#446)
* Some documentation and CI fixes / updates
* Visualization: Add option to save visualization specification (#457)
* Visualization: Column XValue not mandatory anymore (#429)
* Visualization: Add sorting of indices of dataframes for the correct sorting
  of x-values (#430)
* Visualization: Default value for the column x_label in vis_spec (#431)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants