bugfix: exclude samples from relationship checking that are not present in the expected loadable samples #1003
# Handle case where relation is identified in the
# pedigree as a "dummy" but is not included in
# the list of samples to load.
if other_id not in family.samples:
code duplication handled with better for-looping!
class Relation(Enum):
    PARENT = 'parent'
    GRANDPARENT = 'grandparent'
    PARENT_CHILD = 'parent_child'
I renamed these for clarity to indicate the bi-directionality.
matren395
left a comment
Really quick: can you explain the test cases and what the tests are doing before I approve?
Yes! The root issue here is parents that are identified in the pedigree but are not real samples present in the callset. We have one test that ensures we do parse those samples as parents. A previous iteration of this code did not, which is why that test exists. The newly added test checks that the mother ('sample_2') parsed from the pedigree does not fail the family, even if the parent/child relationship is missing from the relatedness check table.

Okay! Now that the tests make sense, lgtm.
matren395
left a comment
^^lgtm
* Add service account credentialing (#997)
  * Add service account credentialing
  * ruff
* feat: Handle parsing empty predicted sex into Unknown (#1000)
* Add helper functions for querying `Terra Data Repository` (#998)
  * Add service account credentialing
  * ruff
  * First pass
  * tests passing
  * add coverage of bigquery test
  * change function names
  * use generators everywhere
  * bq requirement
  * resolver
  * Update sample id name
* Build Sex Check Table from TDR Metrics (#999)
* refactor: Move feature flags to FeatureFlag enum. (#1002)
  * refactor: Move feature flags out of environment to their own dataclass
  * lint: ruff
  * ruff
* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples (#1003)
  * bugfix: exclude samples from relationship checking that are not present in the expected loadable samples
  * cleanup
* feat: add remap and family loading failures as validation exceptions … (#1005)
  * feat: add remap and family loading failures as validation exceptions rather than runtime errors
  * move on
  * Update write_remapped_and_subsetted_callset_test.py
  * ruff
* feat: Add ability to run tasks dataproc. (#948)
  * Support gcs dirs in rsync
  * ws
  * Add create dataproc cluster task
  * add dataproc
  * ruff
  * requirements
  * still struggling
  * Gencode refactor to remove gcs
  * bump reqs
  * Run dataproc job
  * lib
  * running
  * merge requirements
  * Flip'em
  * Better exception handling
  * Cleaner approach if less generalizable
  * write a test
  * Fix tests
  * lint
  * Add test for success
  * refactor to use a base class... better for adding support for multiple jobs
  * cleanup
  * ruff
  * Fix missing mock
  * Fix flapping test
  * pr comments
This fixes a regression caused by an earlier PR, which modified the relationship-checking behavior for parents that are identified in the pedigree but excluded from loading.
Originally, we excluded those parental IDs from the parsed relationships, which did not work in cases where the parental ID helped identify sibling pairs.
This PR preserves the handling of excluded parents (covered by an existing test case), and also lets us exclude relationships where the other sample is not listed as loadable in the pedigree (covered by a new test case).
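
For readers without the full diff, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: only `family.samples`, `other_id`, and `Relation.PARENT_CHILD` appear in the diff excerpts, while `Sample`, `Family`, `Relation.SIBLING`, `expected_relations`, `family_passes`, and the "same mother and same father" sibling rule are hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import combinations
from typing import Dict, Iterator, Optional, Tuple


class Relation(Enum):
    PARENT_CHILD = 'parent_child'  # name taken from the diff excerpt
    SIBLING = 'sibling'            # hypothetical member, for illustration


@dataclass
class Sample:  # hypothetical stand-in for a pedigree row
    sample_id: str
    mother: Optional[str] = None
    father: Optional[str] = None


@dataclass
class Family:  # hypothetical; `samples` holds only the loadable samples
    family_id: str
    samples: Dict[str, Sample] = field(default_factory=dict)


def expected_relations(family: Family) -> Iterator[Tuple[frozenset, Relation]]:
    """Yield the (unordered pair, relation) checks implied by the pedigree."""
    for sample in family.samples.values():
        for other_id in (sample.mother, sample.father):
            # Skip parents that are named in the pedigree but are not
            # among the samples to load; they cannot be checked.
            if other_id is None or other_id not in family.samples:
                continue
            yield frozenset({sample.sample_id, other_id}), Relation.PARENT_CHILD
    # Parental IDs are kept on each sample, so they still identify sibling
    # pairs even when the parent itself is not loadable.
    for a, b in combinations(family.samples.values(), 2):
        if a.mother and a.father and (a.mother, a.father) == (b.mother, b.father):
            yield frozenset({a.sample_id, b.sample_id}), Relation.SIBLING


def family_passes(family: Family, observed: Dict[frozenset, Relation]) -> bool:
    """True when every expected relation is present in the observed table."""
    return all(observed.get(pair) == rel for pair, rel in expected_relations(family))


# 'sample_2' is the mother in the pedigree but is not a loadable sample.
family = Family('fam_1', {
    'sample_1': Sample('sample_1', mother='sample_2', father='sample_3'),
    'sample_3': Sample('sample_3'),
    'sample_4': Sample('sample_4', mother='sample_2', father='sample_3'),
})
observed = {
    frozenset({'sample_1', 'sample_3'}): Relation.PARENT_CHILD,
    frozenset({'sample_3', 'sample_4'}): Relation.PARENT_CHILD,
    frozenset({'sample_1', 'sample_4'}): Relation.SIBLING,
}
print(family_passes(family, observed))
```

In this sketch the family passes even though no relationship involving 'sample_2' is in the observed table, while the shared parental IDs still let 'sample_1' and 'sample_4' be checked as siblings. Keying pairs by `frozenset` is one way to express the bi-directionality that the `PARENT_CHILD` rename points at.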