Skip to content

Conversation

@bpblanken
Copy link
Collaborator

@bpblanken bpblanken commented Dec 17, 2024

This fixes a regression caused by this pr, which modified the relationship checking behavior for parents that are identified in the pedigree but excluded from loading.

Originally we excluded those parental ids from the parsed relationships, which did not work in cases where the parental id helped identify sibling pairs.

This pr preserves the behavior of excluded parents (test case is here), but also allows us to exclude relationships where the sample is not included as loadable in the pedigree (test case is here).

@bpblanken bpblanken requested a review from matren395 December 17, 2024 22:04
# Handle case where relation is identified in the
# pedigree as a "dummy" but is not included in
# the list of samples to load.
if other_id not in family.samples:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code duplication handled with better for-looping!

@bpblanken bpblanken marked this pull request as ready for review December 17, 2024 22:11
@bpblanken bpblanken requested a review from a team as a code owner December 17, 2024 22:11
class Relation(Enum):
PARENT = 'parent'
GRANDPARENT = 'grandparent'
PARENT_CHILD = 'parent_child'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed these for clarity to indicate the bi-directionality.

Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really quick - can you explain the Test cases and what the tests are doing before I approve ?

@bpblanken
Copy link
Collaborator Author

@matren395

Yes!

So the root issue at play here is parents that are identified in the pedigree but that are not actually real samples present in the callset.

We have one test here that ensures we do actually parse those samples as parents. A previous iteration of this code did not... which is why that test exists.

The newly added test here checks that the mother ('sample_2') that is parsed from the pedigree does not actually fail the family even if the parent/child relationship is missing from the relatedness check table.

@matren395
Copy link
Contributor

okay! now that the tests make sense, lgtm then

@matren395 matren395 self-requested a review December 18, 2024 17:57
Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^lgtm

@bpblanken bpblanken merged commit 2e8dbcf into dev Dec 18, 2024
3 checks passed
bpblanken added a commit that referenced this pull request Jan 7, 2025
* Add service account credentialing (#997)

* Add service account credentialing

* ruff

* feat: Handle parsing empty predicted sex into Unknown (#1000)

* Add helper functions for querying `Terra Data Repository` (#998)

* Add service account credentialing

* ruff

* First pass

* tests passing

* add coverage of bigquery test

* change function names

* use generators everywhere

* bq requirement

* resolver

* Update sample id name

* Build Sex Check Table from TDR Metrics (#999)

* refactor: Move feature flags to FeatureFlag enum. (#1002)

* refactor: Move feature flags out of environment to their own dataclass

* lint: ruff

* ruff

* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples (#1003)

* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples

* cleanup

* feat: add remap and family loading failures as validation exceptions … (#1005)

* feat: add remap and family loading failures as validation exceptions rather than runtime errors

* move on

* Update write_remapped_and_subsetted_callset_test.py

* ruff

* feat: Add ability to run tasks dataproc. (#948)

* Support gcs dirs in rsync

* ws

* Add create dataproc cluster task

* add dataproc

* ruff

* requirements

* still struggling

* Gencode refactor to remove gcs

* bump reqs

* Run dataproc job

* lib

* running

* merge requirements

* Flip'em

* Better exception handling

* Cleaner approach if less generalizable

* write a test

* Fix tests

* lint

* Add test for success

* refactor to use a base class... better for adding support for multiple jobs

* cleanup

* ruff

* Fix missing mock

* Fix flapping test

* pr comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants