-
Notifications
You must be signed in to change notification settings - Fork 35
Use sample_*
variables if present to populate plink family fields when writing
#1019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #1019 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 44 44
Lines 4521 4534 +13
=========================================
+ Hits 4521 4534 +13
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
2bb8314
to
d81fb2d
Compare
@timothymillar would you mind taking a look through this please? I know very little about plink. |
That makes two of us 😄. |
@@ -20,6 +31,20 @@ def write_plink( | |||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have a missing: Hashable ='.'
parameter? To specify the missing parent value and default to the new value from read_plink
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is needed. We don't parameterise the missing value in VCF for example, so I think it's ok to have the sgkit default of "." used here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't parameterise the missing value in VCF
Good point, but "." is the missing value in the VCF spec where as "0" is the missing value in a plink fam file. Although, a "0" will stay as a "0" so I guess both cases are covered anyway 🤷
the sgkit default of "."
We should note that in the main documentation (not needed in this PR)
…hen writing Ensure plink fam missing values in strings ('0') are represented as '.' in sgkit Test fam data can be roundtripped.
d81fb2d
to
f4f8052
Compare
Thanks for the review @timothymillar. I've updated the PR with changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
write_plink
#1017 and a part of Write family information inwrite_plink
#1010changelog.rst
The idea here is that you can use well-known variables, like
sample_phenotype
to set fields in the plink fam file. This should address #1017.This isn't finished yet as it needs some tests. Also, I noticed that currently we only support case/control phenotypes (coded as 2 and 1, respectively) and not more general quantative trait values encoded as a float. This would need fixing too - I'm not sure the best way of doing that yet.