Skip to content

Use sample_* variables if present to populate plink family fields when writing #1019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 22, 2023

Conversation

tomwhite
Copy link
Collaborator

@tomwhite tomwhite commented Feb 9, 2023

The idea here is that you can use well-known variables, like sample_phenotype to set fields in the plink fam file. This should address #1017.

This isn't finished yet as it needs some tests. Also, I noticed that currently we only support case/control phenotypes (coded as 2 and 1, respectively) and not more general quantative trait values encoded as a float. This would need fixing too - I'm not sure the best way of doing that yet.

@codecov-commenter
Copy link

codecov-commenter commented Feb 9, 2023

Codecov Report

Merging #1019 (f4f8052) into main (0b04dc8) will not change coverage.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff            @@
##              main     #1019   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           44        44           
  Lines         4521      4534   +13     
=========================================
+ Hits          4521      4534   +13     
Impacted Files Coverage Δ
sgkit/io/plink/plink_reader.py 100.00% <100.00%> (ø)
sgkit/io/plink/plink_writer.py 100.00% <100.00%> (ø)
sgkit/stats/ibs.py 100.00% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tomwhite tomwhite force-pushed the plink-writer-fam-fields branch from 2bb8314 to d81fb2d Compare February 14, 2023 11:08
@tomwhite tomwhite marked this pull request as ready for review February 14, 2023 11:08
@jeromekelleher
Copy link
Collaborator

@timothymillar would you mind taking a look through this please? I know very little about plink.

@timothymillar
Copy link
Collaborator

I know very little about plink

That makes two of us 😄.

@@ -20,6 +31,20 @@ def write_plink(
) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a missing: Hashable ='.' parameter? To specify the missing parent value and default to the new value from read_plink.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is needed. We don't parameterise the missing value in VCF for example, so I think it's ok to have the sgkit default of "." used here too.

Copy link
Collaborator

@timothymillar timothymillar Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't parameterise the missing value in VCF

Good point, but "." is the missing value in the VCF spec where as "0" is the missing value in a plink fam file. Although, a "0" will stay as a "0" so I guess both cases are covered anyway 🤷

the sgkit default of "."

We should note that in the main documentation (not needed in this PR)

…hen writing

Ensure plink fam missing values in strings ('0') are represented as '.' in sgkit

Test fam data can be roundtripped.
@tomwhite tomwhite force-pushed the plink-writer-fam-fields branch from d81fb2d to f4f8052 Compare February 21, 2023 14:12
@tomwhite
Copy link
Collaborator Author

Thanks for the review @timothymillar. I've updated the PR with changes.

Copy link
Collaborator

@timothymillar timothymillar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jeromekelleher jeromekelleher added the auto-merge Auto merge label for mergify test flight label Feb 22, 2023
@mergify mergify bot merged commit d4780b6 into sgkit-dev:main Feb 22, 2023
@tomwhite tomwhite deleted the plink-writer-fam-fields branch February 22, 2023 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Auto merge label for mergify test flight
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow user to specify coordinate as phenotype in write_plink
4 participants