-
Notifications
You must be signed in to change notification settings - Fork 35
Use sample_*
variables if present to populate plink family fields when writing
#1019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,19 @@ | |
from sgkit.typing import ArrayLike, PathType | ||
|
||
BED_READER_MISSING_INT_VALUE = -127 | ||
INT_MISSING = -1 | ||
STR_MISSING = "." | ||
|
||
FAM_VARIABLE_TO_BED_READER = { | ||
# sgkit variable name : bed reader properties name | ||
"sample_family_id": "fid", | ||
"sample_member_id": "iid", | ||
"sample_paternal_id": "father", | ||
"sample_maternal_id": "mother", | ||
"sample_sex": "sex", | ||
"sample_phenotype": "pheno", | ||
"variant_id": "sid", | ||
} | ||
|
||
|
||
def write_plink( | ||
|
@@ -20,6 +33,20 @@ def write_plink( | |
) -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this have a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure this is needed. We don't parameterise the missing value in VCF for example, so I think it's ok to have the sgkit default of "." used here too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Good point, but "." is the missing value in the VCF spec where as "0" is the missing value in a plink fam file. Although, a "0" will stay as a "0" so I guess both cases are covered anyway 🤷
We should note that in the main documentation (not needed in this PR) |
||
"""Convert a dataset to a PLINK file. | ||
|
||
If any of the following pedigree-specific variables are defined in the dataset | ||
they will be included in the PLINK fam file. Otherwise, the PLINK fam file will | ||
contain missing values for these fields, except for the within-family identifier | ||
for samples, which will be taken from the dataset ``sample_id``. | ||
|
||
- ``sample_family_id``: Family identifier commonly referred to as FID | ||
- ``sample_member_id``: Within-family identifier for sample | ||
- ``sample_paternal_id``: Within-family identifier for father of sample | ||
- ``sample_maternal_id``: Within-family identifier for mother of sample | ||
- ``sample_sex``: Sex code equal to 1 for male, 2 for female, and -1 | ||
for missing | ||
- ``sample_phenotype``: Phenotype code equal to 1 for control, 2 for case, | ||
and -1 for missing | ||
|
||
Parameters | ||
---------- | ||
ds | ||
|
@@ -78,11 +105,17 @@ def write_plink( | |
"bp_position": ds.variant_position.values, | ||
"allele_1": ds.variant_allele.values[:, 0], | ||
"allele_2": ds.variant_allele.values[:, 1], | ||
"iid": ds.sample_id, | ||
"iid": ds.sample_id.values, # may be overridden by sample_member_id below (if present) | ||
} | ||
|
||
if "variant_id" in ds: | ||
properties["sid"] = ds.variant_id | ||
for var, prop in FAM_VARIABLE_TO_BED_READER.items(): | ||
if var in ds: | ||
values = ds[var].values | ||
if values.dtype.kind in ("O", "S", "U"): | ||
values = np.where(values == STR_MISSING, "0", values) | ||
elif values.dtype.kind in ("i", "u"): | ||
values = np.where(values == INT_MISSING, 0, values) | ||
properties[prop] = values | ||
|
||
output_file = bed_path | ||
val = call_g.T | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
1 1_10 0 10 A G | ||
1 1_20 0 20 T C |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
ind0 ind0 0 0 0 -9 | ||
ind1 ind1 0 0 0 -9 | ||
ind2 ind2 ind1 ind0 2 1 | ||
ind3 ind3 ind1 ind0 1 2 | ||
ind4 ind4 0 0 0 -9 | ||
ind5 ind5 0 0 0 -9 | ||
ind6 ind6 0 0 0 -9 | ||
ind7 ind7 0 0 0 -9 | ||
ind8 ind8 0 0 0 -9 | ||
ind9 ind9 0 0 0 -9 |
Uh oh!
There was an error while loading. Please reload this page.