-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Examples dataset for Phenotype BEP #393
base: master
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,73 @@ | |||
"""Utility script to remove subjects from the phenotype dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Utility script to remove subjects from the phenotype dataset. | |
""" | |
Utility script to remove subjects from the phenotype dataset. |
if "sub-ON01016" not in subjects_to_keep: | ||
subjects_to_keep.append('sub-ON01016') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment for reasoning?
print(f'Removing {subject_dir}') | ||
shutil.rmtree(subject_dir) | ||
|
||
# remove subject from participants.tsv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# remove subject from participants.tsv | |
# remove subjects from participants.tsv |
by the way I am not sure we will want to keep the script in the end, but I put it in the PR in case we want to regenerate those datasets slightly differently |
print(participants_df) | ||
participants_df.to_csv(participants_tsv, sep='\t', index=False) | ||
|
||
# remove subject from all tsv in phenotype folder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# remove subject from all tsv in phenotype folder | |
# remove subjects from all tsv's in phenotype folder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a great utility! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe delete the unpaired JSONs like this one (meaning there's no TSV to go with the JSON)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I think I did that manually for a couple of things and that would have to be scripted.
another todo in the scrip:
|
@@ -0,0 +1,2 @@ | |||
participant_id bai01_r03, bai1_1 bai02_r03, bai1_2 bai03_r03, bai1_3 bai04_r03, bai1_4 bai05_r03, bai1_5 bai06_r03, bai1_6 bai07_r03, bai2_7 bai08_r03, bai2_8 bai09_r03, bai2_9 bai10_r03, bai2_10 bai11_r03, bai2_11 bai12_r03, bai3_12 bai13_r03, bai3_13 bai14_r03, bai3_14 bai15_r03, bai3_15 bai16_r03, bai3_16 bai17_r03, bai4_1 bai18_r03, bai4_2 bai19_r03, bai4_3 bai20_r03, bai4_4 bai21_r03, bai4_5 | |||
sub-ON66199 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember if we decided since this file is nested under a participant_id folder that we don't need the participant_id column, or if maybe it just always needs to be there. Open to commentary...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a wonderful bunch of examples and utilities! Address those few mini-suggestions and the couple slightly larger comments and I approve!
No description provided.