Multiple fixes for data inconsistencies across all datasets. #410

Open: wants to merge 1 commit into base: main

jjacobson95
Collaborator

Unfortunately, because the code for these changes is spread across all datasets, I did not have time to fully test them: that would require running the full build process, and I am leaving for 2 weeks today. The logic updates are relatively simple and I think they should work, but there are quite a few of them, so it's possible an error slipped by. Please review thoroughly.

Resolves #399 and #395: model types have been corrected for mpnst, bladderpdo, crcpdo, hcmi, pancpdo, and sarcpdo.

  • organoid should now be either patient derived organoid or xenograft derived organoid
  • in crcpdo, Tumor-Biopsy now maps to the tumor model type instead of ex vivo. I did not have time to check the paper, so please confirm that this is correct.
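
For reviewers, one way to spot-check the organoid change is sketched below. This is not code from the PR: the function name `find_unqualified_organoids` and the `model_type` column name are assumptions, and the sample rows are toy values.

```python
# The two values a former "organoid" entry is expected to take after this PR.
ALLOWED_ORGANOID_TYPES = {"patient derived organoid", "xenograft derived organoid"}


def find_unqualified_organoids(rows, column="model_type"):
    """Return rows whose model type mentions "organoid" but is not an allowed value.

    `column` is an assumed column name; the comparison ignores case and
    surrounding whitespace.
    """
    flagged = []
    for row in rows:
        value = (row.get(column) or "").strip().lower()
        if "organoid" in value and value not in ALLOWED_ORGANOID_TYPES:
            flagged.append(row)
    return flagged


# Toy sample rows (made-up values, not taken from any dataset).
samples = [
    {"improve_sample_id": "1", "model_type": "patient derived organoid"},
    {"improve_sample_id": "2", "model_type": "organoid"},
    {"improve_sample_id": "3", "model_type": "xenograft derived organoid"},
    {"improve_sample_id": "4", "model_type": "tumor"},
]
flagged = find_unqualified_organoids(samples)
```

An empty result for each rebuilt samples file would suggest no bare organoid label remains.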

Resolves #405 and #407: genes that were not in the genes file have been dropped from the following data files:

  • beataml_mutations.csv
  • beataml_proteomics.csv
  • beataml_transcriptomics.csv
  • bladderpdo_copy_number.csv
  • bladderpdo_mutations.csv
  • ccle_mutations.csv
  • cptac_copy_number.csv
  • cptac_mutations.csv
  • cptac_proteomics.csv
  • cptac_transcriptomics.csv
  • ctrpv2_mutations.csv
  • fimm_mutations.csv
  • gcsi_mutations.csv
  • hcmi_mutations.csv
  • nci60_mutations.csv
  • prism_mutations.csv
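
The gene filtering described above can be sketched as follows, for anyone who wants to verify the output files. This is a minimal illustration, not the PR's implementation: the helper name `drop_unknown_genes` is hypothetical, and it assumes both the genes file and the data file carry an `entrez_id` column.

```python
import csv
import io


def drop_unknown_genes(data_rows, gene_rows, key="entrez_id"):
    """Return only the data rows whose gene ID appears in the genes table.

    `key` is an assumed shared column name; adjust it to the real schema.
    """
    known = {row[key] for row in gene_rows}
    return [row for row in data_rows if row[key] in known]


# Tiny in-memory stand-ins for a genes file and a data file (toy values).
genes_csv = "entrez_id,gene_symbol\n1,A1BG\n2,A2M\n"
data_csv = "entrez_id,improve_sample_id\n1,10\n2,11\n999,12\n"

gene_rows = list(csv.DictReader(io.StringIO(genes_csv)))
data_rows = list(csv.DictReader(io.StringIO(data_csv)))

# The row with entrez_id 999 is absent from the genes table and is dropped.
kept = drop_unknown_genes(data_rows, gene_rows)
```

Because the sketch compares IDs as strings, a file that stores an ID as `1.0` would not match `1` in the genes file, so float-formatted IDs would need normalizing first.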

Resolves #408: entrez_id or improve_sample_id has been converted from float to int in the following data files:

  • ccle_experiments.tsv
  • ctrpv2_experiments.tsv
  • fimm_experiments.tsv
  • gcsi_experiments.tsv
  • gdscv1_experiments.tsv
  • gdscv2_experiments.tsv
  • nci60_experiments.tsv
  • prism_experiments.tsv
  • sarcpdo_mutations.csv
  • sarcpdo_transcriptomics.csv
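
A minimal sketch of the float-to-int conversion, assuming the IDs appear in the files as float-formatted text such as `1234.0`. The helper name `to_int_string` is hypothetical, and the PR itself may do this differently.

```python
def to_int_string(value):
    """Convert a float-formatted ID such as "1234.0" to "1234".

    Blank values are returned unchanged. A value with a real fractional
    part (or NaN) raises ValueError instead of being silently truncated.
    """
    text = value.strip()
    if text == "":
        return value
    number = float(text)
    if not number.is_integer():
        raise ValueError(f"not a whole number: {value!r}")
    return str(int(number))


# Toy rows (made-up values) with float-formatted ID columns.
rows = [
    {"improve_sample_id": "1234.0", "entrez_id": "7157.0"},
    {"improve_sample_id": "56", "entrez_id": ""},
]
converted = [
    {column: to_int_string(value) for column, value in row.items()}
    for row in rows
]
```

Raising on non-whole values is a deliberate choice here: an ID like `12.5` more likely signals a data problem than something to round away.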

Resolves #393 and #392: blank cancer_type values are now filled in with Normal Tissue for PancPDO and HCMI.
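
The blank-filling step for cancer_type can be illustrated with the sketch below. It is an assumption-laden example, not the PR's code: the function name `fill_blank_cancer_type` is hypothetical, and it treats empty or whitespace-only strings and missing keys as blank.

```python
def fill_blank_cancer_type(rows, fill_value="Normal Tissue", column="cancer_type"):
    """Return copies of the rows with blank `column` values replaced by `fill_value`.

    A value counts as blank if it is missing, None, empty, or only whitespace.
    The input rows are not modified.
    """
    filled = []
    for row in rows:
        new_row = dict(row)
        if (new_row.get(column) or "").strip() == "":
            new_row[column] = fill_value
        filled.append(new_row)
    return filled


# Toy sample rows (made-up values).
samples = [
    {"improve_sample_id": "1", "cancer_type": "Pancreatic Adenocarcinoma"},
    {"improve_sample_id": "2", "cancer_type": ""},
    {"improve_sample_id": "3", "cancer_type": "   "},
]
filled = fill_blank_cancer_type(samples)
```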

Resolves #396: all depmap datasets will now be in the correct TPM scale.
