Commit

AlexsLemonade#959 HGG update hotspots (AlexsLemonade#1077)
* add hotspots maf

* add hotspots and run 02 from focal-cn-file-prep

* add hotspots + consensus maf

* Update analyses/tp53_nf1_score/05-tp53-altered-annotation.Rmd

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

* Update analyses/tp53_nf1_score/05-tp53-altered-annotation.Rmd

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

* Update analyses/tp53_nf1_score/05-tp53-altered-annotation.Rmd

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

* updating cnv loss filter

* updating doc in cnv loss nb

* adding 02 html from focal-cn

* add hotspots+consensus maf

* rerun

* rerun add back lgat filter

* update README

* Update analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* update fread to select from vector

* only keep TERT promoter muts

* Update README.md

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
3 people authored May 28, 2021
1 parent 3dc359e commit c074457
Showing 14 changed files with 301 additions and 229 deletions.
2 changes: 1 addition & 1 deletion analyses/README.md
@@ -32,7 +32,7 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
| [`molecular-subtyping-CRANIO`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-CRANIO) | `pbta-histologies-base.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-snv-scavenged-hotspots.maf.tsv.gz`| Molecular subtyping of craniopharyngiomas samples [#810](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/810) | `results/CRANIO_molecular_subtype.tsv`
| [`molecular-subtyping-EPN`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EPN) | `pbta-histologies-base.tsv` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `pbta-cnv-consensus-gistic.zip` <br> `analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv` <br> `analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv` <br> `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv` | *In progress*; molecular subtyping of ependymoma tumors | `results/EPN_all_data_withsubgroup.tsv`
| [`molecular-subtyping-EWS`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EWS) | `pbta-histologies-base.tsv` <br> `analyses/fusion-summary/results/fusion_summary_ewings_foi.tsv`| Reclassification of tumors based on the presence of defining fusions for Ewing Sarcoma per [#623](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/623) | `results/EWS_samples.tsv`
| [`molecular-subtyping-HGG`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-HGG) | `pbta-histologies-base.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `analyses/focal-cn-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv` <br> `pbta-cnv-consensus-gistic.zip` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | Molecular subtyping of high-grade glioma samples [#249](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/249) | `results/HGG_molecular_subtype.tsv`
| [`molecular-subtyping-HGG`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-HGG) | `pbta-histologies-base.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-snv-scavenged-hotspots.maf.tsv.gz` <br> `analyses/focal-cn-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv` <br> `pbta-cnv-consensus-gistic.zip` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | Molecular subtyping of high-grade glioma samples [#249](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/249) | `results/HGG_molecular_subtype.tsv`
| [`molecular-subtyping-LGAT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-LGAT)| `pbta-histologies-base.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-snv-scavenged-hotspots.maf.tsv.gz` <br> `analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv` <br> `pbta-fusion-recurrently-fused-genes-bysample.tsv`| Molecular subtyping of Low-grade astrocytic tumor samples [#631](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/631) | `results/lgat_subtyping.tsv`
| [`molecular-subtyping-MB`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-MB) | `pbta-histologies-base.tsv` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | Molecular classification of Medulloblastoma subtypes (part of [#731](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/731)) | `results/MB_molecular_subtype.tsv` <br> `results/MB_batchcorrected_molecular_subtype.tsv` <br> for uncorrected and batch-corrected input matrix
| [`molecular-subtyping-SHH-tp53`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-SHH-tp53) | `pbta-histologies` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | *Deprecated*; Identify the SHH-classified medulloblastoma samples that have TP53 mutations [#247](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/247) | N/A
@@ -81,21 +81,48 @@ lgat_specimens <- metadata %>%
pull(Kids_First_Biospecimen_ID)
```

# Read in snv consensus mutation data, filtering out LGAT
# Read in snv consensus and hotspots mutation data, filtering out LGAT
- We will use `pbta-snv-consensus-mutation.maf.tsv.gz` from the `snv-callers` module, which gathers calls that are present in all 3 callers (strelka2, mutect2, and lancet).
- In addition, we will use `pbta-snv-scavenged-hotspots.maf.tsv.gz` from the `hotspots-detection` module to gather calls that overlap MSKCC hotspots found in any caller. The exception is sites called as variant only by vardict: we remove these because many calls are unique to vardict and we consider them false positives, as discussed [here](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/snv-callers#snv-caller-comparison-analysis).

```{r}
snv_df <-
data.table::fread(file.path(root_dir,
"data",
"pbta-snv-consensus-mutation.maf.tsv.gz")) %>%
# select tumor sample barcode, gene, short protein annotation and variant classification
keep_cols <- c("Chromosome",
"Start_Position",
"End_Position",
"Strand",
"Variant_Classification",
"IMPACT",
"Tumor_Sample_Barcode",
"Hugo_Symbol",
"HGVSp_Short",
"Exon_Number")
snv_consensus_maf <- data.table::fread(
file.path(root_dir, "data" , "pbta-snv-consensus-mutation.maf.tsv.gz"),
select = keep_cols,
data.table = FALSE)
## Read in snv hotspot mutation data
snv_hotspot_maf <- data.table::fread(
file.path(root_dir, "analyses" , "hotspots-detection" , "results" , "pbta-snv-scavenged-hotspots.maf.tsv.gz"),
select = keep_cols,
data.table = FALSE) %>%
select(colnames(snv_consensus_maf))
snv_consensus_hotspot_maf <- snv_consensus_maf %>%
bind_rows(snv_hotspot_maf) %>%
unique() %>%
filter(!Tumor_Sample_Barcode %in% lgat_specimens)
```
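The combine step in this chunk (bind the consensus and hotspot calls, drop exact duplicate rows, then exclude LGAT specimens) can be illustrated on toy data. This is only a sketch: the data frames, barcodes, and the `lgat_toy` vector below are hypothetical stand-ins and not values from the PBTA files, and `distinct()` is used here as the dplyr equivalent of `unique()` for data frames.

```r
library(dplyr)

# Hypothetical stand-in for the consensus MAF (assumption: toy values only)
consensus_toy <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_B"),
  Hugo_Symbol = c("H3F3A", "TP53"),
  HGVSp_Short = c("p.K28M", "p.R175H"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the scavenged hotspots MAF; the first row
# duplicates a consensus call, as can happen when a hotspot is also
# called by all three callers
hotspot_toy <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_C", "BS_D"),
  Hugo_Symbol = c("H3F3A", "BRAF", "H3F3A"),
  HGVSp_Short = c("p.K28M", "p.V600E", "p.G35R"),
  stringsAsFactors = FALSE
)

# Hypothetical LGAT specimen IDs to exclude
lgat_toy <- c("BS_C")

combined_toy <- consensus_toy %>%
  # Stack the two call sets; both must share the same columns
  bind_rows(hotspot_toy) %>%
  # Drop rows that are identical across every column
  distinct() %>%
  # Remove calls from LGAT specimens
  filter(!Tumor_Sample_Barcode %in% lgat_toy)

combined_toy
```

Note that deduplication only removes rows that match in every selected column, so the two inputs need the same columns with the same types for overlapping calls to collapse into one row.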


## SNV consensus mutation data - defining lesions

```{r}
# Filter the snv consensus mutation data for the target lesions
snv_lesions_df <- snv_df %>%
snv_lesions_df <- snv_consensus_hotspot_maf %>%
dplyr::filter(Hugo_Symbol %in% c("H3F3A", "HIST1H3B",
"HIST1H3C", "HIST2H3C") &
HGVSp_Short %in% c("p.K28M", "p.G35R",
@@ -130,7 +157,7 @@ snv_lesions_df <- snv_df %>%
snv_lesions_df <- snv_lesions_df %>%
dplyr::bind_rows(
data.frame(
Tumor_Sample_Barcode = setdiff(unique(snv_df$Tumor_Sample_Barcode),
Tumor_Sample_Barcode = setdiff(unique(snv_consensus_hotspot_maf$Tumor_Sample_Barcode),
snv_lesions_df$Tumor_Sample_Barcode)
)
) %>%
@@ -181,4 +208,4 @@ readr::write_tsv(snv_lesions_df,
```{r}
# Print the session information
sessionInfo()
```
```

Large diffs are not rendered by default.

@@ -92,19 +92,32 @@ gistic_df <- data.table::fread(file.path(root_dir,


# Read in snv consensus mutation data
snv_maf_df <-
data.table::fread(file.path(root_dir,
"data",
"pbta-snv-consensus-mutation.maf.tsv.gz"),
select = c("Chromosome",
"Start_Position",
"End_Position",
"Strand",
"Variant_Classification",
"Tumor_Sample_Barcode",
"Hugo_Symbol",
"HGVSp_Short"),
data.table = FALSE)
# select tumor sample barcode, gene, short protein annotation and variant classification
keep_cols <- c("Chromosome",
"Start_Position",
"End_Position",
"Strand",
"Variant_Classification",
"IMPACT",
"Tumor_Sample_Barcode",
"Hugo_Symbol",
"HGVSp_Short",
"Exon_Number")

snv_consensus_maf <- data.table::fread(
file.path(root_dir, "data" , "pbta-snv-consensus-mutation.maf.tsv.gz"),
select = keep_cols,
data.table = FALSE)
## Read in snv hotspot mutation data
snv_hotspot_maf <- data.table::fread(
file.path(root_dir, "analyses" , "hotspots-detection" , "results" , "pbta-snv-scavenged-hotspots.maf.tsv.gz"),
select = keep_cols,
data.table = FALSE) %>%
select(colnames(snv_consensus_maf))

snv_consensus_hotspot_maf <- snv_consensus_maf %>%
bind_rows(snv_hotspot_maf) %>%
unique()

# Read in output file from `01-HGG-molecular-subtyping-defining-lesions.Rmd`
hgg_lesions_df <- read_tsv(
@@ -257,12 +270,12 @@ write_tsv(gistic_df,

#### Filter SNV consensus maf data ---------------------------------------------

snv_maf_df <- snv_maf_df %>%
snv_consensus_hotspot_maf <- snv_consensus_hotspot_maf %>%
left_join(select_metadata,
by = c("Tumor_Sample_Barcode" = "Kids_First_Biospecimen_ID")) %>%
filter(Tumor_Sample_Barcode %in% hgg_metadata_df$Kids_First_Biospecimen_ID) %>%
arrange(Kids_First_Participant_ID, sample_id)

# Write to file
write_tsv(snv_maf_df,
write_tsv(snv_consensus_hotspot_maf,
file.path(subset_dir, "hgg_snv_maf.tsv.gz"))
@@ -175,7 +175,9 @@ We likely want to be permissive with the _TERT_ mutations in terms of *regions.*
```{r}
tert_snv_df <- filtered_snv_df %>%
dplyr::filter(Hugo_Symbol == "TERT",
Variant_Classification != "Silent")
Variant_Classification == "5'Flank",
Start_Position %in% c("1295113","1295135"),
End_Position %in% c("1295113","1295135"))
tert_snv_df
```
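The narrowed filter above keeps only _TERT_ calls classified as `5'Flank` whose start and end both fall on one of the two listed positions. A minimal sketch on toy data follows; `toy_snv_df` and `tert_promoter_positions` are hypothetical names introduced for illustration, and the positions are stored as numbers here rather than as the character strings used in the chunk (R coerces either way when matching with `%in%`).

```r
library(dplyr)

# The two positions kept by the filter in the chunk above
tert_promoter_positions <- c(1295113, 1295135)

# Hypothetical toy calls (assumption: not real PBTA data)
toy_snv_df <- data.frame(
  Hugo_Symbol = c("TERT", "TERT", "TERT", "TP53"),
  Variant_Classification = c("5'Flank", "5'Flank",
                             "Missense_Mutation", "5'Flank"),
  Start_Position = c(1295113, 1295000, 1295135, 1295113),
  End_Position = c(1295113, 1295000, 1295135, 1295113),
  stringsAsFactors = FALSE
)

tert_toy <- toy_snv_df %>%
  filter(Hugo_Symbol == "TERT",
         Variant_Classification == "5'Flank",
         # Require both ends of the variant to sit on a listed position
         Start_Position %in% tert_promoter_positions,
         End_Position %in% tert_promoter_positions)

tert_toy
```

In this toy table, only the first row passes: the second is a flanking call at a different position, the third sits on a listed position but is not classified as `5'Flank`, and the fourth is in a different gene.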