fix ps and exercises for classes 31 and 32

rnabioco · Oct 2, 2024 · 396f6ee
1 parent 3bc226e

Showing 31 changed files with 97 additions and 25 deletions.
diff --git a/_freeze/exercises/ex-31/execute-results/html.json b/_freeze/exercises/ex-31/execute-results/html.json
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-25-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-25-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-26-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-26-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-27-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-27-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-28-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-28-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-29-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-29-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-31-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-31-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-36-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-36-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-37-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-37-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-40-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-40-1.png
diff --git a/_freeze/exercises/ex-31/figure-html/unnamed-chunk-41-1.png b/_freeze/exercises/ex-31/figure-html/unnamed-chunk-41-1.png
diff --git a/_freeze/exercises/ex-32/execute-results/html.json b/_freeze/exercises/ex-32/execute-results/html.json
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-10-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-10-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-11-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-11-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-14-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-14-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-14-2.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-14-2.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-15-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-15-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-15-2.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-15-2.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-19-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-19-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-20-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-20-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-21-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-21-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-24-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-24-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-26-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-26-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-29-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-29-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-31-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-31-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-7-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-7-1.png
diff --git a/_freeze/exercises/ex-32/figure-html/unnamed-chunk-9-1.png b/_freeze/exercises/ex-32/figure-html/unnamed-chunk-9-1.png
diff --git a/_freeze/problem-sets/ps-31/execute-results/html.json b/_freeze/problem-sets/ps-31/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+  "hash": "52a3e690db32f12dbbcb6e9527876998",
+  "result": {
+    "engine": "knitr",
+    "markdown": "---\ntitle: \"Single-cell RNA-seq Problem Set\"\nauthor: \"Yor name here\"\n---\n\n\n\nGrade (out of 20):\n\n\nFor this problem set we will be reanalyzing some public single cell RNA-seq data ([publication](https://doi.org/10.1038/s41591-018-0233-1)). The dataset contains PBMCs from a patient with Acute Myeloid Leukemia (AML). The data we will be analyzing consists of two samples, one taken 2 days (day2) after treatment with a chemotherapeutic (Venetoclax and Azacitidine) or one taken prior to treatment (day0). \n\nThe datasets have been processed with `alevin`, and the output is here `aml/alevin/`. \n\nA common strategy to analyze multiple samples is to combine them into a single matrix prior to downstream processing. This simplifies the data analysis because the clustering values and PCA/UMAP coordinates are comparable between the samples. \n\nEach question is worth 2 points\n\n1: Load the matrices for each sample into R separately using `tximport`. How many cells are in each sample?\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#tximport(...) \n#tximport(...)\n```\n:::\n\n\n\n2: Rename the cell barcodes for each sample, appending a sample identifier to the cell barcode. This ensures that the cell barcodes are unique for each sample. Print the first 5 renamed cell barcodes from each sample.\n\nA common approach is to add a sample identifier as a prefix to the cell barcode, e.g.:\n\n`sample1_ATCGTAGCTAGTG`\n`sample2_GTCGATGCTGATG`\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# assume that d0_mat is the day 0 count matrix\n# assume that d2_mat is the day 2 count matrix\n\n#colnames(d0_mat) <- paste0(\"sample_identifier_\", colnames(d0_mat))\n#...              <- paste0(\"another_identifier_\", colnames(d2_mat))\n```\n:::\n\n\n\n\n3: Combine the two count matrices into 1. See ?cbind for help.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#...\n```\n:::\n\n\n\n4: Create a SingleCellExperiment object from this new combined matrix. 
\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#sce <- SingleCellExperiment(list(counts = ...)) # Fill in the ...\n```\n:::\n\n\n\n\n5: Assign a new column into the colData that indicates the sample treatment day (e.g. day0 or day2). *Hint: functions from stringr may be helpful here*\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#sce$day <- ...\n```\n:::\n\n\n\n\n6: Next we will want to convert the ensembl gene ids into something more interpretable, such as gene symblols. Obtain gene symbols from ensembDb and store these in the rowData().  Make the gene symbols unique and assign them to the rownames of the SingleCellExperiment. \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#library(AnnotationHub)\n#ens_db <- ah[[\"AH113665\"]]\n#... <- mapIds(...)\n\n#rowData(sce) <- ...\n```\n:::\n\n\n\n7: Next calculate the % of UMIs that are derived from mitochondrial genes and store it in the colData. \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Remember the pattern for human mitochondrial genes is \"^MT-\"\n...\n\n#sce <- addPerCellQCMetrics(...)\n```\n:::\n\n\n\n8: Plot this % of UMIs that are derived from mitochondrial genes against the # of UMIs per cell. Color each point by the sample treatment day.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#plotColData(sce, ..., ..., ...)\n```\n:::\n\n\n\n9: Plot the # of UMIs and # of genes detected as violin plots. Plot each sample as separate groups on the x-axis.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#plotColData(sce, .., ...)\n\n#plotColData(sce, ..., ...)\n```\n:::\n\n\n\n\n\n10: Based on these plots, select cutoffs to exclude low-quality cells and filter your SingleCellExperiment object. Provide an explanation of your reasons for selecting the cutoffs chosen and report the # of cells remaining in each sample after filtering. \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#pass_qc <- sce$... < X & sce$... > Y ...\n#sce[, ...]\n```\n:::\n",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
diff --git a/_freeze/problem-sets/ps-32/execute-results/html.json b/_freeze/problem-sets/ps-32/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+  "hash": "9a2e8aefee29884d75e8818e7c44299b",
+  "result": {
+    "engine": "knitr",
+    "markdown": "---\ntitle: \"Single-cell RNA-seq Problem Set II\"\nauthor: \"Your name here\"\n---\n\n\n\nGrade (out of 20):\n\nFor this problem set we will be reanalyzing some public single cell RNA-seq data ([publication](https://doi.org/10.1038/s41591-018-0233-1)). The dataset contains PBMCs from a patient with Acute Myeloid Leukemia (AML). The dataset is a little different than the one we looked at in Mondays problem set, as we will also include a day 4 sample. The data we will be analyzing consists of three samples, PBMCs taken 4 days (day4), 2 days (day2), and prior to (day 0) treatment with a chemotherapeutic (Venetoclax and Azacitidine).\n\nThe three single cell RNA-seq datasets have already been preprocessed and QC'd. There is an `.rds` file (`data/aml/d0_d2_d4_filtered.rds`) provided that contains the combined samples in a single seurat object. We will use this seurat object for this homework. \n\nQ1 4 points) Read the `.rds` file containing the Seurat object into R using the `readRDS()` function. Use tab-completion to ensure that you are specifying the correct path to the object.    \n\n\n\n::: {.cell}\n\n:::\n\n\n\nQ2 4 points) Process the dataset to generate a UMAP projection. Plot your UMAP with each cell colored by the day of sample (e.g day 0, day 2 or day 4). Examine the meta.data to find the column that contains the day of sample information. Consult the simplified work-flow shown at the beginning of class on Monday for a default approach to this question (you don't need to worry about picking parameters here). \n  \n\n\n::: {.cell}\n\n```{.r .cell-code}\n#head(so@meta.data)\n```\n:::\n\n\n  \nQ3 4 points) Make a UMAP plot showing the clusters that you have generated. To make this plot more informative use the `split.by` argument set to the column with the day information. This will split the UMAP into three plots ( `day0`, `day2` and `day4`). 
Remember that to plot categorical data you need to use `UMAPPlot()` and for numeric data use `FeaturePlot()`.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# code here\n```\n:::\n\n\n\nQ4 2 points) Use the `clustifyr` package to annotate cell types using a reference dataset from `clustifyrdatahub`. Use the `ref_hema_microarray()` reference shown in class. Plot a heatmap (using `pheatmap`) of the correlation coefficients between your clusters and the cell types in the reference data. Note that you need to set `obj_out` = FALSE to return the correlation coefficients as a data.frame. \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(clustifyr)\nlibrary(clustifyrdatahub)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: ExperimentHub\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'BiocGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:SeuratObject':\n\n    intersect\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:lubridate':\n\n    intersect, setdiff, union\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:dplyr':\n\n    combine, intersect, setdiff, union\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:stats':\n\n    IQR, mad, sd, var, xtabs\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n    anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n    Position, rank, rbind, 
Reduce, rownames, sapply, setdiff, table,\n    tapply, union, unique, unsplit, which.max, which.min\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: AnnotationHub\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocFileCache\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: dbplyr\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'dbplyr'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:dplyr':\n\n    ident, sql\n```\n\n\n:::\n\n```{.r .cell-code}\nlibrary(pheatmap)\n# code here\n```\n:::\n\n\n\nQ5 2 points) Run clustifyr but this time assign the output of `clustify` to return a Seurat object. The cell classifications will be listed in the `type` column. Make a UMAP plot colored by the assigned cell types.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# code here\n```\n:::\n\n\n\nQ6 4 points) For each day (e.g. day0 day2 and day4) calculate the % of cells present in each cluster. Using ggplot, plot with the cell type on the x axis and the % on the y-axis. Then use a fill aesthetic to color by the day of each sample.  If your x-axis labels are all squished together consider rotating the label using the following pseudocode:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#ggplot... +\n#  theme(axis.text.x = element_text(angle = 90))\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Hint - calculate % of cells using tidyverse, remember so@meta.data gives a data frame\n# % of cells will be # of cells in the cluster / total cells for each day. Summarize can\n# help you get the total number of cells\n# ex\n#so@meta.data %>%\n#  dplyr::group_by(orig.ident, type) # Finish this to count the number of cells\n# code here\n```\n:::\n\n\n\nHow does the relative abundance change of each cell type change?\n\n    Short answer here...\n",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
diff --git a/problem-sets/ps-31.qmd b/problem-sets/ps-31.qmd
@@ -1,6 +1,6 @@
 ---
 title: "Single-cell RNA-seq Problem Set"
-author: "Yor name here"
+author: "Your name here"
 ---
 
 Grade (out of 20):
@@ -18,8 +18,8 @@ Each question is worth 2 points
 
 ```{r}
 #| eval: false
-tximport(...) 
-tximport(...)
+#tximport(...) 
+#tximport(...)
 ```
 
 2: Rename the cell barcodes for each sample, appending a sample identifier to the cell barcode. This ensures that the cell barcodes are unique for each sample. Print the first 5 renamed cell barcodes from each sample.
@@ -34,43 +34,43 @@ A common approach is to add a sample identifier as a prefix to the cell barcode,
 # assume that d0_mat is the day 0 count matrix
 # assume that d2_mat is the day 2 count matrix
 
-colnames(d0_mat) <- paste0("sample_identifier_", colnames(d0_mat))
-...              <- paste0("another_identifier_", colnames(d2_mat))
+#colnames(d0_mat) <- paste0("sample_identifier_", colnames(d0_mat))
+#...              <- paste0("another_identifier_", colnames(d2_mat))
 ```
 
 
 3: Combine the two count matrices into 1. See ?cbind for help.
 
 ```{r}
 #| eval: false
-...
+#...
 ```
 
 4: Create a SingleCellExperiment object from this new combined matrix. 
 
 ```{r}
 #| eval: false
-sce <- SingleCellExperiment(list(counts = ...)) # Fill in the ...
+#sce <- SingleCellExperiment(list(counts = ...)) # Fill in the ...
 ```
 
 
 5: Assign a new column into the colData that indicates the sample treatment day (e.g. day0 or day2). *Hint: functions from stringr may be helpful here*
 
 ```{r}
 #| eval: false
-sce$day <- ...
+#sce$day <- ...
 ```
 
 
 6: Next we will want to convert the Ensembl gene IDs into something more interpretable, such as gene symbols. Obtain gene symbols from ensembldb and store these in the rowData(). Make the gene symbols unique and assign them to the rownames of the SingleCellExperiment. 
 
 ```{r}
 #| eval: false
-library(AnnotationHub)
-ens_db <- ah[["AH113665"]]
-... <- mapIds(...)
+#library(AnnotationHub)
+#ens_db <- ah[["AH113665"]]
+#... <- mapIds(...)
 
-rowData(sce) <- ...
+#rowData(sce) <- ...
 ```
 
 7: Next calculate the % of UMIs that are derived from mitochondrial genes and store it in the colData. 
@@ -81,24 +81,24 @@ rowData(sce) <- ...
 # Remember the pattern for human mitochondrial genes is "^MT-"
 ...
 
-sce <- addPerCellQCMetrics(...)
+#sce <- addPerCellQCMetrics(...)
 
 ```
 
 8: Plot this % of UMIs that are derived from mitochondrial genes against the # of UMIs per cell. Color each point by the sample treatment day.
 
 ```{r}
 #| eval: false
-plotColData(sce, ..., ..., ...)
+#plotColData(sce, ..., ..., ...)
 ```
 
 9: Plot the # of UMIs and # of genes detected as violin plots. Plot each sample as separate groups on the x-axis.
 
 ```{r}
 #| eval: false
-plotColData(sce, .., ...)
+#plotColData(sce, .., ...)
 
-plotColData(sce, ..., ...)
+#plotColData(sce, ..., ...)
 ```
 
 
@@ -107,7 +107,7 @@ plotColData(sce, ..., ...)
 
 ```{r}
 #| eval: false
-pass_qc <- sce$... < X & sce$... > Y ...
-sce[, ...]
+#pass_qc <- sce$... < X & sce$... > Y ...
+#sce[, ...]
 ```
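
As a reviewer-side illustration of the barcode-renaming and `cbind` steps in questions 2 and 3 of ps-31, here is a minimal sketch on toy data. The matrices, gene names, barcodes, and the `day0_`/`day2_` prefixes are made-up assumptions for illustration only; they are not the `alevin` output and not the course's answer key.

```r
# Toy count matrices standing in for d0_mat and d2_mat (all values are made up)
genes <- c("geneA", "geneB", "geneC")
d0_mat <- matrix(1:6, nrow = 3, dimnames = list(genes, c("AAAC", "AAAG")))
d2_mat <- matrix(7:12, nrow = 3, dimnames = list(genes, c("AAAC", "TTTG")))

# The barcode "AAAC" occurs in both samples, so prefix each barcode with a
# (hypothetical) sample identifier to keep column names unique after merging
colnames(d0_mat) <- paste0("day0_", colnames(d0_mat))
colnames(d2_mat) <- paste0("day2_", colnames(d2_mat))

# cbind() joins columns by position, so confirm the genes (rows) line up first
stopifnot(identical(rownames(d0_mat), rownames(d2_mat)))
combined <- cbind(d0_mat, d2_mat)

# Inspect the merged matrix: cells from both samples now share one set of rows
dim(combined)
colnames(combined)
```

The row-name check matters because `cbind()` does not match rows by name; if the two samples listed genes in different orders, the combined counts would be silently misaligned.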
 
diff --git a/problem-sets/ps-32.qmd b/problem-sets/ps-32.qmd
@@ -1,6 +1,6 @@
 ---
 title: "Single-cell RNA-seq Problem Set II"
-author: "Yor name here"
+author: "Your name here"
 ---
 
 Grade (out of 20):
@@ -14,13 +14,13 @@ Q1 4 points) Read the `.rds` file containing the Seurat object into R using the
 ```{r, echo = FALSE, message = FALSE, warning = FALSE}
 library(tidyverse)
 library(Seurat)
-so <- readRDS(...) # Fill in path to d0_d2_d4_filtered.rds
+#so <- readRDS(...) # Fill in path to d0_d2_d4_filtered.rds
 ```
 
 Q2 4 points) Process the dataset to generate a UMAP projection. Plot your UMAP with each cell colored by the day of sample (e.g day 0, day 2 or day 4). Examine the meta.data to find the column that contains the day of sample information. Consult the simplified work-flow shown at the beginning of class on Monday for a default approach to this question (you don't need to worry about picking parameters here). 
 
 ```{r}
-head(so@meta.data)
+#head(so@meta.data)
 ```
 
 Q3 4 points) Make a UMAP plot showing the clusters that you have generated. To make this plot more informative use the `split.by` argument set to the column with the day information. This will split the UMAP into three plots ( `day0`, `day2` and `day4`). Remember that to plot categorical data you need to use `UMAPPlot()` and for numeric data use `FeaturePlot()`.
@@ -48,17 +48,17 @@ Q6 4 points) For each day (e.g. day0 day2 and day4) calculate the % of cells pre
 
 ```{r}
 #| eval: false
-ggplot... +
-  theme(axis.text.x = element_text(angle = 90))
+#ggplot... +
+#  theme(axis.text.x = element_text(angle = 90))
 ```
 
 ```{r}
 # Hint - calculate % of cells using tidyverse, remember so@meta.data gives a data frame
 # % of cells will be # of cells in the cluster / total cells for each day. Summarize can
 # help you get the total number of cells
 # ex
-so@meta.data %>%
-  dplyr::group_by(orig.ident, type) # Finish this to count the number of cells
+#so@meta.data %>%
+#  dplyr::group_by(orig.ident, type) # Finish this to count the number of cells
 # code here
 ```
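
The hint in the last ps-32 chunk (cells in a cluster divided by total cells for that day) can be sketched on a toy data frame. This is a minimal sketch assuming the metadata has the `orig.ident` and `type` columns named in the hint; the `meta` data frame and its values are hypothetical stand-ins for `so@meta.data`, and this is only one of several ways to do the calculation.

```r
library(dplyr)

# Hypothetical stand-in for so@meta.data: one row per cell
meta <- data.frame(
  orig.ident = c("day0", "day0", "day0", "day0", "day2", "day2", "day2", "day2"),
  type       = c("B", "B", "B", "T", "B", "T", "T", "T")
)

pct <- meta %>%
  # number of cells for each day / cell-type combination
  count(orig.ident, type, name = "n_cells") %>%
  # within each day, divide by that day's total number of cells
  group_by(orig.ident) %>%
  mutate(pct_cells = 100 * n_cells / sum(n_cells)) %>%
  ungroup()

pct
```

Grouping by day before dividing is the key design choice: it makes the percentages sum to 100 within each sample, so samples with different total cell counts stay comparable.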