Project description
Using yeast as a model organism, this project aims to understand how the transcriptome changes during entry into quiescence, a reversible non-replicative state. Compared with the transcriptome of cycling yeast, a significant portion of the quiescent transcriptome is dedicated to noncoding transcription. To explore this noncoding transcription, we used data from 4tU-seq, an NGS assay that enables the isolation of rapidly decaying nascent transcripts, for transcript assembly and annotation in quiescent yeast.
In addition to nascent transcription, 4tU-seq allows us to isolate steady-state transcripts. During our transcriptome analyses, we found distinct differences between the steady-state and nascent transcriptomes in quiescence, particularly in their compositions. These observations led us to generate and analyze 4tU-seq data from deletion and depletion models for post-transcriptional regulators. The scripts and notebooks in this repository detail the analysis portion of that work, including:
- Pre-processing, aligning, and post-processing sequenced paired-end reads
- Drafting nascent transcriptome assemblies for cells in the quiescent (Q) and G1 states
- Filtering and annotating the draft assemblies
- Conducting various bioinformatics and statistical analyses and visualizing the results
Dependencies
#TODO
Directory structure
2022-2023_RRP6-NAB3
├── bin
├── data
└── results
    ├── 2023-0111 # Transcriptome assembly
    │   ├── tutorial_troubleshooting
    │   └── work_initial
    ├── 2023-0115 # Pre-processing, aligning, and post-processing sequenced paired-end reads
    │   ├── etc_QC
    │   ├── etc_cleaning
    │   ├── etc_initial
    │   ├── notebook
    │   └── test_tutorial
    └── 2023-0215 # Everything else
        ├── GEO
        ├── bws
        ├── infiles_gtf-gff3
        │   ├── Trinity-GG
        │   │   ├── G_N
        │   │   └── Q_N
        │   ├── already
        │   ├── comprehensive
        │   │   └── S288C_reference_genome_R64-1-1_20110203
        │   └── representation
        │       ├── CUTs-HMM_CUTs-4X
        │       ├── CUTs_SUTs
        │       ├── NUTs
        │       ├── SRATs
        │       ├── XUTs
        │       └── ncRNAs
        ├── notebook
        ├── outfiles_gtf-gff3
        │   ├── Trinity-GG
        │   │   ├── G_N
        │   │   │   ├── err_out
        │   │   │   └── filtered
        │   │   │       ├── CDS
        │   │   │       ├── exon
        │   │   │       ├── introns_filtered
        │   │   │       ├── locus
        │   │   │       └── mRNA
        │   │   └── Q_N
        │   │       ├── err_out
        │   │       └── filtered
        │   │           ├── CDS
        │   │           ├── exon
        │   │           ├── introns_filtered
        │   │           ├── locus
        │   │           └── mRNA
        │   ├── already
        │   │   └── sgd-related
        │   ├── comprehensive
        │   │   └── S288C_reference_genome_R64-1-1_20110203
        │   └── representation
        └── outfiles_htseq-count
            ├── Trinity-GG
            │   ├── G_N
            │   │   ├── err_out
            │   │   ├── filtered
            │   │   │   └── locus
            │   │   │       └── err_out
            │   │   ├── list
            │   │   └── sh
            │   └── Q_N
            │       ├── err_out
            │       ├── filtered
            │       │   └── locus
            │       │       └── err_out
            │       ├── list
            │       └── sh
            ├── already
            │   ├── combined-AG
            │   │   ├── CUT
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── CUT_2016
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── CUT_4X
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── NUTs
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── SRAT
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── SUT
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── XUT
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── antisense_transcript
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── mRNA
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── ncRNA
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── rRNA
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── snRNA
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   ├── snoRNA
            │   │   │   ├── UTK_prim_UMI
            │   │   │   │   └── err_out
            │   │   │   └── UT_prim_UMI
            │   │   │       └── err_out
            │   │   └── tRNA
            │   │       ├── UTK_prim_UMI
            │   │       │   └── err_out
            │   │       └── UT_prim_UMI
            │   │           └── err_out
            │   └── combined-SC-KL-20S
            │       ├── UTK_prim_UMI
            │       │   └── err_out
            │       ├── UTK_prim_no
            │       │   └── err_out
            │       ├── UTK_prim_pos
            │       │   └── err_out
            │       ├── UT_prim_UMI
            │       │   └── err_out
            │       ├── UT_prim_no
            │       │   └── err_out
            │       └── UT_prim_pos
            │           └── err_out
            └── representation
                └── UT_prim_UMI
                    └── err_out
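The listing above shows directories only. As a minimal sketch of how such a listing could be regenerated after the repository is reorganized, the hypothetical helper `render_tree` below walks a root directory with `pathlib` and emits the same box-drawing layout. The function name and the choice to sort entries by name are assumptions made for illustration and are not part of this repository; the inline `#` annotations in the listing would still need to be added by hand.

```python
import tempfile
from pathlib import Path


def render_tree(root: Path, prefix: str = "") -> list[str]:
    """Return box-drawing lines for every subdirectory under `root`.

    Files are skipped so that the output contains directories only.
    """
    # Sort by name (code-point order) so uppercase names precede lowercase.
    subdirs = sorted((p for p in root.iterdir() if p.is_dir()),
                     key=lambda p: p.name)
    lines = []
    for index, sub in enumerate(subdirs):
        is_last = index == len(subdirs) - 1
        connector = "└── " if is_last else "├── "
        lines.append(f"{prefix}{connector}{sub.name}")
        # Children of the last entry are indented with spaces, others with a bar.
        child_prefix = prefix + ("    " if is_last else "│   ")
        lines.extend(render_tree(sub, child_prefix))
    return lines


if __name__ == "__main__":
    # Demonstration on a throwaway directory (hypothetical layout).
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        (demo / "bin").mkdir()
        (demo / "results" / "2023-0111" / "work_initial").mkdir(parents=True)
        (demo / "results" / "2023-0215").mkdir()
        print("\n".join(render_tree(demo)))
```

To apply it to a real checkout, call `render_tree(Path("2022-2023_RRP6-NAB3"))` from the parent directory and print the returned lines under the root name.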
2023-0111
- work_GMAP_rough-draft.md
  - Has initial description
  - Makes and works with paths within directory 2023-0111
- work_Trinity-GF-GG-optimization_submit-jobs.md
  - Has initial description
  - Has absolute paths in "Code"
  - Has absolute paths in "Printed"
- README.md #TODO Needs to be rewritten to explain what is run with what and in what order
- tutorial_troubleshooting
- work_initial #TODO #MAYBE Delete contents and subdirectory
2023-0115
- README.md #TODO Needs to be rewritten to explain what is run with what and in what order
- work_MultiQC.md
  - Has absolute and relative paths in "Code" and "Printed"
- Has absolute and relative paths in work_process-data_4tU-seq_fastqs-UMI.md and work_process-data_4tU-seq_fastqs-UMI-dedup_new-experiments-March.md
  - #TODO Rename the scripts
  - #TODO Add variables from below to top initialization, but don't bother refactoring the parallel calls to HEREDOC submissions
- work_env-building.md #TODO Include info in here in new notebook and/or yamls associated with Dependencies
- #TODO Add top-of-file descriptions to all notebooks that are made available
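Several of the notes above record whether a notebook contains absolute paths. A rough way to check this automatically is sketched below. The names `find_absolute_paths` and `scan_notebooks` are hypothetical, and the regular expression is a heuristic assumption rather than a complete path grammar: it reports tokens that begin with `/` or `~/` and have at least two components, so a bare `/tmp` is not reported, and paths built from shell variables such as `$HOME/...` are ignored. The sample path in the demonstration is illustrative only.

```python
import re
from pathlib import Path

# Heuristic (assumption): an absolute path starts with "/" or "~/", has at
# least two components, and is not glued to a preceding word, URL, relative
# path, or shell variable.
ABS_PATH = re.compile(r"(?<![\w.:/~$}-])~?/[\w.+-]+(?:/[\w.+-]+)+")


def find_absolute_paths(text: str) -> list[tuple[int, str]]:
    """Return (line number, path) pairs for absolute-looking paths in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ABS_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_notebooks(root: Path, suffixes=(".md", ".Rmd")) -> dict:
    """Map each notebook under `root` that has hits to its list of hits."""
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = find_absolute_paths(text)
            if hits:
                report[path] = hits
    return report


if __name__ == "__main__":
    sample = "cd /home/user/project/results\ncd results/2023-0115\n"
    for lineno, found in find_absolute_paths(sample):
        print(f"line {lineno}: {found}")
```

Running `scan_notebooks(Path("results"))` from the repository root would list the notebooks that still need their paths made relative; any hit should be reviewed by eye, since the pattern can produce false positives.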
2023-0215
- infiles_gtf-gff3 #TODO Some, not all; determine what to make available and what not to
- outfiles_gtf-gff3 #TODO Some, not all; determine what to make available and what not to
- outfiles_htseq-count #TODO Some, not all; determine what to make available and what not to
- README.md #TODO Needs to be rewritten to explain what is run with what and in what order
- rough-draft_coverage-tracks_timecourse_size-effect.sh
- rough-draft_coverage-tracks.sh
- rough-draft_draw_scatter-plots.R
- rough-draft_evaluate-categories_expression_initial.Rmd
- rough-draft_evaluate-categories_expression.R
- rough-draft_new-approach-to-analyses.R
- rough-draft_plot-distributions_expression.R
- rough-draft_plot-distributions_length.R
- rough-draft_plot-TPM_N-varies-on-SS_LFC-varies-on-TPM.R
- rough-draft_run-analyses_Fig-5B-5C.R
- rough-draft_run-analyses_rlog-PCA_write-rds.R
- rough-draft_write-gtf-blacklist.R #MAYBE
- run_chi-sq_quantile-filtered-coding-assignments.R #MAYBE Need to check on this; associated with data-for-chi-sq.xlsx below?
- work_assess-process_R64-1-1-gff3_categorize-Trinity-transfrags_part-0.Rmd #COMBINE?
- work_assess-process_R64-1-1-gff3_categorize-Trinity-transfrags_part-1.Rmd #COMBINE?
- work_assess-process_R64-1-1-gff3_categorize-Trinity-transfrags_part-2_legend.txt #COMBINE?
- work_assess-process_R64-1-1-gff3_categorize-Trinity-transfrags_part-2.R #COMBINE?
- work_assess-process_R64-1-1-gff3_categorize-Trinity-transfrags_part-3.R #COMBINE?
- work_assessment-processing_gtfs_part-0_Trinity-etc.md #COMBINE?
- work_assessment-processing_gtfs_part-0.5_non-Trinity.Rmd #COMBINE?
- work_assessment-processing_gtfs_part-1_Trinity.Rmd #COMBINE?
- work_assessment-processing_gtfs_part-1.5_Trinity.R #COMBINE?
- work_assessment-processing_gtfs_part-2_Trinity.md #COMBINE?
- work_calculate_uni-multimappers-etc.md #COMBINE?
- work_combine-gtfs_processed-ncRNA_part-0.Rmd #COMBINE?
- work_combine-gtfs_processed-ncRNA_part-1.md #COMBINE?
- work_combine-gtfs_processed-non-pa-ncRNA_part-0.Rmd #COMBINE?
- work_combine-gtfs_processed-non-pa-ncRNA_part-1.md #COMBINE?
- work_combine-gtfs_processed-pa-ncRNA_part-0.Rmd #COMBINE?
- work_combine-gtfs_processed-pa-ncRNA_part-1.md #COMBINE?
- work_count-features_assessed-processed-R64-1-1-gff3s.md #MAYBE
- work_gff3_include-20S.md
- work_make-blacklist-etc.py #TODO This needs to be rewritten for just the NotFeature file; can remain in Python, or may switch to R (whatever is faster)
- work_prepare-data_GEO_matrices.R
- work_prepare-data_GEO.md
- work_preprocess-prepare_htseq-counts-matrices_gtf-gff3_etc.Rmd #MAYBE Create a directory for initial work and just keep it there w/o making it public? Not sure...
- work_representative-non-coding-transcriptome_part-0.md #COMBINE?
- work_representative-non-coding-transcriptome_part-1.md #COMBINE?
- work_representative-non-coding-transcriptome_part-2.Rmd #COMBINE?
- work_representative-non-coding-transcriptome_part-3.md #COMBINE?
- work_representative-non-coding-transcriptome_part-4.Rmd #COMBINE?
- work_representative-non-coding-transcriptome_part-5.md #COMBINE?
- data_timecourse_counts-raw.tsv #MAYBE Delete?
- data_timecourse_counts-rlog.tsv #TODO Delete
- data-for-chi-sq.xlsx #TODO Delete
- rough-draft_estimate-RNA-degredation.R
- rough-draft_evaluate-categories_expression_scraps_initial.Rmd #MAYBE Delete?
- rough-draft_new-approach-to-analyses_tests-scraps.R #TODO Delete
- rough-draft_plot-TPM_N-varies-on-SS.scraps.R #TODO Delete
- rough-draft_run-analyses_rlog-PCA_write-rds.notes-2.txt #TODO Delete
- rough-draft_run-analyses_rlog-PCA_write-rds.notes.txt #TODO Delete
- rough-draft_run-analyses_rlog-PCA_write-rds.scraps.R #TODO Delete
- tutorial_collapse-intersecting-regions.R
- tutorial_extract-non-overlapping-regions.R
- work_env-building.md #TODO Include info in here in new notebook and/or yamls associated with Dependencies
- test_count_features.md #MAYBE Delete it? Or maybe create a directory for initial work and just keep it there w/o making it public?
- work_count_features_featureCounts.md #MAYBE Delete it? Or create a directory for initial work and just keep it there w/o making it public?
- work_count_features_htseq-count.md #TODO Create a directory for initial work and just keep it there w/o making it public? (This is the work in prep for AG's FHCC seminar.)
- work_evaluation-etc_rough-draft_Rrp6-WT_SS_timecourse_groupwise.Rmd #TODO Create a directory for initial work and just keep it there w/o making it public? (This is the work in prep for AG's FHCC seminar.)
- work_evaluation-etc_variables_pairwise-groupwise.Rmd #TODO Delete
- work_evaluation-etc_variables_pairwise-groupwise.tmp-gw.R #TODO Delete
- work_evaluation-etc_variables_pairwise-groupwise.tmp-pw.R #TODO Delete
- work_evaluation-etc_variables_pairwise-groupwise.TODOs-scraps-etc.txt
- work_examine-snRNA-snoRNA-annotations_part-1.Rmd #TODO Create a directory for initial work and just keep it there w/o making it public? #ORKEEP?
- work_examine-snRNA-snoRNA-annotations_part-2.md #TODO Create a directory for initial work and just keep it there w/o making it public? #ORKEEP?
- work_gff3_convert-strand-designations.Rmd #TODO Create a directory for initial work and just keep it there w/o making it public?
- work_model-variables.md #NOTE Don't need to actually make this file available, but #TODO should include the information contained in this file in either the main or a sub README.md
- work_normalization-etc_rough-draft_NNS_vary-on-transcription.Rmd #TODO Create a directory for initial work and just keep it there w/o making it public
- work_normalization-etc_rough-draft_OsTIR-NNS_vary-on-strain.Rmd #TODO Create a directory for initial work and just keep it there w/o making it public
- work_normalization-etc_rough-draft_wild-type_vary-on-state_antisense.Rmd #TODO Create a directory for initial work and just keep it there w/o making it public
- work_normalization-etc_rough-draft_wild-type_vary-on-state.Rmd #TODO Create a directory for initial work and just keep it there w/o making it public
- collate_sea-tsv.sh #NOTE AG used this for her bootstrapping thing
- rough-draft_handle-matrices-gtfs.R #NOTE Do we use this for anything in the figures?
- rough-draft_handle-tables_establish-study-design.R #NOTE Do we use this for anything in the figures?
- rough-draft_run-analyses_GO.R #NOTE Not sure if we actually use this
- rough-draft_timecourse-samples_processing_part-1a.R #TODO Need to check on this...
- rough-draft_timecourse-samples_processing_part-1b.R #TODO Need to check on this...
- rough-draft_timecourse-samples_processing_part-1c.R #TODO Need to check on this...
- rough-draft_timecourse-samples_processing_part-2a.R #TODO Need to check on this...
Copyright © 2022-2023 Kris Alavattam
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.