Retrieving raw sequence data from the SRA
While genomic data are routinely made publicly available, the steps used to process them may be unclear. If you wish to use data from a publication (e.g. ChIP-seq peak files or processed BAM files), it may be preferable to access the raw sequencing reads and process them yourself. This page explains how to retrieve raw sequencing reads from the Sequence Read Archive (SRA).
Create an Amazon Web Services (AWS) or Google Cloud Platform (GCP) storage bucket. Both services sometimes offer free credits to new users, and for one-off use of the SRA these credits are more than sufficient. However, if your project involves retrieving large quantities of data from the SRA, it may be worth setting up a lab bucket.
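If you prefer the command line to the web consoles, a bucket can also be created with each provider's CLI. The Python sketch below is only a convenience wrapper and assumes the `gcloud` (GCP) or `aws` (AWS) CLI is installed and authenticated; the function names, the bucket name and the regions are hypothetical placeholders. The name check covers a conservative subset of the rules shared by S3 and Cloud Storage (3 to 63 characters; lowercase letters, digits and hyphens; starting and ending with a letter or digit) and does not check every provider-specific rule.

```python
import re
import subprocess

# Conservative subset of the S3 and Cloud Storage naming rules:
# 3-63 characters; lowercase letters, digits and hyphens only;
# starts and ends with a letter or digit. Provider-specific rules
# (reserved prefixes, dotted names, etc.) are NOT checked here.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_safe_bucket_name(name: str) -> bool:
    """Return True if `name` passes the conservative check above."""
    return BUCKET_NAME.fullmatch(name) is not None


def bucket_create_command(provider: str, name: str, location: str) -> list[str]:
    """Build (but do not run) the CLI command that creates a bucket."""
    if not is_safe_bucket_name(name):
        raise ValueError(f"bucket name fails the conservative check: {name!r}")
    if provider == "gcp":
        # Google Cloud CLI; `location` is a region such as "us-east1".
        return ["gcloud", "storage", "buckets", "create", f"gs://{name}", f"--location={location}"]
    if provider == "aws":
        # AWS CLI; `location` is a region such as "us-east-1".
        return ["aws", "s3", "mb", f"s3://{name}", "--region", location]
    raise ValueError(f"unknown provider: {provider!r} (expected 'gcp' or 'aws')")


def create_bucket(provider: str, name: str, location: str) -> None:
    """Run the command; needs the relevant CLI installed and authenticated."""
    subprocess.run(bucket_create_command(provider, name, location), check=True)
```

For example, `create_bucket("aws", "my-lab-sra-data", "us-east-1")` would run `aws s3 mb s3://my-lab-sra-data --region us-east-1`. Building the command separately from running it lets you inspect it before anything is created.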
Find the BioProject accession associated with the data that you want to download; this is usually given in the data availability section of the publication. In this example, we use the BioProject accession PRJNA779107 from Bartlett et al. (2021). Once you have the correct accession, go to the SRA home page and type the accession into the search bar.
- A project contains raw sequence files from one or more experiments. For example, Bartlett et al. (2021) generated TIP-seq data targeting CTCF and H3K27me3 using different numbers of input cells. Click on an entry in the "Run" column to see more details about that run (e.g. the number of reads it generated). Once you have identified the data you want to download, select the corresponding rows.
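To inspect runs (for example, their read counts) outside the browser, the run table can also be retrieved through NCBI's E-utilities. This is a minimal sketch, assuming that `esearch` on the `sra` database returns the record IDs for a BioProject accession and that `efetch` with `rettype=runinfo` returns a CSV run table with `Run` and `spots` columns; the function names are hypothetical. NCBI rate-limits requests made without an API key, so avoid calling it in a tight loop.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 500) -> str:
    """URL that searches the SRA database and returns matching record IDs as JSON."""
    query = urllib.parse.urlencode({"db": "sra", "term": term, "retmode": "json", "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_runinfo_url(ids: list[str]) -> str:
    """URL that fetches the run table (CSV) for the given SRA record IDs."""
    query = urllib.parse.urlencode({"db": "sra", "id": ",".join(ids), "rettype": "runinfo", "retmode": "text"})
    return f"{EUTILS}/efetch.fcgi?{query}"


def parse_runinfo(text: str) -> list[dict[str, str]]:
    """Parse runinfo CSV, skipping blank lines and any repeated header rows."""
    rows = csv.DictReader(io.StringIO(text))
    return [row for row in rows if row.get("Run") and row["Run"] != "Run"]


def _get(url: str) -> str:
    """Fetch a URL and return the body as text (network call)."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read().decode("utf-8")


def fetch_runs(accession: str) -> list[dict[str, str]]:
    """Look up an accession in the SRA and return one dict per run."""
    ids = json.loads(_get(esearch_url(accession)))["esearchresult"]["idlist"]
    if not ids:
        return []
    return parse_runinfo(_get(efetch_runinfo_url(ids)))
```

For example, looping over `fetch_runs("PRJNA779107")` and printing `run["Run"]` and `run["spots"]` would list each run with its spot count (for paired-end data, one spot holds both reads of a pair).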
Toggle "Selected" and hit "Deliver Data". You will be prompted to log in to your NCBI account. If you don't have an account, you will be given the opportunity to create one.
After logging in, you will be taken to the data delivery page. Follow the instructions there, including entering the name of your AWS or GCP bucket. Click "Deliver data" at the bottom of the page; the request should be fulfilled within 48 hours.
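Once the request has been fulfilled, it is worth confirming that every selected run reached the bucket. This is a minimal sketch, assuming that the delivered object paths contain the run accessions (check this against your own delivery, as the layout is not described here). Save a recursive listing with `gcloud storage ls --recursive gs://YOUR_BUCKET` or `aws s3 ls --recursive s3://YOUR_BUCKET`, then compare it with the runs you selected; `YOUR_BUCKET` and the helper names are hypothetical placeholders.

```python
import re

# Run accessions: SRR (NCBI SRA), ERR (ENA) and DRR (DDBJ), e.g. "SRR1234567".
# The look-behind prevents matches inside longer alphabetic tokens.
RUN_ACCESSION = re.compile(r"(?<![A-Za-z])[SED]RR\d+")


def runs_in_listing(listing: str) -> set[str]:
    """Collect every run accession that appears anywhere in a bucket listing."""
    return set(RUN_ACCESSION.findall(listing))


def missing_runs(expected: list[str], listing: str) -> list[str]:
    """Return the selected runs that do not appear in the listing, sorted."""
    return sorted(set(expected) - runs_in_listing(listing))
```

Matching accessions in the raw listing text, rather than parsing it line by line, avoids depending on the exact output format of either CLI. An empty result from `missing_runs` means every selected accession was found.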
Bartlett DA, et al. High-throughput single-cell epigenomic profiling by targeted insertion of promoters (TIP-seq). J Cell Biol. 2021;220(12):e202103078. doi: 10.1083/jcb.202103078