Retrieving raw sequence data from the SRA

Thomas edited this page May 31, 2024 · 2 revisions

While genomic data is routinely made publicly available, the steps taken to process the data may be unclear. If you wish to use data from a publication (e.g. ChIP-seq peak files or processed BAM files), it may be preferable to retrieve the raw sequencing reads and process them yourself. This page gives instructions for retrieving raw sequencing reads from the Sequence Read Archive (SRA).

  1. Create an AWS or Google Cloud bucket. These services sometimes offer new users free credits, which should be more than sufficient for one-off use of the SRA. However, if your project involves retrieving large quantities of data from the SRA, it may be worth setting up a lab bucket.

  2. Find the SRA BioProject accession associated with the data that you want to download. In this example, we are using the BioProject accession PRJNA779107 from Bartlett et al. (2021). Once you have the correct accession, go to the SRA home page and type the accession into the search bar.

image
  3. Any given project will contain raw sequence files from one or more experiments. For example, the Bartlett et al. (2021) study generated TIP-seq data targeting CTCF and H3K27me3 at multiple cell inputs. Click on the entries of the "Run" column to get more details about each experiment (e.g. you might be interested in the number of reads generated by a particular experiment). Select the appropriate rows when you have identified the data you want to download.
image
  4. Toggle "Selected" and hit "Deliver Data". You will be prompted to log in to your NCBI account. If you don't have an account, you will be given the opportunity to create one.

  5. Upon successful login, you will be taken to the page shown below. Follow the instructions there, entering the name of your AWS or Google Cloud bucket and the other requested details. Hit "Deliver data" at the bottom of the page, and the request should be fulfilled within 48 hours.

image
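
The bucket in step 1 can also be created from the command line, and the delivered files can be copied to a local machine the same way once the request has been fulfilled. The following is a minimal Python sketch that only assembles the relevant `aws` and `gsutil` commands as strings; it does not run them. The function names, the bucket name `my-lab-sra-data`, and the local directory `sra_delivery` are hypothetical, and the name check is deliberately conservative (both providers accept some names it rejects). Any bucket permissions requested on the delivery page still need to be set up as described there.

```python
import re
import shlex

# Conservative bucket-name check (an assumption for this sketch): 3-63 characters,
# lowercase letters, digits and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_safe_bucket_name(name: str) -> bool:
    """Return True if the name fits the conservative pattern above."""
    return _BUCKET_RE.fullmatch(name) is not None


def bucket_commands(provider: str, bucket: str, dest: str) -> dict:
    """Build shell commands to create a bucket and, later, copy its contents to dest.

    provider is "aws" or "gcp"; bucket and dest are placeholder names chosen by you.
    """
    if not is_safe_bucket_name(bucket):
        raise ValueError(f"unsuitable bucket name: {bucket!r}")
    if provider == "aws":
        create = ["aws", "s3", "mb", f"s3://{bucket}"]
        fetch = ["aws", "s3", "sync", f"s3://{bucket}", dest]
    elif provider == "gcp":
        create = ["gsutil", "mb", f"gs://{bucket}"]
        # gsutil rsync expects the local destination directory to exist already.
        fetch = ["gsutil", "-m", "rsync", "-r", f"gs://{bucket}", dest]
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return {"create": shlex.join(create), "fetch": shlex.join(fetch)}


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run in a shell.
    for step, command in bucket_commands("aws", "my-lab-sra-data", "sra_delivery").items():
        print(f"{step}: {command}")
```

Printing the commands rather than executing them keeps the sketch free of cloud credentials; run the "create" command before requesting delivery and the "fetch" command after the request has been fulfilled.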

Bartlett DA et al. High-throughput single-cell epigenomic profiling by targeted insertion of promoters (TIP-seq). J Cell Biol. 2021;220(12):e202103078. doi: 10.1083/jcb.202103078