Quality control of high throughput sequence data
- The raw, compressed FASTQ files you'll be using in today's exercises can be found in /db/Teaching/Assembly/Sulfitobacter/raw.
- Make sure you have an account on the computer cluster Albiorix. Usernames and passwords will be passed out before we begin.
- Run the exercises in two terminal tabs - one where you log in to Albiorix and one for accessing the files on your local computer.
- After each analysis has run, take a look at the output produced, to be sure that the process hasn't encountered any unexpected errors.
  - Files to check include log files in the analysis result subfolders, and any files ending in .sge.e##### and .sge.o#####.
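One quick way to spot problems is to list the queue's error files that are not empty. The sketch below builds two toy files in a scratch folder so that it runs anywhere; the job number 12345 and the scratch folder are made up for illustration. On the cluster you would run the find command in your own analysis folder instead.

```shell
# Toy set-up in a scratch folder (hypothetical file names and job number).
demo_dir=$(mktemp -d)
: > "$demo_dir/runFastQC.sge.o12345"                               # empty stdout file
echo "example error message" > "$demo_dir/runFastQC.sge.e12345"    # non-empty stderr file

# List the queue stderr files that contain something; each one deserves a look.
nonempty_err=$(find "$demo_dir" -maxdepth 1 -name '*.sge.e*' -size +0 -print)
echo "$nonempty_err"
```

An empty list does not prove that everything went well, so it is still worth reading the tool's own log files.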
- Start by logging in to Albiorix (note - replace studentX with the username you have been given).
ssh -Y studentX@albiorix.bioenv.gu.se
- When you first log in you will be located in your home directory. However, for this exercise you should move to a different part of the file system.
cd /nobackup/data18/Assembly_exercise/studentX
For this part of the exercise, you will be using the files runFastQC.sge and runCutadapt.sge. These are templates for scripts that can be sent to the queue system in order to run FastQC and Cutadapt, once you have filled in the relevant commands.
- Inspect the first few lines of the FASTQ files with zless.
  - Does the format of the files look as you expect?
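As a reminder of what to look for, a FASTQ record is four lines: a header starting with @, the sequence, a separator starting with +, and a quality string of the same length as the sequence. The sketch below writes a single toy record to a scratch file (the file name and read are made up) and checks that layout. It assumes records are not wrapped over extra lines.

```shell
# Write one toy FASTQ record to a scratch file so the sketch runs anywhere.
demo_dir=$(mktemp -d)
demo_fq="$demo_dir/demo_R1.fastq.gz"
printf '@read1\nACGTACGT\n+\nIIIIIIII\n' | gzip -c > "$demo_fq"

# Print the first record (four lines); zless shows the same text interactively.
gzip -dc "$demo_fq" | head -n 4

# Check the four-line layout of every record in the file.
record_check=$(gzip -dc "$demo_fq" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad++ }   # header line
    NR % 4 == 2 { seq_len = length($0) }               # sequence line
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad++ }   # separator line
    NR % 4 == 0 && length($0) != seq_len { bad++ }     # quality line
    END { if (NR % 4 != 0) bad++; print (bad ? "malformed" : "ok") }')
echo "$record_check"
```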
- Run FastQC on your FASTQ files, by adding the relevant command to the file runFastQC.sge.
  - Hint - the basic form of the command, found at fastqc --help, should be sufficient.
  - Make sure you include the option -t $NSLOTS, to ensure you use all of your allocated resources for the task.
  - Once you're happy with the command, submit it to the queue with qsub runFastQC.sge.
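If you are unsure how the pieces fit together, the sketch below shows one possible shape for the command. It only prints the command so that you can check it before pasting it into the script. The output folder name (fastqc_raw) and the .fastq.gz file pattern are assumptions, and the contents of the template itself are not shown here.

```shell
# NSLOTS is set by the queue system inside a job; default to 1 for a dry run.
NSLOTS="${NSLOTS:-1}"

raw_dir="/db/Teaching/Assembly/Sulfitobacter/raw"   # location given in this exercise
out_dir="fastqc_raw"                                # hypothetical output folder

# -t sets the number of threads; -o sets the output folder, which must already exist.
fastqc_cmd="fastqc -t $NSLOTS -o $out_dir $raw_dir/*.fastq.gz"
echo "$fastqc_cmd"
```

FastQC does not create the output folder for you, so run mkdir for it before submitting the job.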
- Explore the FastQC report (firefox *.html) and note your observations. For example:
a. Does the per base sequence quality look acceptable?
b. Does the per base sequence content suggest any bias?
c. Do any adapters need removing?
If Firefox is slow, rsync the files to your local computer and view them there.
- Discuss your observations with another student or an instructor.
- Run Cutadapt based on your observations, by adding the relevant command to the file runCutadapt.sge.
  - Hint - run cutadapt --help to see how the command should be formatted.
  - Make sure you include the option -j $NSLOTS, to ensure you use all of your allocated resources for the task.
  - Once you're happy with the command, submit it to the queue with qsub runCutadapt.sge.
The following options may be helpful:
- -q sets the quality threshold for base trimming.
- The following options affect only the forward (R1) reads. Use the Cutadapt help page to identify the equivalent reverse (R2) options you need.
  - -u defines the number of bases to hard trim from the 5' end of the forward reads.
  - -a defines the sequence of the 3' adapter to remove from forward reads.
    - Hint - FastQC's GitHub page has a list of the adapter sequences that FastQC looks for.
  - -o defines the output file name for trimmed forward reads.
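To see how the forward-read options listed above combine, here is a sketch that prints a possible command for the R1 file only. The file names, the quality threshold of 20, the hard trim of 5 bases and the adapter sequence are all placeholders chosen for illustration, so base your own values on your FastQC report. The reverse-read options and the second input file still need to be added for paired-end data.

```shell
# NSLOTS is set by the queue system inside a job; default to 1 for a dry run.
NSLOTS="${NSLOTS:-1}"

adapter="AGATCGGAAGAGC"          # placeholder: a common Illumina adapter prefix
in_r1="reads_R1.fastq.gz"        # hypothetical input file name
out_r1="trimmed_R1.fastq.gz"     # hypothetical output file name

# Forward reads only: quality trim (-q), hard trim (-u), adapter (-a), output (-o).
cutadapt_cmd="cutadapt -j $NSLOTS -q 20 -u 5 -a $adapter -o $out_r1 $in_r1"
echo "$cutadapt_cmd"
```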
- Re-run FastQC on the newly-trimmed files.
- Explore the FastQC report, and check whether the data looks good compared to the first report.
  - Hint - how many reads have been lost during quality control?
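Besides reading the "Total Sequences" value in each FastQC report, you can count reads directly, since one FASTQ record takes four lines. The sketch below uses two toy files with made-up names so that it runs anywhere, and it assumes records are not wrapped over extra lines. The count_reads helper is defined here for illustration and is not part of any tool.

```shell
# Count reads in a gzipped FASTQ file: one record is four lines.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Toy files standing in for the raw and trimmed data (hypothetical names).
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n' | gzip -c > "$demo_dir/raw_R1.fastq.gz"
printf '@r1\nACG\n+\nIII\n' | gzip -c > "$demo_dir/trimmed_R1.fastq.gz"

raw_n=$(count_reads "$demo_dir/raw_R1.fastq.gz")
trimmed_n=$(count_reads "$demo_dir/trimmed_R1.fastq.gz")
echo "raw=$raw_n trimmed=$trimmed_n lost=$((raw_n - trimmed_n))"
```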
Are you happy with how your data looks?
- No - Try rerunning Cutadapt with some different parameters, generate some new FastQC files for the results, and see whether this looks any better.
- Yes - Excellent! You can either continue directly to the Assembly section of the exercise, or try out the alternative QC approach below.
As mentioned in the lecture, Trim Galore! is a wrapper script containing both Cutadapt and FastQC. While it does have some limitations in how it allows you to interact with these programs, you may find it better to work with than Cutadapt and FastQC separately - when multiple tools are available, it's worth trying them out to see which works best for you!
S1. Re-run the trimming using trim_galore, by adding the relevant command to the file runTrimGalore.sge.
  - Hint - run trim_galore --help.
  - Make sure you include the option -j $NSLOTS, to ensure you use all of your allocated resources for the task.
  - Once you're happy with the command, submit it to the queue with qsub runTrimGalore.sge.
  - How do the results compare to running Cutadapt and FastQC separately?
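As a starting point, the sketch below prints one possible Trim Galore! command for paired-end reads. It assumes your data are paired (R1 and R2). The file names and output folder are hypothetical, and no trimming thresholds are set, so the tool's defaults would apply.

```shell
# NSLOTS is set by the queue system inside a job; default to 1 for a dry run.
NSLOTS="${NSLOTS:-1}"

in_r1="reads_R1.fastq.gz"        # hypothetical forward reads
in_r2="reads_R2.fastq.gz"        # hypothetical reverse reads
out_dir="trim_galore_out"        # hypothetical output folder

# --paired trims both files together; --fastqc runs FastQC on the trimmed output.
trim_galore_cmd="trim_galore --paired -j $NSLOTS --fastqc -o $out_dir $in_r1 $in_r2"
echo "$trim_galore_cmd"
```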