Let's remove adapters (trim) and low-quality reads (filter) with the program cutadapt.
ssh <username>@stampede.tacc.utexas.edu
cd $SCRATCH/JA16444/00_rawdata
The cutadapt program is not maintained by TACC, so we need to use the copy stored in a colleague's working directory. To access the program, add its directory to the front of the PATH.
PATH=/work/01184/daras/bin/cutadapt-1.3/bin:$PATH
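Before building the commands, it can help to confirm that the shell actually finds cutadapt on the updated PATH. The snippet below is a minimal sketch; `report_tool` is a hypothetical helper name introduced here for illustration, not part of cutadapt or the TACC tools.

```shell
# Hypothetical helper: print where a tool resolves on the PATH,
# or a warning if the shell cannot find it.
report_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 found at: $tool_path"
    else
        echo "$1 not found on PATH"
    fi
}

report_tool cutadapt
```

If the warning is printed, re-check the PATH line above before continuing.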
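This loop writes one cutadapt command per pair of paired-end read files to a commands file. In each command, -q 15,10 trims low-quality bases from the 5' and 3' ends, -a and -A give the adapter sequences to remove from read 1 and read 2, and -m 22 discards reads shorter than 22 bases after trimming.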
rm -f 02_filtrimmedreads.cmds # precaution, just in case you've done this before
for R1 in *R1_001.fastq.gz
do
    # derive the R2 file name and both output names from the R1 file name
    R2=$(basename "$R1" R1_001.fastq.gz)R2_001.fastq.gz
    R1filtrim=$(basename "$R1" fastq.gz)filtrim.fastq.gz
    R2filtrim=$(basename "$R2" fastq.gz)filtrim.fastq.gz
    echo $R1 $R2 $R1filtrim $R2filtrim
    # append one cutadapt command for this read pair to the commands file
    echo "cutadapt -q 15,10 -a GATCGGAAGAGCACACGTCTGAACTCCA -A ATCGTCGGACTGTAGAACTCTGAACGTG -m 22 -o $R1filtrim -p $R2filtrim $R1 $R2" >> 02_filtrimmedreads.cmds
done
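To see how the loop derives the file names, the sketch below runs the same basename steps on a single example. The file name sampleA_S1_L001_R1_001.fastq.gz is hypothetical and only stands in for one of your raw files.

```shell
# Hypothetical file name, used only to illustrate the name derivation.
R1=sampleA_S1_L001_R1_001.fastq.gz

# Strip the suffix "R1_001.fastq.gz" from R1, then append the R2 suffix.
R2=$(basename "$R1" R1_001.fastq.gz)R2_001.fastq.gz

# Strip "fastq.gz" (the dot before it is kept), then append the new extension.
R1filtrim=$(basename "$R1" fastq.gz)filtrim.fastq.gz
R2filtrim=$(basename "$R2" fastq.gz)filtrim.fastq.gz

echo "$R2"          # sampleA_S1_L001_R2_001.fastq.gz
echo "$R1filtrim"   # sampleA_S1_L001_R1_001.filtrim.fastq.gz
echo "$R2filtrim"   # sampleA_S1_L001_R2_001.filtrim.fastq.gz
```

Because the trimmed files end in filtrim.fastq.gz, they do not match the *R1_001.fastq.gz pattern, so re-running the loop will not pick them up as input.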
Create a launcher script and submit the job to the queue.
launcher_creator.py -t 4:00:00 -n 02_filtrimmedreads -j 02_filtrimmedreads.cmds -l 02_filtrimmedreads.slurm -A NeuroEthoEvoDevo -q 'normal'
sbatch 02_filtrimmedreads.slurm
Alternatively, to run the commands interactively instead of through the queue: request compute time, make the commands file executable, and run the commands.
idev -m 120
chmod a+x 02_filtrimmedreads.cmds
bash 02_filtrimmedreads.cmds
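Once the commands have finished, a quick check that every raw file has a non-empty trimmed counterpart can catch failed runs early. This is a minimal sketch that reuses the naming scheme from the loop above; `check_trimmed` is a hypothetical helper name, and the check only tests that the files exist and are not empty, not that their contents are valid.

```shell
# Hypothetical helper: report trimmed files that are missing or empty
# for every raw read pair in the current directory.
check_trimmed() {
    missing=0
    for R1 in *R1_001.fastq.gz
    do
        [ -e "$R1" ] || continue   # the pattern matched no raw files
        R2=$(basename "$R1" R1_001.fastq.gz)R2_001.fastq.gz
        for raw in "$R1" "$R2"
        do
            trimmed=$(basename "$raw" fastq.gz)filtrim.fastq.gz
            if [ ! -s "$trimmed" ]; then
                echo "missing or empty: $trimmed"
                missing=$((missing + 1))
            fi
        done
    done
    echo "$missing trimmed file(s) missing or empty"
}

check_trimmed
```

Run it from the directory that holds both the raw and the trimmed reads, before the trimmed files are moved.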
Now, let's make our processed reads read-only so we don't accidentally modify them.
chmod a-w *filtrim.fastq.gz
Now, move the processed reads to a new directory.
mkdir ../02_filtrimmedreads
mv *filtrim.fastq.gz ../02_filtrimmedreads
- Cutadapt: http://cutadapt.readthedocs.io/en/stable/guide.html
- BioITeam Launcher Creator: https://wikis.utexas.edu/display/bioiteam/launcher_creator.py