FastTools is a Python module that makes working with FASTQ/FASTA files more convenient. It gives the user high-level control over their NGS data and takes a data-science approach to sequence data, holding reads and calculated values in pandas DataFrames.
- Clone or download the FastTools repository.
- Create the fastTools environment.
- Anaconda Python Distribution:
a. Set up bioconda by following instructions at https://bioconda.github.io/
b. Use command:
- For Mac: conda env create -f environment_osx.yml
- For Win: conda env create -f environment_win.yml
* Tentatively, this should work for OSX and Windows.
* Alternatively, you could create your own environment and install the libraries manually. "conda install seaborn" will install pandas, matplotlib, and seaborn for you. "conda install biopython" should fulfill FastTools' requirements.
c. If adding to an existing environment, activate your environment, then use the command below (replace <your_os> with osx or win):
conda env update -f environment_<your_os>.yml
- Using pip and venv:
a. python3 -m venv env
b. source env/bin/activate
c. pip install -r pip_requirements.txt
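After either setup route, a quick check that the libraries named above can be imported may save some debugging. This is only a sketch: the module list is taken from the conda notes above (biopython is imported as "Bio"), and missing_modules is a hypothetical helper, not part of FastTools.

```python
import importlib.util

# Import names for the libraries mentioned in the setup notes.
# "Bio" is the import name of the biopython package.
REQUIRED_MODULES = ["pandas", "matplotlib", "seaborn", "Bio"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed FastTools dependencies were found.")
```

Run it with the environment activated; anything reported as missing can be installed with conda or pip.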
- Include fastTools in your project directory alongside your own modules or scripts.
- Import module.
- Place "import fastTools" at the top of the script you want to use it in.
qScoreDict
: Dictionary that maps Illumina QScore symbols to their integer values.
- Usage
myQualityDict = fastTools.qScoreDict
fastTools.qScoreDict['?']
- Returns 30
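For reference, a mapping like this can be rebuilt in one line. The sketch below assumes the standard Phred+33 encoding (score = ASCII code minus 33) and a score range of 0 to 41; it is not necessarily how fastTools.qScoreDict is defined.

```python
# Assumption: Phred+33 encoding, where the quality score is the ASCII
# code of the symbol minus 33. The 0-41 range is an illustrative choice.
phred33 = {chr(score + 33): score for score in range(42)}

print(phred33['?'])  # 30, because ord('?') is 63
print(phred33['I'])  # 40, because ord('I') is 73
```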
myfile = fastTools.FastqFile('Sample1_S1_L001_R1_001.fastq.gz')
- Creates an interleaved FastqFile object using both the R1 and R2 files.
myfile = fastTools.FastqFile('Sample1_S1_L001_R2_001.fastq.gz', False)
- Creates a FastqFile object from only the file name passed.
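This README does not say how the mate file is located when only one name is passed. One plausible convention, sketched here as an assumption, is to swap the _R1_/_R2_ tag in the Illumina-style file name; mate_filename is a hypothetical helper, not a FastTools function.

```python
import re


def mate_filename(fastq_name):
    """Swap the first _R1_/_R2_ tag in an Illumina-style file name."""
    match = re.search(r"_R([12])_", fastq_name)
    if match is None:
        raise ValueError(f"no _R1_ or _R2_ tag in {fastq_name!r}")
    other = "2" if match.group(1) == "1" else "1"
    # Rebuild the name around the matched tag so only that tag changes.
    return fastq_name[:match.start()] + f"_R{other}_" + fastq_name[match.end():]


print(mate_filename('Sample1_S1_L001_R1_001.fastq.gz'))
# Sample1_S1_L001_R2_001.fastq.gz
```

Note that the sketch replaces the first tag it finds, so a sample name that itself contains "_R1_" would need a stricter pattern.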
self.fastq1
: Name of first FASTQ file passed during initialization.
self.fastq2
: Name of second FASTQ file if passed during initialization. Else, returns "None".
self.sample
: Truncated name of self.fastq1 file, convenient for labelling.
self.paired
: True if R1 and R2 files were read and combined; False if only R1 or R2 file used.
self.fastqDataFrame
: Pandas DataFrame object that holds all read/calculated data for the FastqFile object.
myfile.fastq1
- Returns 'Sample1_S1_L001_R1_001.fastq.gz'
myfile.fastq2
- Returns 'None' if a second file was not passed
myfile.sample
- Returns 'Sample1_S1'
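The truncation behind myfile.sample can be pictured with a regular expression over the usual Illumina naming pattern (sample, S number, lane, read, 001). This is a sketch under that assumption; sample_label and the pattern are illustrative, not the module's own code.

```python
import re

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_R<read>_<set>.fastq[.gz]
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+_S\d+)_L\d{3}_R[12]_\d{3}\.fastq(?:\.gz)?$"
)


def sample_label(fastq_name):
    """Return the '<sample>_S<n>' prefix, or the name unchanged if it does not match."""
    match = ILLUMINA_NAME.match(fastq_name)
    return match.group("sample") if match else fastq_name


print(sample_label('Sample1_S1_L001_R1_001.fastq.gz'))  # Sample1_S1
```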
These methods create a new column in self.fastqDataFrame that contains calculated data.
self.numReads()
: Returns number of reads in self.fastqDataFrame.
self.averageQuality()
self.reverseComplement()
self.aminoAcid()
self.calculateGC()
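To give a feel for what these calculations involve, here is a per-read sketch in plain Python. It assumes Phred+33 quality symbols and reports GC content as a percentage; the function names are hypothetical, and FastTools itself may compute or scale these values differently.

```python
def average_quality(quality_string, offset=33):
    """Mean Phred score of one non-empty read, assuming Phred+33 symbols."""
    scores = [ord(symbol) - offset for symbol in quality_string]
    return sum(scores) / len(scores)


# Translation table that maps each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse the read."""
    return sequence.translate(_COMPLEMENT)[::-1]


def gc_content(sequence):
    """Percentage of G and C bases in one non-empty read."""
    upper = sequence.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


print(average_quality("II??"))      # 35.0
print(reverse_complement("AACGT"))  # ACGTT
print(gc_content("AACGT"))          # 40.0
```

Functions like these can be applied to a DataFrame column with pandas' Series.apply to produce a new column of calculated values.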
These methods create plots that can either be displayed or saved.
self.plotAverageQuality(outfile=False)
self.plotGCcontent(outfile=False)
This method saves your FastqFile object as a .fastq.gz file in the current directory.
self.writeFASTQ(outfile)
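For readers curious what writing a .fastq.gz file involves, the sketch below uses only the standard library. It assumes each read is a (name, sequence, quality) tuple and writes the usual four-line FASTQ record; write_fastq_gz is a hypothetical helper, not the implementation of writeFASTQ.

```python
import gzip
import os
import tempfile


def write_fastq_gz(records, outfile):
    """Write (name, sequence, quality) tuples as gzip-compressed FASTQ."""
    with gzip.open(outfile, "wt") as handle:
        for name, sequence, quality in records:
            # One FASTQ record: header, bases, separator, quality symbols.
            handle.write(f"@{name}\n{sequence}\n+\n{quality}\n")


# Round-trip check in a temporary directory.
records = [("read1", "ACGT", "IIII"), ("read2", "GGCA", "??II")]
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.fastq.gz")
    write_fastq_gz(records, path)
    with gzip.open(path, "rt") as handle:
        lines = handle.read().splitlines()

print(lines[:4])  # ['@read1', 'ACGT', '+', 'IIII']
```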
myfile = fastTools.FastaFile('Sample1.fasta')
self.fasta
: Name of FASTA file passed during initialization.
self.fastaDataFrame
: Pandas DataFrame object that holds all read/calculated data for the FastaFile object.
myfile.fasta
- Returns 'Sample1.fasta'
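As background on the file format being loaded, a FASTA file is a series of ">" header lines, each followed by one or more sequence lines. The reader below is a minimal standard-library sketch of that layout; read_fasta is a hypothetical helper, and FastaFile may well rely on Biopython instead.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n")
fasta_records = list(read_fasta(example))
print(fasta_records)  # [('seq1', 'ACGTTTGA'), ('seq2', 'GGCC')]
```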
These methods create a new column in self.fastaDataFrame that contains calculated data.
self.numReads()
: Number of reads in self.fastaDataFrame.
self.reverseComplement()
self.aminoAcid()
self.calculateGC()
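The idea behind an amino acid translation can be sketched with the standard genetic code (NCBI translation table 1). The choices below are assumptions for illustration: reading frame 1, trailing bases that do not fill a codon are dropped, and unknown codons become "X". FastTools may handle these cases differently, for example through Biopython.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA sequence in frame 1; '*' marks a stop codon."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # ignore an incomplete last codon
    return "".join(
        CODON_TABLE.get(sequence[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("ATGGCCTAA"))  # MA*
```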
This method creates a plot that can either be displayed or saved.
self.plotGCcontent(outfile=False)
This method saves your FastaFile object as a .fasta file in the current directory.
self.writeFASTA(outfile)
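As a rough picture of what writing a .fasta file involves, the sketch below writes each record as a ">" header followed by the sequence wrapped to a fixed width. The 60-character default and the write_fasta name are assumptions for illustration, not the behavior of writeFASTA.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs to a text handle in FASTA format."""
    for header, sequence in records:
        handle.write(f">{header}\n")
        # Wrap the sequence so no line is longer than `width` bases.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


buffer = io.StringIO()
write_fasta([("seq1", "ACGTACGTAC")], buffer, width=4)
fasta_text = buffer.getvalue()
print(fasta_text)
```

Passing an open file instead of the StringIO buffer, for example open("Sample1_out.fasta", "w"), writes the same text to disk.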