FastTools: Python module with tools useful for reading and manipulating FASTQ/FASTA files.

Currently supports only Python 3.6.

FastTools is a module intended to make working with FASTQ/FASTA files more convenient for Python users. It gives the user high-level control over their NGS data and takes a data-science approach to sequence data, holding reads and calculated values in pandas DataFrames.

Setup

Clone or download FastTools repository.
Create fastTools environment.
- Anaconda Python Distribution:
  a. Set up bioconda by following instructions at https://bioconda.github.io/
  b. Use command:
  - For Mac: conda env create -f environment_osx.yml
  - For Win: conda env create -f environment_win.yml
```
  * Tentatively, this should work for OSX and Windows.

  * Alternatively, you could create your own environment, and download libraries manually.  
      "conda install seaborn" will install pandas, matplotlib, and seaborn for you.  
      "conda install biopython" should fulfill FastTools' requirements.
```
  c. If adding to an existing environment, activate your environment then use command:
  conda env update -f=environment_<your_os>.yml
- Using pip and venv:
  a. python3 -m venv env
  b. source env/bin/activate
  c. pip install -r pip_requirements.txt
Include fastTools in your project directory alongside your own modules or scripts.
Import module.
- Place "import fastTools" at the top of the script you want to use it in.

Module Attributes

qScoreDict: Dictionary that maps Illumina QScore symbols to their integer values.

Usage
- myQualityDict = fastTools.qScoreDict
- fastTools.qScoreDict['?']
  - Returns 30
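
For orientation, Illumina 1.8+ quality symbols use the Phred+33 encoding, in which a symbol's score is its ASCII code minus 33. The sketch below builds an equivalent lookup table. The name phred33 and the 0-41 score range are assumptions for illustration; the actual contents of fastTools.qScoreDict may differ.

```python
# Hypothetical reconstruction of a Phred+33 lookup table; not taken from fastTools itself.
PHRED_OFFSET = 33

# Map each quality symbol to its integer score (scores 0-41 assumed).
phred33 = {chr(score + PHRED_OFFSET): score for score in range(42)}

print(phred33['?'])  # '?' is ASCII 63, so 63 - 33 = 30
```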

FastqFile class

Usage

Initialization

myfile = fastTools.FastqFile('Sample1_S1_L001_R1_001.fastq.gz')
- Will create interleaved FastqFile object using R1 and R2 files.
myfile = fastTools.FastqFile('Sample1_S1_L001_R2_001.fastq.gz', False)
- Will create a FastqFile from only the file name passed.
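
This README does not say how the R2 file is located when only the R1 name is passed. One plausible approach, sketched here as an assumption, is to swap the "_R1_" tag in the Illumina-style file name for "_R2_". The helper mate_filename is hypothetical and is not part of fastTools.

```python
def mate_filename(r1_name):
    """Return the assumed R2 file name for an Illumina-style R1 file name.

    Hypothetical helper: fastTools may locate the mate file differently.
    """
    # Split on the last '_R1_' so a sample name containing '_R1_' is left alone.
    head, sep, tail = r1_name.rpartition('_R1_')
    if not sep:
        raise ValueError(f"no '_R1_' tag found in {r1_name!r}")
    return f"{head}_R2_{tail}"


print(mate_filename('Sample1_S1_L001_R1_001.fastq.gz'))
```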

Class attributes

self.fastq1: Name of first FASTQ file passed during initialization.

self.fastq2: Name of the second FASTQ file, if one was used during initialization; otherwise None.

self.sample: Truncated name of self.fastq1 file, convenient for labelling.

self.paired: True if R1 and R2 files were read and combined; False if only R1 or R2 file used.

self.fastqDataFrame: Pandas DataFrame object that holds all read/calculated data for the FastqFile object.

Example:

myfile.fastq1
- Returns 'Sample1_S1_L001_R1_001.fastq.gz'
myfile.fastq2
- Returns 'None' if a second file was not passed
myfile.sample
- Returns 'Sample1_S1'
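
The truncation shown for myfile.sample is consistent with keeping the first two underscore-separated fields of the file name. A minimal sketch under that assumption follows; sample_label is a hypothetical name, and the module's own rule may differ.

```python
def sample_label(fastq_name):
    """Keep the first two underscore-separated fields of a FASTQ file name.

    Hypothetical helper illustrating one way to get 'Sample1_S1'.
    """
    return '_'.join(fastq_name.split('_')[:2])


print(sample_label('Sample1_S1_L001_R1_001.fastq.gz'))
```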

Class methods

These methods create a new column in self.fastqDataFrame that contains calculated data.

self.numReads(): Returns number of reads in self.fastqDataFrame.

self.averageQuality()

self.reverseComplement()

self.aminoAcid()

self.calculateGC()
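
To make the calculated columns concrete, here is a sketch of the per-read computations that methods with these names typically perform. The function names, the Phred+33 offset, and the choice to report GC as a percentage are assumptions; this README does not state how fastTools implements them or what units it uses.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of bases in the read that are G or C."""
    if not seq:
        return 0.0
    gc = sum(base in 'GCgc' for base in seq)
    return 100.0 * gc / len(seq)


def average_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(symbol) - offset for symbol in qual) / len(qual)
```

With a pandas DataFrame, functions like these could be applied to a sequence or quality column with Series.apply to produce a new column; the column names used by fastTools are not documented here.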

These methods create plots that can either be displayed or saved.

self.plotAverageQuality(outfile=False)

self.plotGCcontent(outfile=False)

This method saves your FastqFile object as a .fastq.gz file in the current directory.

self.writeFASTQ(outfile)
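
For reference, a gzip-compressed FASTQ file can be written with the standard library alone. The sketch below is not the fastTools implementation; write_fastq_gz and its (name, sequence, quality) record layout are hypothetical and only illustrate the four-line FASTQ record format.

```python
import gzip


def write_fastq_gz(records, outfile):
    """Write (name, sequence, quality) tuples as a gzip-compressed FASTQ file.

    Hypothetical helper; the record layout is an assumption for illustration.
    """
    # 'wt' opens the gzip stream in text mode so strings can be written directly.
    with gzip.open(outfile, 'wt') as handle:
        for name, seq, qual in records:
            # Each FASTQ record is four lines: header, sequence, '+', quality.
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")
```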

FastaFile class

Usage

Initialization

myfile = fastTools.FastaFile('Sample1.fasta')

Class attributes

self.fasta: Name of FASTA file passed during initialization.

self.fastaDataFrame: Pandas DataFrame object that holds all read/calculated data for the FastaFile object.

Example:

myfile.fasta
- Returns 'Sample1.fasta'

Class methods

These methods create a new column in self.fastaDataFrame that contains calculated data.

self.numReads(): Number of reads in self.fastaDataFrame.

self.reverseComplement()

self.aminoAcid()

self.calculateGC()

This method creates a plot that can either be displayed or saved.

self.plotGCcontent(outfile=False)

This method saves your FastaFile object as a .fasta file in the current directory.

self.writeFASTA(outfile)
