This project is released under the GPL-3.0 License.
- This project includes some experimental code for testing purposes.
- Use it at your own risk.
- I will try my best to complete and polish this documentation.
- Config First and Run It Later (CoFRIL)
Definition | Naming | Example |
---|---|---|
1. Pack the processing code into a Python class | lib*.py | |
2. Run *-setConfig.py first to configure the dependent variables (initialization) | *-setConfig.py | *-configExample.py |
3. Run the related Python class in *-run.py | *-run.py | *-exampleRun.py |
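
The three steps above can be sketched in a single file. This is only an illustration of the pattern: `ExampleConfig` and `ExampleRunner` are hypothetical names, not classes from this project, and the real workflow splits the parts across `lib*.py`, `*-setConfig.py`, and `*-run.py`.

```python
# Hypothetical sketch of the "Config First and Run It Later" pattern.
# None of these names come from the project; they only illustrate the idea.
from dataclasses import dataclass, field


@dataclass
class ExampleConfig:
    """Step 2: values that a *-setConfig.py script would set up front."""
    threads: int = 1
    samples: list = field(default_factory=list)


class ExampleRunner:
    """Step 1: processing code packed into a class (the lib*.py role)."""

    def __init__(self, config):
        self.config = config

    def run(self):
        # A real class would call a binary here; this one only reports
        # what it would do for each configured sample.
        return [
            f"process {sample} with {self.config.threads} thread(s)"
            for sample in self.config.samples
        ]


# Step 2: configure first.
config = ExampleConfig(threads=4, samples=["sampleA", "sampleB"])

# Step 3: run later (the *-run.py role).
for line in ExampleRunner(config).run():
    print(line)
```

Keeping the configuration in its own object means the same class can be rerun on another sample set by changing only the configuration step.
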
- Relay Race (ReRa)
Definition |
---|
1. Use a scripting language to control the workflow |
2. Use an executable binary, a scripting language, or both to process the data |
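
A minimal sketch of this idea, assuming Python is the controlling script: each stage is an external program, and the script passes the output of one stage to the next. To keep the example runnable anywhere, the Python interpreter itself stands in for the real binaries; `run_stage` is a hypothetical helper.

```python
# Hypothetical sketch of a "Relay Race" workflow: a script hands data
# from one external program to the next. The Python interpreter stands
# in for the real binaries so the example runs anywhere.
import subprocess
import sys


def run_stage(code, text):
    """Run one external stage, feeding `text` on stdin and returning stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # stop the relay if a stage fails
    )
    return completed.stdout


# Stage 1 (stand-in binary): upper-case the input.
upper = "import sys; sys.stdout.write(sys.stdin.read().upper())"
# Stage 2 (stand-in binary): count the characters it receives.
count = "import sys; sys.stdout.write(str(len(sys.stdin.read())))"

first = run_stage(upper, "acgt")
second = run_stage(count, first)
print(first, second)
```
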
- Run Directly (RuDi)
Definition |
---|
1. Declare sample-dependent variables at the beginning of the script |
2. Derive downstream variables |
3. Run code directly without class |
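
The layout of such a script might look like the sketch below. All variable names and paths are made up for illustration; only the three-part ordering reflects the definition above.

```python
# Hypothetical sketch of the "Run Directly" layout; paths and names are made up.
from pathlib import PurePosixPath

# 1. Sample-dependent variables, declared at the top of the script.
species = "speciesDatabase"
treatment = "speciesTreatment"
trim_tag = "trimQ30"

# 2. Downstream variables derived from them.
prefix = f"{species}-{trim_tag}"
input_path = PurePosixPath("userData") / "example-input" / treatment / f"{prefix}.tsv"
output_path = input_path.with_suffix(".json")

# 3. Plain top-level code, no class.
print(input_path)
print(output_path)
```
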
- For general-purpose storage of configuration
- Coming soon
- For experiment design
- Coming soon
- For version management of executable
- Coming soon
- For adjustment
- Coming soon
- Coming soon
- Coming soon
List | Detail |
---|---|
Codename | 01-fq-fastqc |
Usage | Quality reports |
Type | ReRa |
Binary | FastQC 1 |
Language | Shell script & Python 3 |
Input | NGS reads, FASTQ format |
Output | HTML reports |

- Command:

      bin/FastQC/fastqc -f fastq -o [largeData/04-hisat2/species/speciesDatabase-trimQ30/report]

List | Detail |
---|---|
Codename | 02-hisat2-index |
Usage | Build HISAT2 index |
Type | CoFRIL |
Class | libHISAT.indexer() indexer |
Binary | hisat2-build from HISAT2 2 |
Input | Genome sequences, FASTA format |
Output | HISAT2 index of genome |
List | Detail |
---|---|
Codename | 03-trim |
Usage | Trim FASTQ files |
Type | CoFRIL |
Class | libTrim.trimmer() trimmer |
Binary | Trimmomatic 3 |
Input | raw NGS reads, FASTQ format |
Output | Trimmed reads, FASTQ format |
List | Detail |
---|---|
Codename | 04-hisat2 |
Usage | Alignment and mapping |
Type | CoFRIL |
Class | libHISAT.aligner() aligner |
Binary | hisat2 from HISAT2 2 |
| | samtools from SAMtools 4 |
Input | HISAT2 index of genome (02-hisat2-index) |
| | Trimmed reads, FASTQ format (03-trim) |
Output | Alignments, BAM format |
List | Detail |
---|---|
Codename | 05-gr-gffRead |
Usage | Convert genome into transcriptome |
Type | CoFRIL |
Class | libCuffdiff.converter() converter |
Binary | gffread 5 |
Input | Genome annotation, GFF3 format |
Output | Transcriptome annotation, GTF2 format |
List | Detail |
---|---|
Codename | 06-fn |
Usage | Transcriptome extractor |
Type | ReRa |
Binary | gffread 5 |
Language | Shell script & Python 3 |
Input | Genome sequences, FASTA format |
| | Transcriptome annotation, GTF2 format (05-gr-gffRead) |
Output | Transcripts, FASTA format |
| | Transcripts table, TSV format |

- Command:

      bin/cufflinks/gffread \
          -g userData/dbgs-GenomeSequence/speciesDatabase/speciesDatabase.fn \
          -w userData/06-gr-exportTranscript/speciesTreatment/speciesDatabase-trimQ30-transcript.fn \
          userData/05-gr-transcriptomeConstruction/speciesTreatment/speciesDatabase-trimQ30-final.gtf

List | Detail |
---|---|
Codename | 07-cd-CuffDiff |
Usage | Estimate transcript abundances |
Type | CoFRIL |
Class | libCuffdiff.differ() differ |
Binary | cuffdiff from Cufflinks 6 |
Input | Alignments, BAM format (04-hisat2) |
| | Transcriptome annotation, GTF2 format (05-gr-gffRead) |
Output | Abundances/Expression profile, DIFF format (TSV format) |
List | Detail |
---|---|
Codename | 08-an |
Usage | an1. Extract transcript-gene relation information |
| | an2. Link transcript IDs with gene IDs and homolog IDs |
| | (cont.) Further annotate with functional annotation databases |
Type | RuDi |
Language | Python 3 |
Input | Abundances/Expression profile, DIFF format (TSV format) (07-cd-CuffDiff) |
| | Genome annotation, JSON format (dbga-GenomeAnnotation) |
| | Homologs, JSON format (dbga-GenomeAnnotation) |
| | GO Terms, JSON format (dbgo-GOdatabase) |
| | KEGG pathways, JSON format (dbkg-KEGG-hirTree) |
| | Other databases... |
Output | Homologous annotations, JSON format (various files) |
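
The linking step (an2) can be pictured as two dictionary lookups. The sketch below is an assumption about the general shape of the data, not the project's actual file layout: the IDs, the dictionary structures, and the `annotate` helper are all invented.

```python
# Hypothetical sketch: link transcript IDs to gene IDs and homolog IDs.
# The IDs and dictionary layouts are invented for illustration only.
import json

# an1: transcript-to-gene relation (e.g. taken from the expression profile).
transcript_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

# Homolog table, as it might look after json.load() on a database file.
gene_to_homolog = json.loads('{"geneA": "homolog1"}')


def annotate(transcript_to_gene, gene_to_homolog):
    """an2: attach the gene ID and (if any) homolog ID to each transcript."""
    return {
        transcript: {"gene": gene, "homolog": gene_to_homolog.get(gene)}
        for transcript, gene in transcript_to_gene.items()
    }


annotation = annotate(transcript_to_gene, gene_to_homolog)
print(json.dumps(annotation, indent=2))
```

Genes without a known homolog keep a `None` entry, so downstream steps can tell "no homolog" apart from "gene missing".
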
List | Detail |
---|---|
Codename | 09-cd |
Usage | cd1. Convert the results of cuffdiff into SQLite3 form |
| | cd2. Annotate and separate information into... |
| | (cont.) sample-oriented expression tables |
Type | RuDi |
Language | Python 3 |
Input | Abundances/Expression profile, DIFF format (TSV format) (07-cd-CuffDiff) |
| | Homologous annotations, JSON format (08-an) |
Output | Annotated abundances/expression profile, SQLite3 format |
| | Annotated abundances/expression profile, TSV format |
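
Step cd1 amounts to loading a tab-separated table into a database. A minimal sketch with Python's standard `csv` and `sqlite3` modules follows; the table name `expression`, the `load_tsv` helper, and the sample rows are assumptions, and the column names are simply taken from whatever header the file has.

```python
# Hypothetical sketch: load a tab-separated DIFF table into SQLite3.
# The table name and the sample rows are made up; columns come from the header.
import csv
import io
import sqlite3


def load_tsv(connection, table, handle):
    """Create `table` from the TSV header and insert every data row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE "{table}" ({columns})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', reader
    )
    connection.commit()


# Stand-in for a cuffdiff output file.
diff_text = (
    "test_id\tgene_id\tsample_1\tsample_2\tvalue_1\tvalue_2\n"
    "tx1\tgeneA\tcontrol\ttreated\t1.5\t6.0\n"
    "tx2\tgeneB\tcontrol\ttreated\t0\t3.2\n"
)

connection = sqlite3.connect(":memory:")
load_tsv(connection, "expression", io.StringIO(diff_text))
rows = connection.execute(
    'SELECT "test_id", "value_2" FROM "expression" ORDER BY "test_id"'
).fetchall()
print(rows)  # values are stored as text because no column types are declared
```
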
List | Detail |
---|---|
Codename | 10-grouping |
Usage | Group and count Differentially Expressed Genes (DEGs) and Specifically Expressed Genes (SEGs) |
Type | RuDi |
Language | Python 3 |
Input | Annotated abundances/expression profile, TSV format (09-cd) |
| | Homologous annotations, JSON format (08-an) |
Output | DEA result files, JSON format |
| | DEA result files, TSV format |
| | DEA count record, LOG format (TXT format) |
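
The grouping step might look roughly like the sketch below. The definitions are assumptions made for illustration, not the rules this project applies: a gene counts as a SEG when it is expressed in exactly one of two samples, and as a DEG when both values are non-zero and the absolute log2 fold change reaches a hypothetical threshold of 1.

```python
# Hypothetical sketch of grouping genes into DEGs and SEGs.
# The threshold and both definitions are assumptions, not the project's rules.
import math


def classify(value_1, value_2, min_log2_fold=1.0):
    """Return 'SEG', 'DEG', or None for one pair of expression values."""
    if (value_1 == 0) != (value_2 == 0):
        return "SEG"  # expressed in exactly one of the two samples
    if value_1 == 0 and value_2 == 0:
        return None  # not expressed at all
    if abs(math.log2(value_2 / value_1)) >= min_log2_fold:
        return "DEG"
    return None


def count_groups(profile):
    """Count DEGs and SEGs in a {gene: (value_1, value_2)} mapping."""
    counts = {"DEG": 0, "SEG": 0}
    for value_1, value_2 in profile.values():
        label = classify(value_1, value_2)
        if label is not None:
            counts[label] += 1
    return counts


profile = {
    "geneA": (1.5, 6.0),  # four-fold up
    "geneB": (0.0, 3.2),  # only in sample 2
    "geneC": (2.0, 2.1),  # essentially unchanged
}
counts = count_groups(profile)
print(counts)
```
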
List | Detail |
---|---|
Codename | 11-ea-enrichAnaly |
Usage | Compare and calculate the ratio of count of DEGs or SEGs |
Type | RuDi |
Language | Python 3 |
Input | DEA result files, JSON format (10-grouping) |
| | Annotated abundances/expression profile, SQLite3 format (09-cd) |
| | Homologous annotations, JSON format (08-an) |
Output | GSEA result files, JSON format |
| | GSEA result files, TSV format |
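
One way to read "ratio of count" is the share of genes in each annotation term that are DEGs (or SEGs). The sketch below follows that reading, which is an assumption; the term names, gene IDs, and the `deg_ratios` helper are invented.

```python
# Hypothetical sketch: DEG ratio per annotation term (all data invented).


def deg_ratios(term_to_genes, degs):
    """Return {term: (deg_count, gene_count, ratio)}, skipping empty terms."""
    ratios = {}
    for term, genes in term_to_genes.items():
        members = set(genes)
        if not members:
            continue  # avoid dividing by zero
        hits = len(members & degs)
        ratios[term] = (hits, len(members), hits / len(members))
    return ratios


term_to_genes = {
    "termX": ["geneA", "geneB", "geneC", "geneD"],
    "termY": ["geneC", "geneD"],
}
degs = {"geneA", "geneB", "geneE"}
ratios = deg_ratios(term_to_genes, degs)
for term, (hits, size, ratio) in ratios.items():
    print(term, hits, size, ratio)
```
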
List | Detail |
---|---|
Codename | 12-ft |
Usage | ft1. Do Fisher's Exact Test |
| | ft2. Visualization |
Type | RuDi |
Language | Python 3 |
Input | GSEA result files, JSON format (11-ea-enrichAnaly) |
| | Homologous annotations, JSON format (08-an) |
Output | FET result files, PNG format |
| | FET result files, SVG format |
| | FET result files, TSV format |
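
For reference, a one-sided Fisher's exact test on a 2x2 table can be computed from the hypergeometric distribution with the standard library alone. This is a generic sketch, not the project's implementation; the choice of a one-sided (enrichment) test, the function name, and the example counts are assumptions.

```python
# Hypothetical sketch: one-sided Fisher's exact test for a 2x2 table
#
#                 in term   not in term
#   DEG              a           b
#   not DEG          c           d
#
# The p-value is the probability of seeing `a` or more DEGs in the term
# when the row and column totals are fixed.
from math import comb


def fisher_exact_greater(a, b, c, d):
    """Return the upper-tail hypergeometric probability for the table."""
    row1 = a + b  # all DEGs
    col1 = a + c  # all genes in the term
    total = a + b + c + d
    denominator = comb(total, row1)
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator


p_value = fisher_exact_greater(3, 1, 1, 3)
print(f"{p_value:.4f}")
```

For the example table the tail sums the two most extreme outcomes, (16 + 1) / 70, so a small table like this one is not significant.
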
- Original Command:
  - libHISAT.indexer()

        # Build the HISAT2 index
        hisat2-build \
            -p [THREAD] <Path and Name of GENOME File> \
            <prefix of HISAT2-build genome index (path+header)>

  - libHISAT.aligner()

        # Align and map reads
        hisat2 \
            -q [--dta/--dta-cufflinks] --phred<phred> -p <thread> \
            -x <prefix of HISAT2-build genome index> \
            -1 <forward fastq files of samples> \
            -2 <reverse fastq files of samples> \
            -S <output SAM files>

        # Convert SAM to BAM
        samtools view -o <out.bam> -Su <in.sam>

        # Sort the BAM file to reduce its size
        samtools sort -o <out-sorted.bam> <in.bam>

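
The alignment commands above can be assembled as argument lists before being handed to a process runner. The sketch below only builds and prints the commands; `alignment_commands` and all file names are hypothetical, and the quality flag is written in the `--phred33` form that HISAT2 accepts.

```python
# Hypothetical sketch: assemble the three alignment commands as argument lists.
# Function name, file names, and output naming are assumptions.
import shlex


def alignment_commands(index_prefix, forward, reverse, out_prefix,
                       threads=1, phred=33):
    """Return the hisat2, samtools view, and samtools sort commands in order."""
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.bam"
    sorted_bam = f"{out_prefix}-sorted.bam"
    return [
        ["hisat2", "-q", "--dta-cufflinks", f"--phred{phred}",
         "-p", str(threads), "-x", index_prefix,
         "-1", forward, "-2", reverse, "-S", sam],
        ["samtools", "view", "-o", bam, "-Su", sam],
        ["samtools", "sort", "-o", sorted_bam, bam],
    ]


commands = alignment_commands(
    "index/genome", "sample_1.fq.gz", "sample_2.fq.gz", "sample", threads=4
)
for command in commands:
    # Each list could be passed to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Building lists instead of shell strings avoids quoting problems when paths contain spaces.
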
- Original Command:
  - libTrim.trimmer()

        # Paired-end
        java -jar <bin>/trimmomatic-0.35.jar PE \
            -phred33 -threads <threads> \
            input_forward.fq.gz input_reverse.fq.gz \
            output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
            output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
            <ILLUMINACLIP> <LEADING> \
            <TRAILING> <SLIDINGWINDOW> <MINLEN>

        # Single-end
        java -jar <bin>/trimmomatic-0.35.jar SE \
            -phred33 -threads <threads> \
            input.fq.gz output.fq.gz \
            <ILLUMINACLIP> <LEADING> \
            <TRAILING> <SLIDINGWINDOW> <MINLEN>

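
The paired-end and single-end forms differ only in the mode keyword and the number of input and output files, so one helper can cover both. This is a sketch under assumptions: `trimmomatic_command`, the jar path, the output naming, and the example trimming steps are illustrative, not the settings used by `libTrim.trimmer()`.

```python
# Hypothetical sketch: build a Trimmomatic command for single- or paired-end reads.
# Output file names and the example trimming steps are assumptions.
import shlex


def trimmomatic_command(jar, reads, out_prefix, steps, threads=1):
    """Return the argument list for SE (one input) or PE (two inputs)."""
    if len(reads) == 1:
        mode = "SE"
        outputs = [f"{out_prefix}.fq.gz"]
    elif len(reads) == 2:
        mode = "PE"
        outputs = [
            f"{out_prefix}_forward_paired.fq.gz",
            f"{out_prefix}_forward_unpaired.fq.gz",
            f"{out_prefix}_reverse_paired.fq.gz",
            f"{out_prefix}_reverse_unpaired.fq.gz",
        ]
    else:
        raise ValueError("expected one (SE) or two (PE) input files")
    return [
        "java", "-jar", jar, mode,
        "-phred33", "-threads", str(threads),
        *reads, *outputs, *steps,
    ]


steps = ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"]
command = trimmomatic_command(
    "bin/trimmomatic-0.35.jar", ["in_1.fq.gz", "in_2.fq.gz"], "out", steps, threads=4
)
print(shlex.join(command))
```
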
- Original Command:
  - libCuffdiff.differ()

        cuffdiff \
            -p <int> -o <string> \
            -L <label1,label2,…,labelN> <transcripts.gtf> \
            [[sample1_replicate1.sam,…] …… […,sampleN_replicateM.sam]]

  - libCuffdiff.converter()

        bin/cufflinks/gffread <inputFile> -T -o <outputFile>

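
In the cuffdiff call, replicates of one condition are joined with commas and conditions are separated by spaces, in the same order as the labels. A small sketch of that grouping follows; `cuffdiff_command`, the condition names, and the file names are hypothetical.

```python
# Hypothetical sketch: build a cuffdiff command from {label: [replicate files]}.
# Labels and file names are invented for illustration.
import shlex


def cuffdiff_command(gtf, conditions, out_dir, threads=1):
    """Join replicates with commas; keep conditions as separate arguments."""
    labels = ",".join(conditions)  # dicts keep insertion order
    replicate_groups = [",".join(files) for files in conditions.values()]
    return [
        "cuffdiff", "-p", str(threads), "-o", out_dir,
        "-L", labels, gtf, *replicate_groups,
    ]


conditions = {
    "control": ["c1.bam", "c2.bam"],
    "treated": ["t1.bam", "t2.bam"],
}
command = cuffdiff_command("transcripts.gtf", conditions, "cuffdiff_out", threads=4)
print(shlex.join(command))
```
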
- [FastQC] https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- Andrews, S. (2018). FastQC: a quality control tool for high throughput sequence data (Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom).
- [HISAT2] https://doi.org/10.1038/nprot.2016.095 (Article)
- Pertea, M., Kim, D., Pertea, G.M., Leek, J.T., and Salzberg, S.L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11, 1650-1667.
- https://ccb.jhu.edu/software/hisat2/manual.shtml (Documentation & Binary)
- [Trimmomatic] https://doi.org/10.1093/bioinformatics/btu170 (Article)
- Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.
- http://www.usadellab.org/cms/?page=trimmomatic (Documentation & Binary)
- [SAMtools] https://www.htslib.org/doc/samtools.html
- [gffread] https://ccb.jhu.edu/software/stringtie/gff.shtml
- [Cuffdiff] https://doi.org/10.1038/nbt.2450 (Article)
- Trapnell, C., Hendrickson, D.G., Sauvageau, M., Goff, L., Rinn, J.L., and Pachter, L. (2013). Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31, 46-53.
- https://cole-trapnell-lab.github.io/cufflinks/manual/ (Documentation)
- https://cole-trapnell-lab.github.io/cufflinks/install/ (Binary)
The scripts in this catalogue may no longer work, because their development did not follow the current coding style.
List | Detail |
---|---|
Codename | 04-hs |
Usage | Analyse HISAT2 result |
Type | CoFRIL |
Class | libHISAT.summariser() |
Binary | samtools from SAMtools |
Input | 04-hisat2 |
List | Detail |
---|---|
Codename | 07-cg |
Usage | Get isoform information for each gene model |
Type | RuDi |
Input | 07-st |