Adds corrections to README.md
samhorsfield96 committed Aug 20, 2020
1 parent 7a4020b commit 93388b2
Showing 1 changed file with 14 additions and 10 deletions.
24 changes: 14 additions & 10 deletions README.md
@@ -8,18 +8,20 @@ This repository contains code used in the Master's Thesis titled:

- python3
- biopython
- numpy
- R version 3+

## Usage

### analyse_gaf_genic.py

Analyse the proportion of genic read bases aligned by GraphAligner.
Analyse the proportion of genic read bases aligned by [GraphAligner](https://github.com/maickrau/GraphAligner).

```python analyse_gaf_genic.py gaffile blastfile reads.fa outfile.txt```

Input/Output:
- ```gaffile```: graphical alignment file produced by [Graphaligner](https://github.com/maickrau/GraphAligner)
- ```blastfile```: [BLAST](https://www.sciencedirect.com/science/article/abs/pii/S0022283605803602?via%3Dihub) output file in tabular format generated from exact alignment of gene sequences from [Lees et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5930550/) (aligned files are '*_dna.fa') to simulated genomes (```blastn -outfmt 6 -perc_identity 100 -qcov_hsp_perc 100```)
- ```gaffile```: Graph Alignment Format (GAF) file produced by [GraphAligner](https://github.com/maickrau/GraphAligner)
- ```blastfile```: [BLAST](https://www.sciencedirect.com/science/article/abs/pii/S0022283605803602?via%3Dihub) output file in tabular format generated from exact alignment of gene sequences from [Lees et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5930550/) (aligned files are '*_dna.fa') to respective simulated genomes (```blastn -outfmt 6 -perc_identity 100 -qcov_hsp_perc 100```)
- ```reads.fa```: read sequences used in alignment in FASTA format, generated by [NanoSim-H](https://github.com/karel-brinda/NanoSim-H)
- ```outfile.txt```: output summary file
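
As a rough illustration of the kind of calculation described above, the sketch below computes the fraction of genic read bases covered by GAF alignments. It is not the code in ```analyse_gaf_genic.py```. It assumes the standard GAF column layout (query name in column 1, 0-based half-open query start and end in columns 3 and 4), and it takes the genic intervals per read as a hypothetical, already-computed input, since deriving them from the BLAST hits and read names is not covered here.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def aligned_genic_fraction(gaf_lines, genic):
    """Fraction of genic read bases covered by at least one alignment.

    gaf_lines: iterable of tab-separated GAF records.
    genic: {read_name: [(start, end), ...]} in read coordinates
           (hypothetical input, half-open intervals).
    """
    aligned = {}
    for line in gaf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Columns 3 and 4 hold the aligned span on the read.
        aligned.setdefault(fields[0], []).append((int(fields[2]), int(fields[3])))

    total = covered = 0
    for read, regions in genic.items():
        alignments = merge_intervals(aligned.get(read, []))
        for g_start, g_end in merge_intervals(regions):
            total += g_end - g_start
            for a_start, a_end in alignments:
                covered += max(0, min(g_end, a_end) - max(g_start, a_start))
    return covered / total if total else 0.0
```

Merging the alignment intervals first means a base covered by two overlapping alignments is only counted once.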

@@ -40,7 +42,7 @@ Analyse node length and degree from a GFA file.
```python analyse_nodes.py gfafile outpref```

Input/Output
- ```gfafile```: GFA file produced produced from graph construction
- ```gfafile```: GFA file produced from graph construction
- ```outpref```: output prefix for distribution and summary files
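
For readers unfamiliar with GFA, a minimal sketch of extracting node lengths and degrees from GFA1 records follows. It is an illustration only, not the code in ```analyse_nodes.py```. It assumes that node length is the length of the ```S``` record sequence (or the ```LN:i:``` tag when the sequence is ```*```) and that degree is the number of ```L``` record endpoints touching a node, ignoring orientation.

```python
from collections import defaultdict


def node_lengths_and_degrees(gfa_lines):
    """Return ({node: length}, {node: degree}) from GFA1 S and L records."""
    lengths = {}
    degrees = defaultdict(int)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            length = len(seq)
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i:<length> tag.
                length = 0
                for tag in fields[3:]:
                    if tag.startswith("LN:i:"):
                        length = int(tag[5:])
            lengths[name] = length
            degrees[name] += 0  # ensure isolated nodes are reported with degree 0
        elif fields[0] == "L" and len(fields) >= 5:
            # Each link adds one to the degree of both of its endpoints.
            degrees[fields[1]] += 1
            degrees[fields[3]] += 1
    return lengths, dict(degrees)
```

With this definition, a self-loop adds two to the degree of its node.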

### analyse_unitig.py
@@ -58,10 +60,12 @@ Input/Output

R script for analysis of output files from analyse_unitig.py, generating a unitig frequency plot.

To use, specify input directory at ```#input directory```.
To use, specify input directory at ```#input directory```, with files in .txt format.

### check_ORF.py

This script contains two functions for analysing ORF calls made by a gene caller.

#### check_ORF_in_ref()

Checks presence of a called ORF in forward and reverse complements of a set of reference source sequences.
@@ -73,7 +77,7 @@ Input/Output
- ```ref_fasta_for```: Multi-FASTA of reference source sequences (forward strand)
- ```ref_fasta_rev```: Multi-FASTA of reference source sequences (reverse strand)
- ```query_fasta```: ORF calls in FASTA format
- ```outfasta```: output FASTA containing ORFs not present in forward or reverse sequences.
- ```outfasta```: output FASTA containing ORFs not present in forward or reverse sequences
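
The check described above can be sketched as an exact substring search on both strands. This is an illustration under stated assumptions, not the implementation of ```check_ORF_in_ref()```: it works on in-memory strings rather than FASTA files, computes the reverse complement itself instead of reading a separate reverse-strand file, and treats "present" as a case-insensitive exact match. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_missing_from_refs(orfs, refs):
    """Return IDs of ORFs not found in any reference on either strand.

    orfs: {orf_id: sequence}
    refs: iterable of reference sequences (forward strand)
    """
    forward = [ref.upper() for ref in refs]
    both_strands = forward + [reverse_complement(ref) for ref in forward]
    return [
        orf_id
        for orf_id, seq in orfs.items()
        if not any(seq.upper() in ref for ref in both_strands)
    ]
```

The returned IDs correspond to the ORFs that would be written to an output FASTA of unmatched calls.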

#### check_ref_in_query()

@@ -89,14 +93,14 @@ Input/Output

### compare_gene_calls.py

Compares known genes against called ORFs by Prodigal/ggCaller in S. pneumoniae capsular biosynthetic loci (CBL). Prints recall and precision, and returns list of unmatched sequences.
Compares known genes against ORFs called by [Prodigal](https://github.com/hyattpd/Prodigal) or [ggCaller](https://github.com/samhorsfield96/ggCaller) in S. pneumoniae capsular biosynthetic loci (CBL). Prints recall and precision, and returns a list of unmatched sequences.

```python compare_gene_calls.py reference_genes gene_calls caller_type group```

Input/Output
- ```reference_genes```: known genes in FASTA format
- ```gene_calls```: ORF calls by [Prodigal](https://github.com/hyattpd/Prodigal) or [ggCaller](https://github.com/samhorsfield96/ggCaller) in FASTA format
- ```caller_type```: specify which caller used (ggCaller = ggc, Prodigal = prod)
- ```caller_type```: specify which gene caller was used (ggCaller = ggc, Prodigal = prod)
- ```group```: CBL group used in comparison.
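
To make the two reported metrics concrete, here is a minimal sketch of recall and precision over sets of sequences. It is not the code in ```compare_gene_calls.py```: the matching criterion used there is not stated here, so this sketch assumes exact, case-insensitive sequence identity, and it assumes the unmatched list refers to reference genes with no matching call.

```python
def recall_precision(reference_genes, gene_calls):
    """Return (recall, precision, unmatched_reference_genes).

    recall    = matched reference genes / all reference genes
    precision = calls matching a reference gene / all calls
    Matching is exact sequence identity (an assumption of this sketch).
    """
    ref_set = {seq.upper() for seq in reference_genes}
    call_set = {seq.upper() for seq in gene_calls}
    matched = ref_set & call_set
    recall = len(matched) / len(ref_set) if ref_set else 0.0
    precision = len(matched) / len(call_set) if call_set else 0.0
    unmatched = sorted(ref_set - call_set)
    return recall, precision, unmatched
```

For example, with four reference genes and three calls of which two match exactly, recall is 2/4 and precision is 2/3.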

### gfa_to_fasta.py
@@ -112,6 +116,6 @@ Input/Output

### panaroo_gene_freq.R

R script for analysis of RTAB file generated by [Panaroo](https://github.com/gtonkinhill/panaroo)
R script for analysis of RTAB file generated by [Panaroo](https://github.com/gtonkinhill/panaroo).

For usage, specify input directory at ```#input directory```.
For usage, specify input directory at ```#input directory```, with files in .RTAB format.
