SeqAn3 Tools (st_*)

These are simple tools to manipulate sequence data.

Available tools

st_fastq2fasta
- converts fastq file to fasta file
st_dna5todna4
- converts fasta with dna5 alphabet to dna4 alphabet
st_index_build
- build a 2FM-Index/Bidirectional FMIndex
st_index_search
- searches for given queries using an 2FM-Index
st_fasta_cut
- reduces a fasta file to either a certain number of base pairs or a certain number of chromosomes
st_fasta_info
- prints information about a fasta file
st_fasta_dump
- dumps a fasta file into a text file with delimiter in between and at the end
st_binary_rev
- reverses a binary file byte-wise without loading the complete file into memory
st_scheme_build
- constructs a search scheme
st_scheme_stat
- prints information about a search scheme
st_bwt_build
- constructs a bwt
st_multistring_filter
- maps indices that can't handle multistring indices, from a single concatenated string back to multi strings
st_name_id_mapper
- maps sequence names to ids, or vice versa
st_sam_filter
- filters sam files by number of errors
st_sam_info
- some infos about the sam file that no one else reports
st_local_mapper
- convert output from programs like 'columba' to sam/bam format

Usage

$ st_fastq2fasta input.fastq > output.fasta
Converts a fastq file into a fasta file

$ st_dna5todna4 input.fasta > output.fasta
Converts a dna5 alphabet to dna4 and replaces every occurrence of 'N' with a random 'A', 'C', 'G' or 'T'.

$ st_index_build -v input.dna5.fasta output.dna5.index
$ st_index_build --dna4 -v input.dna4.fasta output.dna4.index
Creates an 2fm-index from a fasta file

$ st_index_search input.dna5.index queries.dna5.fasta results.txt -k 2
$ st_index_search --dna4 input.dna4.index queries.dna4.fasta results.txt -k 2
Searches the index for reads provided by the queries file with 2 errors. Results are stored in results.txt

$ st_fasta_cut --max_chr 2 input.fasta > output.fasta
$ st_fasta_cut --max_bases 1000000 input.fasta > output.fasta
Reduces the number of bases or chromosons in a fasta file

$ st_fasta_info input.fasta lengths
$ st_fasta_info input.fasta names
Prints lengths or names of all fasta file entries

$ st_fasta_dump input.fasta -d '#' -e '$' > output.txt
Converts fasta file into a text file, where each sequence is separated by '#' and a '$' is attached at the end

$ st_binary_rev input.txt > output.txt
reverses the bytes of input.txt

$ st_scheme_build -g "01*0_opt" -k2 > 01star0_k2.ss
constructs a search scheme with two allowed errors, following the optimized 01*0 principle

$ st_scheme_stat --input 01star0_k2.ss --length 100
prints information about a search scheme: number of parts, searches, min/max errors, feasibility, uniqueness, node count, node count for edit distance.

$ st_bwt_build --input sometext.txt > bwt.txt
constructs the bwt of the given text.

Build instructions

clone this repository: git clone https://github.com/SGSSGene/seqan3_tools
create and enter build folder: mkdir -p seqan3_tools/build; cd $_
run cmake with release options: cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=native" ..
compile with make
(optional) put them in your PATH variable PATH="${PATH}:path/to/seqan3_tools/build/bin"

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
.github		.github
LICENSES		LICENSES
cmake		cmake
src		src
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SeqAn3 Tools (st_*)

Available tools

Usage

Build instructions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

SGSSGene/seqan3_tools

Folders and files

Latest commit

History

Repository files navigation

SeqAn3 Tools (st_*)

Available tools

Usage

Build instructions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages