Table of Simulation Parameters

Moritz Smolka edited this page Jul 21, 2017 · 3 revisions

This page contains a list of parameters for creating data sets using the internal Teaser simulation pipeline.

General

| Parameter | Default | Description |
| --- | --- | --- |
| reference | (None) | Path, relative to the Teaser/references directory, of the mapping reference FASTA file |
| platform | illumina | Sequencing platform. Can be illumina, 454 or ion_torrent |
| read_length | 100 | Average read length to simulate |
| read_count | (Calculated) | Calculated based on subsampling parameters. Must be set manually if subsampling is disabled. |
| coverage | 1 | Multiplier for read count |
| paired | No | Whether the library is paired-end |
| insert_size | 500 | Inner distance in base pairs between the outer ends of both reads in a pair |
| insert_size_error | 50 | Standard deviation for insert size |
| mutation_rate | 0.001 | Overall rate of mutations (0-1) |
| mutation_indel_frac | 0.3 | Fraction of overall mutations that are indels; the rest are SNPs |
| mutation_indel_avg_len | 1 | Average length for indels in base pairs |
| error_rate_mult | 1 | Multiplier for overall sequencing error rate |
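
To make the relationship between mutation_rate and mutation_indel_frac concrete, the following Python sketch splits the overall rate into an SNP rate and an indel rate. It only illustrates the arithmetic implied by the descriptions above and is not taken from Teaser's code; the function name split_mutation_rate is hypothetical, and treating mutation_rate as a per-base rate is an assumption.

```python
def split_mutation_rate(mutation_rate=0.001, mutation_indel_frac=0.3):
    """Split an overall mutation rate into (snp_rate, indel_rate).

    Hypothetical helper for illustration only; not part of Teaser.
    Assumes mutation_rate is a per-base rate in the range 0-1.
    """
    if not 0.0 <= mutation_rate <= 1.0:
        raise ValueError("mutation_rate must be between 0 and 1")
    if not 0.0 <= mutation_indel_frac <= 1.0:
        raise ValueError("mutation_indel_frac must be between 0 and 1")

    # Indels take the configured fraction; the rest are SNPs.
    indel_rate = mutation_rate * mutation_indel_frac
    snp_rate = mutation_rate - indel_rate
    return snp_rate, indel_rate


if __name__ == "__main__":
    snp_rate, indel_rate = split_mutation_rate()
    bases = 1000000
    print("Expected SNPs per %d bases: %.0f" % (bases, snp_rate * bases))
    print("Expected indels per %d bases: %.0f" % (bases, indel_rate * bases))
```

Under that assumption, the default values correspond to roughly 700 SNPs and 300 indels per million simulated bases.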

Sampling

Prior to simulation, Teaser by default samples a number of small regions from the reference genome. For most use cases it is sufficient to rely on the default values for this process.

| Parameter | Default | Description |
| --- | --- | --- |
| enable | Yes | Whether the reference genome is subsampled for simulation |
| ratio | (Calculated) | Fraction of the original reference to sample |
| region_len_multiplier | 10 | Region length is calculated by multiplying this value with the read length / insert size |
| region_pad | (Calculated) | Padding to add between subsampled regions (defaults to 2 x read length / insert size) |
| region_len | (Calculated) | Set to manually override the subsampling region length |
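
The calculated defaults for region_len and region_pad can be sketched in Python as follows. This is an illustration under a stated assumption, not Teaser's actual implementation: "read length / insert size" is read here as "read length for single-end libraries, insert size for paired-end libraries". The function name sampling_region_defaults is hypothetical.

```python
def sampling_region_defaults(read_length=100, paired=False, insert_size=500,
                             region_len_multiplier=10):
    """Return (region_len, region_pad) as described in the table above.

    Hypothetical helper for illustration only; not part of Teaser.
    ASSUMPTION: the base length is the insert size for paired-end
    libraries and the read length otherwise.
    """
    base_len = insert_size if paired else read_length

    # Region length: multiplier times the base length.
    region_len = region_len_multiplier * base_len
    # Padding between regions: 2 x the base length.
    region_pad = 2 * base_len
    return region_len, region_pad


if __name__ == "__main__":
    print(sampling_region_defaults())             # single-end defaults
    print(sampling_region_defaults(paired=True))  # paired-end defaults
```

Setting region_len explicitly would override the calculated region length, as noted in the table.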