Table of Simulation Parameters
Moritz Smolka edited this page Jul 21, 2017 · 3 revisions
This page contains a list of parameters for creating data sets using the internal Teaser simulation pipeline.
Parameter | Default | Description |
---|---|---|
reference | (None) | Path to the reference FASTA file used for mapping, relative to the Teaser/references directory |
platform | illumina | Sequencing platform. Can be illumina, 454 or ion_torrent |
read_length | 100 | Average read length to simulate |
read_count | (Calculated) | Calculated based on subsampling parameters. Must be set manually if subsampling is disabled. |
coverage | 1 | Multiplier for read count |
paired | No | Whether the library is paired-end |
insert_size | 500 | Inner distance in base pairs between the outer ends of both reads in a pair |
insert_size_error | 50 | Standard deviation for insert size |
mutation_rate | 0.001 | Overall rate of mutations (0-1) |
mutation_indel_frac | 0.3 | Fraction of overall mutations that are indels - the rest are SNPs. |
mutation_indel_avg_len | 1 | Average length for indels in base pairs |
error_rate_mult | 1 | Multiplier for overall sequencing error rate |
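
As a rough illustration of how the defaults in the table fit together, the sketch below mirrors them in a Python dataclass. This is not Teaser's own code: the class name `SimulationParams`, the field types, and the helper methods are assumptions made for this example. Only the parameter names, default values, the allowed platforms, and the 0-1 range for `mutation_rate` come from the table; the 0-1 check on `mutation_indel_frac` is inferred from it being described as a fraction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Platform names as listed in the table above.
PLATFORMS = ("illumina", "454", "ion_torrent")


@dataclass
class SimulationParams:
    """Hypothetical container mirroring the documented defaults (not Teaser's API)."""
    reference: Optional[str] = None    # path relative to Teaser/references
    platform: str = "illumina"
    read_length: int = 100
    read_count: Optional[int] = None   # None stands for "(Calculated)"
    coverage: float = 1.0
    paired: bool = False
    insert_size: int = 500
    insert_size_error: int = 50
    mutation_rate: float = 0.001
    mutation_indel_frac: float = 0.3
    mutation_indel_avg_len: float = 1.0
    error_rate_mult: float = 1.0

    def validate(self) -> None:
        """Check the constraints that the table states or implies."""
        if self.platform not in PLATFORMS:
            raise ValueError(f"platform must be one of {PLATFORMS}")
        if not 0 <= self.mutation_rate <= 1:
            raise ValueError("mutation_rate must lie in [0, 1]")
        if not 0 <= self.mutation_indel_frac <= 1:
            raise ValueError("mutation_indel_frac must lie in [0, 1]")

    def mutation_rates(self) -> Tuple[float, float]:
        """Split the overall mutation rate into (snp_rate, indel_rate)."""
        indel_rate = self.mutation_rate * self.mutation_indel_frac
        return self.mutation_rate - indel_rate, indel_rate


params = SimulationParams()
params.validate()
snp_rate, indel_rate = params.mutation_rates()
```

With the defaults, 30% of the 0.001 overall mutation rate is assigned to indels and the remaining 70% to SNPs.
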
Before simulation, Teaser by default subsamples a number of small regions from the reference genome. The default values for this process are sufficient for most use cases; the parameters below can be used to override them.
Parameter | Default | Description |
---|---|---|
enable | Yes | Whether the reference genome should be subsampled for simulation |
ratio | (Calculated) | Fraction of the original reference to sample |
region_len_multiplier | 10 | Region length is calculated by multiplying this value by the read length / insert size |
region_pad | (Calculated) | Padding to add between subsampled regions (defaults to 2 x read length / insert size) |
region_len | (Calculated) | Set to manually override the subsampling region length |
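
The calculated defaults for `region_len` and `region_pad` can be sketched as follows. This is a minimal illustration, not Teaser's implementation: the function name `subsampling_regions` is hypothetical, and the reading of "read length / insert size" as "insert size for paired-end libraries, read length otherwise" is an assumption that the table does not spell out.

```python
from typing import Optional, Tuple


def subsampling_regions(
    read_length: int,
    insert_size: int = 500,
    paired: bool = False,
    region_len_multiplier: int = 10,
    region_len: Optional[int] = None,
    region_pad: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (region_len, region_pad), filling in unset values.

    ASSUMPTION: the base length is the insert size for paired-end
    libraries and the read length for single-end libraries.
    """
    base = insert_size if paired else read_length
    if region_len is None:
        # Documented default: multiplier times read length / insert size.
        region_len = region_len_multiplier * base
    if region_pad is None:
        # Documented default: 2 x read length / insert size.
        region_pad = 2 * base
    return region_len, region_pad


single_end = subsampling_regions(read_length=100)
paired_end = subsampling_regions(read_length=100, insert_size=500, paired=True)
manual = subsampling_regions(read_length=100, region_len=2500)
```

Under these assumptions, the single-end defaults give regions of 1000 bp with 200 bp padding, and setting `region_len` explicitly overrides only the region length while the padding keeps its calculated default.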