4. Inputs

Below is a description of each input parameter.

All Input Options

// Input options
input                      = "${projectDir}/assets/samplesheet.csv"
ncbi                       = null
db                         = null
push                       = false

// QC: Reads & Assembly
read_qc                    = true
assembly_qc                = true
min_contig_len             = 300

// QC: Variant Calling
max_depth                  = 100
min_genfrac                = 85
max_lowcov                 = 5
max_het                    = 1

// Tree options
max_ml                     = 500
min_tree                   = 2

// Reporting options
strong_link_cutoff         = 10
inter_link_cutoff          = 50
partition_threshold        = 100
max_static                 = 100

// Other options (caution)
db_info                    = true
resolve_merged             = true
run_id                     = null

--input

Path to a comma-separated file containing information about PopPUNK databases or samples. Use an absolute path to the file.

PopPUNK database sheet example

pp_db_list.csv:

taxa,pp_db
Acinetobacter_baumannii,abaumannii_db.tar.gz
Escherichia_coli,ecoli_db.tar.bz2
Staphylococcus_aureus,staph_db/
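As a rough illustration of the database sheet, the Python sketch below reads it and infers how each database is packaged from its name: a .tar.gz or .tar.bz2 archive, or a directory marked by a trailing slash. The helpers `db_format` and `read_db_sheet` are hypothetical and not part of BigBacter, and treating only these three forms as valid is an assumption drawn from the example above.

```python
from __future__ import annotations

import csv
import io

# Example database sheet copied from the documentation above.
PP_DB_LIST = """taxa,pp_db
Acinetobacter_baumannii,abaumannii_db.tar.gz
Escherichia_coli,ecoli_db.tar.bz2
Staphylococcus_aureus,staph_db/
"""


def db_format(pp_db: str) -> str:
    """Infer how a PopPUNK database is packaged from its name alone (hypothetical rule)."""
    if pp_db.endswith(".tar.gz"):
        return "tar.gz"
    if pp_db.endswith(".tar.bz2"):
        return "tar.bz2"
    if pp_db.endswith("/"):
        return "directory"
    raise ValueError(f"unrecognized PopPUNK database format: {pp_db}")


def read_db_sheet(text: str) -> dict[str, str]:
    """Check the header, then map each taxon to the packaging format of its database."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != ["taxa", "pp_db"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return {row["taxa"]: db_format(row["pp_db"]) for row in reader}
```

Calling `read_db_sheet(PP_DB_LIST)` maps Acinetobacter_baumannii to "tar.gz", Escherichia_coli to "tar.bz2", and Staphylococcus_aureus to "directory".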

Samplesheet example

samplesheet.csv:

sample,taxa,assembly,fastq_1,fastq_2
sample1,Acinetobacter_baumannii,sample1.fasta,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,Escherichia_coli,sample2.fasta,sample2_R1.fastq.gz,sample2_R2.fastq.gz
sample3,Staphylococcus_aureus,sample3.fasta,sample3_R1.fastq.gz,sample3_R2.fastq.gz
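A similar sketch for the samplesheet checks that the expected columns are present and groups sample IDs by the `taxa` column. It assumes, without confirmation from this page, that the `taxa` values are what match samples to entries in the PopPUNK database sheet and that sample IDs must be unique; `samples_by_taxon` is a hypothetical helper, not BigBacter's own validation.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Example samplesheet copied from the documentation above.
SAMPLESHEET = """sample,taxa,assembly,fastq_1,fastq_2
sample1,Acinetobacter_baumannii,sample1.fasta,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,Escherichia_coli,sample2.fasta,sample2_R1.fastq.gz,sample2_R2.fastq.gz
sample3,Staphylococcus_aureus,sample3.fasta,sample3_R1.fastq.gz,sample3_R2.fastq.gz
"""

REQUIRED_COLUMNS = ["sample", "taxa", "assembly", "fastq_1", "fastq_2"]


def samples_by_taxon(text: str) -> dict[str, list[str]]:
    """Validate the header and sample IDs, then group sample IDs by taxon."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    groups: dict[str, list[str]] = defaultdict(list)
    seen: set[str] = set()
    for row in reader:
        sample = row["sample"]
        if sample in seen:  # assumed rule: sample IDs must be unique
            raise ValueError(f"duplicate sample ID: {sample}")
        seen.add(sample)
        groups[row["taxa"]].append(sample)
    return dict(groups)
```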

--ncbi

Path to a comma-separated file containing information about samples that should be pulled from NCBI.

sample,taxa,assembly,sra
SAMN12769618,Acinetobacter_baumannii,GCF_008632635.1,SRR11176973
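The NCBI sheet carries public accessions rather than file paths. The sketch below is a hypothetical pre-flight check (`check_ncbi_sheet` is not a BigBacter function) that assumes both the `assembly` and `sra` columns are filled, as in the example, and that they follow the usual NCBI/INSDC shapes: GCA_ or GCF_ plus nine digits and a version for assemblies, and SRR, ERR, or DRR plus digits for sequencing runs.

```python
from __future__ import annotations

import csv
import io
import re

# Example NCBI sheet copied from the documentation above.
NCBI_SHEET = """sample,taxa,assembly,sra
SAMN12769618,Acinetobacter_baumannii,GCF_008632635.1,SRR11176973
"""

# Assumed accession shapes (standard NCBI/INSDC conventions).
ASSEMBLY_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")
RUN_RE = re.compile(r"^[SED]RR\d+$")


def check_ncbi_sheet(text: str) -> list[str]:
    """Return the problems found; an empty list means every row looks well formed."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not ASSEMBLY_RE.match(row["assembly"]):
            problems.append(f"line {line_no}: unexpected assembly accession {row['assembly']!r}")
        if not RUN_RE.match(row["sra"]):
            problems.append(f"line {line_no}: unexpected SRA run accession {row['sra']!r}")
    return problems
```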

--db

Path to the BigBacter database. It is recommended that PopPUNK databases be stored in a single common directory (i.e., the BigBacter database). This database can be set up automatically using the PREPARE_DB workflow (-entry PREPARE_DB).

--push

Whether to save (i.e., push) new samples to the BigBacter database (default: false). It is recommended that you check the results before pushing files. Once the results are confirmed, push the files by rerunning the pipeline with --push true and -resume.

--max_depth

Maximum read depth per sample (default: 100). This is used by Snippy to randomly subsample reads.
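The depth cap translates into a subsampling fraction. The sketch below assumes depth is estimated as total sequenced bases divided by reference genome length; how BigBacter actually estimates depth and draws reads is not described here, and `subsample_fraction` is a hypothetical name.

```python
def subsample_fraction(total_bases: int, genome_size: int, max_depth: float = 100) -> float:
    """Fraction of reads to keep so the expected depth does not exceed max_depth.

    Depth is estimated as total sequenced bases / genome size (an assumption).
    Samples already at or below max_depth are kept whole (fraction 1.0).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    depth = total_bases / genome_size
    if depth <= max_depth:
        return 1.0
    return max_depth / depth
```

For example, 1.2 Gb of reads against a 4 Mb genome is about 300x, so roughly one third of the reads would be kept at the default cap.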

--min_genfrac

Minimum percent of the reference genome that a sample must contain for it to be included in the core SNP analysis (default: 85).

--max_lowcov

Maximum percent of low coverage sites allowed for a sample to be included in the core SNP analysis (default: 5).

--max_het

Maximum percent of heterogeneous sites allowed for a sample to be included in the core SNP analysis (default: 1). Because bacteria are haploid, the presence of heterogeneous sites indicates either contamination or unrepresented homologs.
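The three thresholds above act together as a per-sample gate on the core SNP analysis. The sketch below assumes the metrics are already computed as percentages and that a value exactly at a threshold passes; both the inclusive boundaries and the names `SampleQC` and `exclusion_reasons` are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SampleQC:
    sample: str
    genfrac: float  # percent of the reference genome covered by the sample
    lowcov: float   # percent of sites with low coverage
    het: float      # percent of heterogeneous sites


def exclusion_reasons(
    qc: SampleQC,
    min_genfrac: float = 85,
    max_lowcov: float = 5,
    max_het: float = 1,
) -> list[str]:
    """Return why a sample would be dropped from the core SNP analysis (empty list = kept).

    Boundaries are inclusive: a value exactly equal to a threshold passes (an assumption).
    """
    reasons = []
    if qc.genfrac < min_genfrac:
        reasons.append(f"genome fraction {qc.genfrac}% is below {min_genfrac}%")
    if qc.lowcov > max_lowcov:
        reasons.append(f"low-coverage sites {qc.lowcov}% exceed {max_lowcov}%")
    if qc.het > max_het:
        reasons.append(f"heterogeneous sites {qc.het}% exceed {max_het}%")
    return reasons
```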

--strong_link_cutoff

SNP distance threshold used to classify strong genomic linkages (default: 10). This has no impact on core SNP analysis and is only meant to aid interpretation of genetic linkages.

--inter_link_cutoff

SNP distance threshold used to classify intermediate genomic linkages (default: 50). This has no impact on core SNP analysis and is only meant to aid interpretation of genetic linkages.
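Because the two cutoffs only label pairwise distances, they can be read as nested bins. In the sketch below, distances at or below `strong_link_cutoff` are strong and those at or below `inter_link_cutoff` are intermediate; the inclusive boundaries, the hypothetical `classify_linkage` helper, and the "weak" label for everything else are assumptions, not terms taken from BigBacter's reports.

```python
def classify_linkage(
    snp_distance: int,
    strong_link_cutoff: int = 10,
    inter_link_cutoff: int = 50,
) -> str:
    """Label a pairwise SNP distance; the label aids interpretation only."""
    if strong_link_cutoff > inter_link_cutoff:
        raise ValueError("strong_link_cutoff should not exceed inter_link_cutoff")
    if snp_distance <= strong_link_cutoff:
        return "strong"
    if snp_distance <= inter_link_cutoff:
        return "intermediate"
    return "weak"  # hypothetical label for distances above both cutoffs
```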

--partition_threshold

Distance threshold, in estimated nucleotide substitutions, used to partition samples in maximum likelihood trees with hclust and cutree (default: 100).
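The partitioning described above resembles R's `hclust` followed by `cutree` with a height cut. R's `hclust` uses complete linkage by default, but which linkage method BigBacter uses, and how substitution distances are derived from the tree, are not stated here. The sketch below therefore assumes complete linkage on a precomputed pairwise distance matrix in estimated substitutions; `cut_complete_linkage` is a hypothetical, naive pure-Python illustration meant for small examples, not the pipeline's code.

```python
from __future__ import annotations

from itertools import combinations


def cut_complete_linkage(
    samples: list[str],
    dist: dict[frozenset[str], float],
    threshold: float = 100,
) -> list[set[str]]:
    """Group samples as cutree(hclust(d, method = "complete"), h = threshold) would.

    Clusters are merged greedily, closest pair first, where the distance between two
    clusters is the largest pairwise distance between their members (complete linkage).
    Merging stops once the closest pair is farther apart than `threshold`.
    """
    clusters = [{s} for s in samples]

    def complete(a: set[str], b: set[str]) -> float:
        return max(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (i, j), d = min(
            (((i, j), complete(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: combinations() guarantees j > i
    return clusters
```

For instance, if A-B are 20 and C-D are 30 substitutions apart while every cross pair exceeds 100, the default threshold yields two groups, {A, B} and {C, D}.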

--max_static

Maximum number of samples to include in a static image, i.e., a .jpg file (default: 100). Static images are not generated for clusters containing more samples than this value.

--max_ml

Maximum number of samples that a cluster can contain for maximum likelihood tree generation (default: 500). If the number of samples exceeds this threshold, BigBacter will switch to the neighbor-joining approach.

--min_tree

Minimum number of samples in a cluster, excluding the reference genome, required for a tree to be produced (default: 2). This must be at least 2, or IQ-TREE will fail.
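Taken together, --max_static, --max_ml, and --min_tree decide what is produced for a cluster of a given size. The sketch below applies them as described above, counting samples without the reference genome; using that same count for --max_ml and --max_static, and evaluating the three thresholds independently, are assumptions. `plan_cluster_outputs` is a hypothetical helper.

```python
def plan_cluster_outputs(
    n_samples: int,
    max_ml: int = 500,
    min_tree: int = 2,
    max_static: int = 100,
) -> dict:
    """Decide which tree method (if any) and whether a static image apply to a cluster.

    n_samples excludes the reference genome, as in the --min_tree description.
    """
    if n_samples < min_tree:
        tree = None                  # too few samples: no tree is produced
    elif n_samples > max_ml:
        tree = "neighbor-joining"    # too many samples for maximum likelihood
    else:
        tree = "maximum-likelihood"
    return {"tree": tree, "static_image": n_samples <= max_static}
```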

--db_info

Provides a summary of your BigBacter database (default: true). A short summary is printed to the screen when the pipeline completes, and a full summary can be found in the run directory (e.g., ${timestamp}-db-info.csv).

--resolve_merged

Whether merged PopPUNK clusters should be resolved (default: true).

Caution

Only turn this off if you know what you are doing!

--run_id

Custom run ID to use in place of the timestamp ID.

Caution

Misuse could lead to database corruption.
