TimRouze/Expe_SPSP

Expe_SPSP

All experiments performed for the SuperSampler paper.

The performance comparison experiment can be reproduced using the Snakefile in the folder 'Performance comparison'. In the meantime, here is the basic information needed to reproduce our experiments:

Tools used

Data

All genomes used in the experiments were taken from these sets, always in the order in which they appear in the files.

Command lines

Simka

./simka -in {input file of files} -out {folder for output} -out-tmp {folder for temporary files} -abundance-min 1 -kmer-size {k-mer size}

/!\ Simka requires a special formatting for its input file of files, see Simka's repository for details /!\
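As a rough illustration of the conversion, the sketch below turns a plain file of files (one path per line) into one `ID: path` line per sample, which is the layout we understand Simka to expect. The function name `to_simka_input`, the example paths, and the choice of the file's base name as sample ID are assumptions made for this example only; Simka's repository remains the reference for the exact format.

```shell
#!/usr/bin/env bash
# Convert a plain file of files (one path per line) into "ID: path" lines.
# Assumption: the sample ID is the file's base name up to its first dot.
# Input order is preserved, so results stay comparable across tools.
to_simka_input() {
    local path name
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue          # skip empty lines
        name=$(basename "$path")
        printf '%s: %s\n' "${name%%.*}" "$path"
    done < "$1"
}

# Example with two hypothetical genome paths, written to a temporary folder.
tmp=$(mktemp -d)
printf '%s\n' /data/GCF_A.fna.gz /data/GCF_B.fna.gz > "$tmp/fof.txt"
to_simka_input "$tmp/fof.txt" > "$tmp/simka_input.txt"
cat "$tmp/simka_input.txt"
```

The resulting file can then be passed to `./simka -in`. If two genomes share the same base name up to the first dot, the IDs would collide, so a different ID scheme may be needed.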

Sourmash

conda activate sourmash_env
sourmash sketch dna -p scaled={subsampling rate},k={k-mer size} --from-file {input file of files} -o {output name for sketch}
sourmash compare {input sketch} {--containment} --csv {output filename} --ksize {k-mer size}

Sourmash results were sorted to match the order of the input file of files, as Simka and SuperSampler preserve this order. sortCSV is available in SuperSampler's repository.

./sortCSV {input comparison matrix} {output name} {input file of files (to get original order)}

SuperSampler

./sub_sampler -f {input file of files} -s {subsampling rate} -p {prefix for output sketches}_ -k {k-mer size} -m {minimizer size}
./comparator -f {input file of files} -o {prefix for output}

Values tested:

Scalability experiment

As only computation time and RAM usage were monitored, we did not run Simka in these experiments.

  • K-mer size = 63
  • Subsampling rate = 1000
  • Minimizer size = 15
  • From 100 to 128,000 RefSeq genomes.
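To make the templates above concrete, here is a small dry-run sketch that fills them in with the scalability values (k-mer size 63, subsampling rate 1000, minimizer size 15) and prints the resulting command lines without executing them. The file names `refseq_fof.txt` and `sketches_fof.txt`, the prefix `run`, and the function name `scalability_commands` are hypothetical; the optional `--containment` flag is left out.

```shell
#!/usr/bin/env bash
# Dry run: print the command lines for one scalability run.
K=63                       # k-mer size
S=1000                     # subsampling rate
M=15                       # minimizer size
FOF=refseq_fof.txt         # hypothetical input file of files
COMP_FOF=sketches_fof.txt  # hypothetical file of files given to comparator
OUT=run                    # hypothetical output prefix

scalability_commands() {
    # Sourmash: sketch, then compare
    echo "sourmash sketch dna -p scaled=$S,k=$K --from-file $FOF -o $OUT.sig"
    echo "sourmash compare $OUT.sig --csv $OUT.sourmash.csv --ksize $K"
    # SuperSampler: subsample, then compare
    echo "./sub_sampler -f $FOF -s $S -p ${OUT}_ -k $K -m $M"
    echo "./comparator -f $COMP_FOF -o $OUT"
}

scalability_commands
```

Each printed line can be copied into a terminal once the file names match your setup. One possible way to record time and peak memory on Linux is to prefix a command with GNU `/usr/bin/time -v`; this is a suggestion, not necessarily how the measurements were taken here.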
