Expe_SPSP

All experiments run for the SuperSampler paper.

The performance comparison experiment can be reproduced with the Snakefile in the 'Performance comparison' folder. In the meantime, here is the basic information needed to reproduce our experiments:

Tools used

Simka
Sourmash
SuperSampler (commit used for the latest experiments: 97efad6)

Data

All genomes used in the experiments were taken from these sets (fof_refseq.txt and fof_salmonellas.txt in this repository), always in the order in which they appear in the files.
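Because genomes are always taken in order of appearance, a subset of N genomes is simply the first N lines of a file of files. A minimal sketch, assuming one genome path per line; the helper name `make_subset` and the output filename are hypothetical and not part of this repository:

```shell
# Hypothetical helper: keep the first N entries of a file of files.
# Assumes one genome path per line, in the order used by the experiments.
make_subset() {
    fof="$1"   # input file of files, e.g. fof_refseq.txt
    n="$2"     # number of genomes to keep
    out="$3"   # output file of files
    head -n "$n" "$fof" > "$out"
}
```

For example, `make_subset fof_refseq.txt 100 fof_refseq_100.txt` would write the first 100 entries to a new file.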

Command lines

Simka

./simka -in {input file of file} -out {folder for output} -out-tmp {folder for temporary files} -abundance-min 1 -kmer-size {k-mer size}

/!\ Simka requires a specific format for its input file of files; see Simka's repository for details /!\
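As a rough illustration, the sketch below turns a plain file of files (one path per line) into `ID: path` lines, which is the layout Simka's documentation describes to the best of our understanding. The sequential IDs (`g1`, `g2`, ...) and the helper name `fof_to_simka` are assumptions made for this example; check Simka's repository before relying on it.

```shell
# Hypothetical helper: convert a plain file of files into "ID: path" lines.
# Assumption: sequential IDs (g1, g2, ...) are acceptable dataset names.
# Blank lines are skipped and the input order is preserved.
fof_to_simka() {
    awk 'NF { n++; printf "g%d: %s\n", n, $0 }' "$1"
}
```

For example, `fof_to_simka fof_salmonellas.txt > fof_salmonellas_simka.txt` would write the converted list to a new file (the output name is illustrative).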

Sourmash

conda activate sourmash_env
sourmash sketch dna -p scaled={subsampling rate},k={k-mer size} --from-file {input file of file} -o {output name for sketch}
sourmash compare {input sketch} {--containment} --csv {output filename} --ksize {k-mer size}

Sourmash results were sorted to match the order of the input file of files, since Simka and SuperSampler preserve this order. The sortCSV tool is available in SuperSampler's repository:

./sortCSV {input comparison matrix} {output name} {input file of file (to get original order)}

SuperSampler

./sub_sampler -f {input file of file} -s {subsampling rate} -p {prefix for output sketches}_ -k {k-mer size} -m {minimizer size}
./comparator -f {input file of file} -o {prefix for output}

Values tested:

Scalability experiment

Since only computation time and RAM usage were monitored, we did not run Simka in these experiments.

K-mer size = 63
Subsampling rate = 1000
Minimizer size = 15
From 100 to 128,000 RefSeq genomes.
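With these values, the SuperSampler sketching command from the command-line section would look roughly as follows. This sketch only prints the command rather than running it; the helper name and the `sketch_<name>` output prefix are placeholders chosen for illustration, not names used in our experiments.

```shell
# Parameters of the scalability experiment (taken from the list above).
K=63      # k-mer size
S=1000    # subsampling rate
M=15      # minimizer size

# Hypothetical helper: print the sub_sampler command for one file of files.
# The output prefix "sketch_<basename>" is an assumption for illustration.
print_sub_sampler_cmd() {
    fof="$1"
    prefix="sketch_$(basename "$fof" .txt)"
    printf './sub_sampler -f %s -s %s -p %s_ -k %s -m %s\n' \
        "$fof" "$S" "$prefix" "$K" "$M"
}
```

Printing the command first makes it easy to check the parameters before launching a long run.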
