- Abstract
- Motivation
- Description of Framework
- Installation
- Data Download
- Execution
- Documentation
- Contribute
- Citation
- Synthetic Data Generation: Using the NEAT v3.3 simulator, we generate synthetic genomics data that mimics real genome sequences, serving as a ground truth.
- Benchmarking Variant Callers: We evaluate five somatic variant callers (GATK-Mutect2, Freebayes, VarDict, VarScan2, and LoFreq) using these synthetic datasets.
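To illustrate what benchmarking against a synthetic ground truth means, here is a minimal sketch that compares a caller's output with the known variants. It is not the framework's own evaluation code: the `Variant` tuple, the `compare_calls` function, and the example coordinates are all hypothetical, and real evaluation would also need to handle details such as variant normalization and allele frequencies.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a synth4bench type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_calls(truth: set, called: set) -> dict:
    """Count true/false positives and false negatives by exact match."""
    tp = len(truth & called)   # variants present in both sets
    fp = len(called - truth)   # called but absent from the ground truth
    fn = len(truth - called)   # in the ground truth but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up example data, for illustration only.
truth = {Variant("chr17", 100, "A", "T"), Variant("chr17", 250, "G", "C")}
called = {Variant("chr17", 100, "A", "T"), Variant("chr17", 300, "C", "A")}
result = compare_calls(truth, called)
print(result)
```

Because the synthetic data define exactly which variants exist, every call can be classified without relying on a consensus of other callers.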
All data are openly available on Zenodo. For specific instructions, refer to our User Guide.
- Create the Conda environment:
  conda env create -f environment.yml
  conda activate synth4bench
- Install NEAT v3.3:
  Download version v3.3.
  To call the main script: python gen_reads.py --help
  For further details, see the NEAT README included in the download.
- Install bam-readcount:
  Follow their installation instructions.
  After building, verify the installation: build/bin/bam-readcount --help
  If you encounter issues during the make process, you can alternatively use the executable available here and place it in the bam-readcount/build/bin folder.
- Download VarScan Extra Script:
  The extra script vscan_pileup2cns2vcf.py for VarScan is available here.
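Once the steps above are complete, a quick check that everything is in place can save a failed run later. The following is a sketch only: the function name `missing_prerequisites` is hypothetical, and the relative paths assume that NEAT, bam-readcount, and the VarScan script were all placed in the current working directory, which may not match your layout.

```python
import shutil
from pathlib import Path


def missing_prerequisites(executables, files):
    """Return names of executables not on PATH and files not found on disk."""
    missing = [name for name in executables if shutil.which(name) is None]
    missing += [str(path) for path in files if not Path(path).is_file()]
    return missing


# Assumed tool names and locations; adjust to where you installed each item.
executables = ["conda", "bash"]
files = [
    "gen_reads.py",                           # NEAT v3.3 main script
    "bam-readcount/build/bin/bam-readcount",  # built or downloaded executable
    "vscan_pileup2cns2vcf.py",                # VarScan extra script
]

for item in missing_prerequisites(executables, files):
    print(f"missing: {item}")
```

An empty output means every listed item was found; any `missing:` line points at the installation step to revisit.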
Simply configure your parameters in the parameters.yaml file, then execute:
bash s4b_run.sh
This single command generates synthetic data, runs variant calling for all selected tools, and performs downstream analysis and plotting.
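For long runs it can help to keep the pipeline's console output in a log file. The sketch below wraps the same `bash s4b_run.sh` command in a small Python helper; the function name `run_pipeline` and the log file name are hypothetical and not part of synth4bench, and the helper assumes it is called from the directory that contains `s4b_run.sh`.

```python
import subprocess
import sys


def run_pipeline(command=("bash", "s4b_run.sh"), log_path="s4b_run.log"):
    """Run the command, send stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        completed = subprocess.run(
            list(command), stdout=log, stderr=subprocess.STDOUT
        )
    if completed.returncode != 0:
        # Report the failure on stderr so it is visible even though output is logged.
        print(
            f"pipeline exited with code {completed.returncode}; see {log_path}",
            file=sys.stderr,
        )
    return completed.returncode


# Usage (from the directory containing s4b_run.sh):
#     exit_code = run_pipeline()
```

The helper changes nothing about how the pipeline behaves; it only records what the script prints and passes the exit code back to the caller.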
For full execution instructions, see our User Guide.
For further documentation, visit the documentation page.
We welcome and greatly appreciate any feedback or contributions!
If you have questions, please open an issue here or email sfragkoul@certh.gr.
Our work has been submitted to the bioRxiv preprint repository. If you use synth4bench, or any of our scripts/code, please cite:
S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. E. Psomopoulos, “Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.” 2024, doi:10.1101/2024.03.07.582313.
- S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. Psomopoulos, “synth4bench: Benchmarking Somatic Variant Callers – A Tale Unfolding In The Synthetic Genomics Feature Space”, 23rd European Conference On Computational Biology (ECCB24), Sep 2024, Turku, Finland, doi:10.5281/zenodo.14186509
- S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. Psomopoulos, “Exploring Somatic Variant Callers' Behavior: A Synthetic Genomics Feature Space Approach”, ELIXIR AHM24, Jun 2024, Uppsala, Sweden, doi:10.7490/f1000research.1119793.1
- S.-C. Fragkouli, N. Pechlivanis, A. Orfanou, A. Anastasiadou, A. Agathangelidis, and F. Psomopoulos, “Synth4bench: a framework for generating synthetic genomics data for the evaluation of somatic variant calling algorithms”, 17th Conference of Hellenic Society for Computational Biology and Bioinformatics (HSCBB), Oct 2023, Thessaloniki, Greece, doi:10.5281/zenodo.8432060
- S.-C. Fragkouli, N. Pechlivanis, A. Agathangelidis, and F. Psomopoulos, “Synthetic Genomics Data Generation and Evaluation for the Use Case of Benchmarking Somatic Variant Calling Algorithms”, 31st Conference in Intelligent Systems For Molecular Biology and the 22nd European Conference On Computational Biology (ISMB-ECCB23), Jul 2023, Lyon, France, doi:10.7490/f1000research.1119575.1