This repository contains code and certain types of data for the project published in %journalname%.
The pipelines were created using Snakemake v7.32.4.
Data analysis was performed using R v4.4.1 and machine learning was performed using tidymodels v1.2.0.
First, ensure that the raw (short and long) reads are placed in the resources/data_raw/{strain}/short/ and resources/data_raw/{strain}/long/ directories inside the project's directory.
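The layout described above can be prepared and checked with a short shell sketch. The strain names strainA and strainB are hypothetical placeholders, and the check only reports empty directories; it does not validate the read files themselves.

```shell
# Run from the project's directory.
# strainA and strainB are hypothetical strain names; replace them with your own.
for strain in strainA strainB; do
    for kind in short long; do
        dir="resources/data_raw/${strain}/${kind}"
        # Create the directory if it does not exist yet.
        mkdir -p "$dir"
        # Report directories that do not contain any read files yet.
        if [ -z "$(ls -A "$dir")" ]; then
            echo "no reads yet in ${dir}"
        fi
    done
done
```

Once the script prints nothing, every strain has at least one file in both its short and long read directories.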
After this is done, you can run the main pipeline (genome assembly, annotation of resistance genes, repeats and insertion sequences):
a) with this command to use conda environments:
# navigate to the project's directory
snakemake --snakefile workflow/snakefile.smk --configfile workflow/config.yaml --use-conda
or
b) with this command to use Apptainer containers instead:
# navigate to the project's directory
snakemake --snakefile workflow/snakefile.smk --configfile workflow/config.yaml --use-singularity
For more details on installing Snakemake and the available options (number of threads, running on computer clusters, etc.), see the official Snakemake documentation.
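As an illustration of the thread option mentioned above, the sketch below performs a dry run of the main pipeline with a fixed number of cores. The value of CORES is an assumption to adjust for your machine, and the guard that skips the run when Snakemake or the snakefile is missing is only a convenience added for this example.

```shell
# Run from the project's directory.
CORES=4   # assumed number of threads; adjust to your machine

if command -v snakemake >/dev/null 2>&1 && [ -f workflow/snakefile.smk ]; then
    # --dry-run lists the jobs that would be executed without running them.
    snakemake --snakefile workflow/snakefile.smk --configfile workflow/config.yaml \
        --use-conda --cores "$CORES" --dry-run
    status="dry run attempted with ${CORES} cores"
else
    status="skipped: snakemake or workflow/snakefile.smk not found"
fi
echo "$status"
```

Removing --dry-run from the command starts the actual run with the same settings.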
Then you can run the following R notebooks:
- generate features table:
notebooks/modelling/features.qmd
(it is already available in notebooks/modelling/data/features_strain.csv)
- (optional) exploratory data analysis:
notebooks/modelling/EDA.qmd
- run training and validation:
notebooks/modelling/training_and_validation.Rmd
- comparison and analysis of the models:
notebooks/modelling/models_analysis.Rmd
To install the same versions of the R packages that were used in these notebooks, first install the renv package and then run renv::restore()
(see the renv documentation for details).
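The renv steps can also be run non-interactively from the shell, as in the sketch below. It assumes that the lockfile uses the default name renv.lock and sits in the directory you run the command from, and that the CRAN mirror URL shown is acceptable; neither detail is stated in this README.

```shell
# Run from the directory that contains the renv lockfile.
# Assumption: the lockfile uses renv's default name, renv.lock.
if command -v Rscript >/dev/null 2>&1 && [ -f renv.lock ]; then
    # Install renv if it is missing, then restore the recorded package versions.
    Rscript \
        -e 'if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv", repos = "https://cloud.r-project.org")' \
        -e 'renv::restore(prompt = FALSE)'
    renv_status="restore attempted"
else
    renv_status="skipped: Rscript or renv.lock not found"
fi
echo "$renv_status"
```

Passing prompt = FALSE lets the restore proceed without asking for confirmation, which is convenient in scripts but skips the chance to review the changes first.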
# phylogeny
snakemake --snakefile workflow/phylogeny.smk --configfile workflow/config_phylogeny.yaml --use-conda
NB: 31 reference strains are required (see the publication for their reference numbers).
# analysis of the HR mutants
snakemake --snakefile workflow/mutants.smk --configfile workflow/config_mutants.yaml --use-conda
The raw sequencing reads used in this project are available from NCBI's SRA under BioProjects PRJNA1165464 and PRJNA1083935.
The pre-compiled features table is available here: notebooks/modelling/data/features_strain.csv
The final models (trained LLR and GBT) are available here: notebooks/modelling/models/llr_final.rds and notebooks/modelling/models/gbt_final.rds