Machine learning detection of unstable antibiotic heteroresistance in E. coli

This repository contains the code and some of the data for the project published in %journalname%.

The pipelines were created using Snakemake v7.32.4.

Data analysis was performed using R v4.4.1 and machine learning was performed using tidymodels v1.2.0.

How to run the main analysis

First, ensure that the raw (short and long) reads are placed in resources/data_raw/{strain}/short/ and resources/data_raw/{strain}/long/ directories inside the project's directory.
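As a rough sketch of the layout described above, the following creates the expected directories for two strains. The strain names STRAIN_A and STRAIN_B are hypothetical placeholders, not identifiers from this project; replace them with your own strain IDs and then copy the read files into the matching directories.

```shell
# Run from the project's directory.
# STRAIN_A and STRAIN_B are hypothetical placeholders; use your own strain IDs.
strains="STRAIN_A STRAIN_B"

for strain in $strains; do
    # One directory for short reads and one for long reads per strain
    mkdir -p "resources/data_raw/${strain}/short" "resources/data_raw/${strain}/long"
done

# List the resulting directory tree to check it
find resources/data_raw -type d | sort
```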

Once the reads are in place, you can run the main pipeline (genome assembly and annotation of resistance genes, repeats, and insertion sequences):

a) with this command to use conda environments:

# navigate to the project's directory
snakemake --snakefile workflow/snakefile.smk --configfile workflow/config.yaml --use-conda

or

b) with this command to use Apptainer containers instead:

# navigate to the project's directory
snakemake --snakefile workflow/snakefile.smk --configfile workflow/config.yaml --use-singularity

For more details on installing Snakemake and on the available options (number of threads, running on computer clusters, etc.), see the official Snakemake documentation.
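As an illustration of the options mentioned above, the sketch below performs a dry run of the main pipeline with an explicit core count. The value of 4 cores is an assumption, so adjust it to your machine. Snakemake 7 generally expects a `--cores` value for local runs unless one is supplied elsewhere (for example through a profile). The `--dry-run` flag only lists the jobs that would be executed.

```shell
# Run from the project's directory.
# Assumption: 4 cores; adjust to your machine.
cores=4

if command -v snakemake >/dev/null 2>&1 && [ -f workflow/snakefile.smk ]; then
    # --dry-run lists the planned jobs without executing anything
    snakemake --snakefile workflow/snakefile.smk \
        --configfile workflow/config.yaml \
        --use-conda --cores "$cores" --dry-run
else
    echo "snakemake or workflow/snakefile.smk not found; check your installation and current directory"
fi
```

If the list of planned jobs looks right, dropping `--dry-run` starts the actual run. For the container variant, swap `--use-conda` for `--use-singularity` as in option b).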

Then you can run the R notebooks in the following order:

1. Generate the features table: notebooks/modelling/features.qmd (the resulting table is already available in notebooks/modelling/data/features_strain.csv)
2. (Optional) Exploratory data analysis: notebooks/modelling/EDA.qmd
3. Training and validation: notebooks/modelling/training_and_validation.Rmd
4. Comparison and analysis of the models: notebooks/modelling/models_analysis.Rmd

To install the same versions of the R packages that were used in these notebooks, first install the renv package and then run renv::restore() (see the renv documentation for details).
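The renv step can also be run non-interactively from the shell, roughly as sketched below. This assumes the lockfile is named renv.lock and sits in the project's directory, which is renv's default but is not stated here. The CRAN mirror URL is also an assumption.

```shell
# Run from the project's directory.
# Assumption: the lockfile is renv.lock in the project root (renv's default location).
lockfile="renv.lock"

if command -v Rscript >/dev/null 2>&1 && [ -f "$lockfile" ]; then
    # Install renv only if it is not already available
    Rscript -e 'if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv", repos = "https://cloud.r-project.org")'
    # Restore the package versions recorded in the lockfile without prompting
    Rscript -e 'renv::restore(prompt = FALSE)'
else
    echo "Rscript or ${lockfile} not found; check your R installation and current directory"
fi
```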

How to run the additional analyses

Phylogenetic analysis

snakemake --snakefile workflow/phylogeny.smk --configfile workflow/config_phylogeny.yaml --use-conda

NB: 31 reference strains are required (see the publication for reference numbers).

Analysis of the HR mutants

# analysis of the HR mutants
snakemake --snakefile workflow/mutants.smk --configfile workflow/config_mutants.yaml --use-conda

Raw data availability

The raw sequencing reads used in this project are available from NCBI's SRA under BioProjects PRJNA1165464 and PRJNA1083935.

Models and features table

The pre-compiled features table is available here: notebooks/modelling/data/features_strain.csv

The final trained models (LLR and GBT) are available here: notebooks/modelling/models/llr_final.rds and notebooks/modelling/models/gbt_final.rds

Rule graphs

The main analysis
HR mutants analysis

