Genie.jl is a Julia package designed to simulate the evolution of multiple sequence alignments (MSAs) based on specified parameters and conditions. Its primary function, run_evolution, enables users to simulate evolutionary trajectories for amino acid or nucleotide codon sequences under complex interaction parameters.
The complete description of the algorithm is available at
Emergent time scales of epistasis in protein evolution
Leonardo Di Bari, Matteo Bisardi, Sabrina Cotogno, Martin Weigt, Francesco Zamponi; doi: https://www.pnas.org/doi/10.1073/pnas.2406807121
Please cite this article if you use Genie.jl!
To install the Genie package directly from its Git repository, follow these steps:
Open a terminal and run the following command to clone the Genie repository:

    git clone https://github.com/spqb/Genie.jl.git

Then enter the Genie.jl folder and start Julia with

    julia --project

Once inside the Julia REPL, install the dependencies:

    using Pkg
    Pkg.activate(".")
    Pkg.instantiate()

Once you have installed the Genie package, you can either use the example notebook provided in the `examples` folder or work directly from the Julia REPL with parallel processing.
- Navigate to the `examples` folder: Locate the `examples` folder inside the Genie package directory. This folder contains an example Jupyter notebook designed to help you explore Genie's features.
- Navigate to the Genie package folder: Open Julia in the local environment with `n` threads, over which the MCMC sampling can be parallelized, by running `../julia-1.10.0/bin/julia --project=. --threads n`
- Then copy each cell of the notebook in the `examples` folder into your terminal to explore Genie's features (remember to adjust the data paths).
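Before launching a long simulation, it can be worth checking that the session really started with the requested number of threads. The snippet below is a minimal sketch using only Julia's standard `Base.Threads` module. It does not call Genie, and the toy loop only illustrates how work is split across threads.

```julia
# Report how many threads this Julia session was started with.
using Base.Threads

n = nthreads()
println("Julia is running with $n thread(s)")

# Toy parallel loop: each iteration writes to its own slot, so no locking is needed.
results = zeros(Int, 8)
@threads for i in 1:8
    results[i] = i^2
end
println(results)
```

If the reported count is 1, Julia was most likely started without the threads option (or with `n = 1`).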
The run_evolution function simulates the evolution of a given multiple sequence alignment (MSA) over a specified number of steps. It uses a combination of Gibbs sampling and Metropolis sampling to evolve the sequences, supporting options for random initialization, codon usage bias, and saving intermediate MSAs at specified intervals.
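To make the mix of Gibbs and Metropolis moves concrete, here is a deliberately simplified sketch on a toy Potts-like model. It is not Genie's implementation: it acts directly on amino acid states rather than on codons, it has no indels, and every function name in it (`local_field`, `gibbs_move!`, `metropolis_move!`, `toy_evolve!`) is hypothetical. It only illustrates how a probability `p` can select between the two move types at each step.

```julia
using Random

# Local log-weight of putting state `a` at site `i`, given the rest of `seq`.
function local_field(seq::Vector{Int8}, i::Int, a::Int, h, J)
    e = h[a, i]
    for j in eachindex(seq)
        j == i && continue
        e += J[a, i, seq[j], j]
    end
    return e
end

# Gibbs move: resample site `i` from its conditional distribution.
function gibbs_move!(rng, seq, i, h, J, temp)
    q = size(h, 1)
    logw = [local_field(seq, i, a, h, J) / temp for a in 1:q]
    w = exp.(logw .- maximum(logw))      # subtract the max for numerical stability
    r = rand(rng) * sum(w)
    acc = 0.0
    for a in 1:q
        acc += w[a]
        if r <= acc
            seq[i] = Int8(a)
            return seq
        end
    end
    seq[i] = Int8(q)                     # guard against floating-point round-off
    return seq
end

# Metropolis move: propose a uniformly random state and accept or reject it.
function metropolis_move!(rng, seq, i, h, J, temp)
    q = size(h, 1)
    a_new = rand(rng, 1:q)
    delta = local_field(seq, i, a_new, h, J) - local_field(seq, i, Int(seq[i]), h, J)
    if delta >= 0 || rand(rng) < exp(delta / temp)
        seq[i] = Int8(a_new)
    end
    return seq
end

# Evolve one sequence for `N_steps`, choosing Metropolis with probability `p`.
function toy_evolve!(rng, seq, h, J; N_steps=100, temp=1.0, p=0.5)
    for _ in 1:N_steps
        i = rand(rng, 1:length(seq))
        if rand(rng) < p
            metropolis_move!(rng, seq, i, h, J, temp)
        else
            gibbs_move!(rng, seq, i, h, J, temp)
        end
    end
    return seq
end

rng = MersenneTwister(0)
q, L = 21, 10
h = randn(rng, q, L)
J = zeros(q, L, q, L)                    # no couplings in this toy example
seq = rand(rng, Int8.(1:q), L)
toy_evolve!(rng, seq, h, J; N_steps=200, temp=1.0, p=0.5)
println(seq)
```

In this sketch both move types target the same distribution, proportional to `exp(-E/temp)`, provided the couplings are symmetric (`J[a, i, b, j] == J[b, j, a, i]`).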
- `start_msa::Array{T,2}`: Initial MSA as a 2D array where `T` can be either `Int8` for amino acids or `String` for nucleotide codons. If amino acids are provided, the corresponding codons will be randomly sampled among those coding for the amino acids. The array dimensions must be `(L, M)`, where `L` is the sequence length, and each column represents a different sequence.
- `h::Array{T,2}`: A 2D array of size `(q, L)` representing the field parameters.
- `J::Array{T,4}`: A 4D array of size `(q, L, q, L)` representing the coupling parameters.
- `N_steps::Int`: Number of steps for the simulation (default is 100).
- `temp`: Temperature parameter for the simulation (default is 1.0).
- `p`: Probability for choosing Metropolis Sampling (default is 0.5). A value of `0` will use only Gibbs Sampling with single nucleotide mutations, while `1` will use only Metropolis Sampling with indels.
- `N_points::Union{Int, Nothing}`: Number of points to save the MSA in logarithmic scale along the trajectory (default is `nothing`). Specify either `N_points` or `each_step`, but not both.
- `each_step::Union{Int, Nothing}`: Interval to save the MSA every `each_step` steps along the trajectory (default is `nothing`). Specify either `N_points` or `each_step`, but not both.
- `rand_init::Bool`: Whether to initialize sequences randomly (default is `false`).
- `q::Int`: Number of unique amino acids in the sequences (default is 21).
- `codon_bias::Union{Nothing, Dict{String, Float64}}`: Codon usage bias dictionary (default is `nothing`; assumes no codon bias).
- `verbose::Bool`: Whether to print progress information (default is `false`).
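The array shapes listed above are easy to get wrong, so the following sketch builds placeholder inputs of the right sizes and checks them. It runs without Genie. The values are random or zero and carry no biological meaning, the encoding of amino acids as integers `1` to `q` is an assumption, and `check_inputs` is a hypothetical helper that is not part of the package. The commented-out call at the end assumes that the three arrays are positional arguments and the remaining parameters are keywords, which should be verified against the package source.

```julia
using Random

rng = MersenneTwister(1)
q, L, M = 21, 50, 100                      # states, sequence length, number of sequences

h = randn(rng, q, L)                       # fields, size (q, L)
J = zeros(Float64, q, L, q, L)             # couplings, size (q, L, q, L)
start_msa = rand(rng, Int8.(1:q), L, M)    # amino acid MSA, size (L, M), one sequence per column

# Hypothetical helper (not part of Genie.jl): check shapes against the parameter list.
function check_inputs(start_msa, h, J)
    q, L = size(h)
    size(J) == (q, L, q, L) || error("J must have size (q, L, q, L)")
    size(start_msa, 1) == L || error("start_msa must have L rows")
    return true
end

check_inputs(start_msa, h, J)

# Assumed call form (not run here):
# using Genie
# res = run_evolution(start_msa, h, J; N_steps=1000, temp=1.0, p=0.5, each_step=100, verbose=true)
```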
The function returns a named tuple containing the results of the simulation. The structure of the output depends on whether N_points or each_step is specified.
- If neither `N_points` nor `each_step` is specified:
  - `msa::Array{Int8, 2}`: Final MSA in amino acid form.
  - `msa_dna::Array{String, 2}`: Final MSA in DNA format.
  - `codon_usage::Dict{String, Float64}`: Codon usage dictionary used in the simulation.
  - `p::Float64`: Probability of choosing Metropolis Sampling.
  - `temp::Float64`: Temperature used in the simulation.
- If either `N_points` or `each_step` is specified:
  - `step_msa::Array{Array{Int8, 2}, 1}`: List of MSAs at different time points in amino acid format.
  - `msa_dna::Array{Array{String, 2}, 1}`: List of MSAs in DNA format at different time points.
  - `codon_usage::Dict{String, Float64}`: Codon usage dictionary used in the simulation.
  - `p::Float64`: Probability of choosing Metropolis Sampling.
  - `temp::Float64`: Temperature used in the simulation.
  - `steps::Array{Int, 1}`: Steps at which MSAs were saved.
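To give a feel for what saving `N_points` MSAs "in logarithmic scale" can mean for the `steps` field, the sketch below computes log-spaced save steps. The exact rule Genie uses is not stated here, so `log_spaced_steps` is a hypothetical helper showing one plausible scheme, not the package's own logic.

```julia
# Hypothetical helper (not part of Genie.jl): up to `N_points` distinct save
# steps between 1 and `N_steps`, evenly spaced on a logarithmic scale.
function log_spaced_steps(N_steps::Int, N_points::Int)
    N_points >= 2 || throw(ArgumentError("N_points must be at least 2"))
    raw = exp.(range(0, log(N_steps); length=N_points))
    return unique(round.(Int, raw))      # rounding can merge early points
end

steps = log_spaced_steps(1000, 10)
println(steps)
```

Log spacing puts many save points early in the trajectory, where sequences change quickly, and few at late times. With `each_step`, the saved steps would instead be equally spaced.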
The run_evolution function simulates the evolution of an initial MSA over a specified number of steps. It employs both Gibbs sampling and Metropolis sampling to generate evolved sequences, with options for random initialization, codon usage bias, and saving intermediate MSAs at specified intervals. The function is highly customizable and suitable for simulating complex evolutionary dynamics under various conditions.