Estimating phylogenetic trees using 30 microorganisms (previously 6 organisms; review the data folder and notebook). We look at the 16S rRNA gene using unsupervised learning, web-based tools, and Molecular Evolutionary Genetics Analysis (MEGA7). We also look at motifs and try to find out what they do.
Knowing these motif regions is important because they can potentially give us clues about which regions to target in targeted DNA therapies.
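As a rough illustration of how unsupervised methods can compare sequences without an alignment, the sketch below builds k-mer frequency profiles and computes pairwise Euclidean distances between them. This is only an assumption about one possible feature choice, not necessarily what the notebook does. The function names and the toy sequences are hypothetical and are not real 16S rRNA data.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq, k=3):
    """Return relative k-mer frequencies for one sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(key, 0.0) - q.get(key, 0.0)) ** 2 for key in keys))


def distance_matrix(sequences, k=3):
    """Pairwise distances keyed by (name_a, name_b), names in sorted order."""
    profiles = {name: kmer_profile(s, k) for name, s in sequences.items()}
    return {
        (a, b): profile_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Toy sequences for illustration only (hypothetical, not real organisms).
    toy = {
        "org_a": "ACGTACGTACGGTACC",
        "org_b": "ACGTACGTACGGTACG",
        "org_c": "TTTTGGGGCCCCAAAA",
    }
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 3))
```

A distance table like this could then be passed to a clustering or tree-building method. Smaller distances suggest more similar sequences.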
Create the virtual environment when working on your own system.
python3 -m venv phylo-env
Activate the virtual environment.
source phylo-env/bin/activate
Install the packages and run the script. You need an email address set in your .bashrc file to run Biopython.
make install run_script
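As a minimal sketch of how an email exported from .bashrc might be picked up in Python, the snippet below reads it from an environment variable and assigns it to Biopython's `Entrez.email`. The variable name `ENTREZ_EMAIL` and both helper functions are assumptions made for illustration. The repository may use a different variable name.

```python
import os


def get_entrez_email(var_name="ENTREZ_EMAIL"):
    """Read the contact email from the environment.

    ENTREZ_EMAIL is a hypothetical variable name; in .bashrc it would be
    exported as: export ENTREZ_EMAIL="you@example.org"
    """
    email = os.environ.get(var_name, "").strip()
    if "@" not in email:
        raise RuntimeError(
            f"Set {var_name} in your .bashrc (and re-source it) before running."
        )
    return email


def configure_entrez():
    """Assign the email to Biopython's Entrez module (requires biopython)."""
    from Bio import Entrez  # imported lazily so the check above works without it

    Entrez.email = get_entrez_email()
    return Entrez
```

After editing .bashrc, run `source ~/.bashrc` or open a new terminal so the variable is visible to the Python process.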
In your terminal, from the directory where you cloned this repository, run this command to open the notebook.
jupyter notebook Phylogenetic_trees_unsupervised_learning.ipynb
Previously, we did not provide a codebook/data description file, since one of the headings in the notebook covers that. You can also check out the notebook or the HTML file I've provided in the repository.
Build the Docker image from the Dockerfile
sudo docker build -t phylo-exp .
Run the Docker image
sudo docker run -it -p 8888:8888 --rm phylo-exp:latest
Initialize DVC in the folder so that DVC functionality can be used in the repo. NB: this creates files in the directory.
dvc init
Track changes to the different data files. We do this because these files will change during the experiment, especially if the investigator wants to try other experiments with more data.
dvc add updated_data/data.md
dvc add updated_data/sequences.fasta
dvc add updated_data/sequence_metrics.csv
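To show why tracking by content is useful here, the sketch below hashes a file and reports which files differ from a previously recorded hash. DVC records a content hash for each tracked file (MD5 by default), but this is only an illustration of the idea under that assumption, not DVC's implementation, and it is not guaranteed to reproduce the exact values DVC stores. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, known_hashes):
    """List existing files whose current hash differs from the recorded one."""
    return [
        str(p)
        for p in paths
        if Path(p).exists() and file_md5(p) != known_hashes.get(str(p))
    ]
```

For example, recording `file_md5("updated_data/sequences.fasta")` before adding more sequences and comparing afterwards would flag that file as changed.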
Commit to save the changes that have occurred in the repository.
dvc commit