Frequency Chaos Game Representation with Deep Learning
Create a virtual environment and install packages
python -m venv env
source env/bin/activate
pip install -r requirements.txt
Set parameters for the experiment in parameters.py
- See (and include) preprocessing functions in preprocessing.py
Run the scripts in this order:
- undersample_sequences.py
- extract_sequences.py
- split_data.py will create a file datasets.json with train, validation and test sets
- train.py
- test.py
- classification_metrics
- clustering_metrics
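The run order above can be scripted. The following is a minimal sketch, assuming each script is run from the repository root, takes no command-line arguments, and reads its settings from parameters.py. The helper names `build_commands` and `run_pipeline` are hypothetical and not part of the repository, and only the five entries that are clearly .py scripts are included.

```python
import subprocess
import sys

# Script names taken from the run order in this README.
PIPELINE = [
    "undersample_sequences.py",
    "extract_sequences.py",
    "split_data.py",
    "train.py",
    "test.py",
]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Return one command (as an argument list) per script, in run order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in order; stop at the first one that fails."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete intermediate results.
        subprocess.run(cmd, check=True)


# Usage (assumption: called from the repository root, inside the virtual env):
# run_pipeline(build_commands())
```

Stopping at the first failure matters here because each step depends on the files written by the previous one.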
A folder data/ will be created to save all intermediate results:
- <SPECIE>/ with all sequences extracted individually from the fasta file, in separate folders by label (clade)
- train/ will contain
  - undersample_by_clade.csv
  - available_by_clade.csv: a summary of the available sequences by clade, subject to the restrictions made in undersample_sequences.py (remove duplicates and empty rows)
  - selected_by_clade.csv: a summary of the selected sequences by clade
  - checkpoints/ will save the best weights during training.
  - preprocessing.json will save a list with the preprocessing applied to each FCGR during training.
  - training_log.csv: accuracy, loss and learning rate per epoch for the train and validation sets.
- test/ will save all the metrics (classification and clustering) resulting from the evaluation of the best model on the test set.
- plots: accuracy and loss plots during training, confusion matrix
- saliency_maps/: representative FCGR by clade, saliency map and relevant k-mers for that representative.
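The difference between the "available" and "selected" summaries can be illustrated with a small sketch of undersampling by clade. This is not the code in undersample_sequences.py: the row keys `id` and `clade`, the `per_clade` cap, and the function name are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def undersample_by_clade(rows, per_clade, seed=0):
    """Filter rows, then keep at most `per_clade` rows for each clade.

    rows: list of dicts with the hypothetical keys "id" and "clade".
    Returns (available, selected): counts per clade after filtering,
    and the sampled rows per clade.
    """
    seen = set()
    by_clade = defaultdict(list)
    for row in rows:
        seq_id, clade = row.get("id"), row.get("clade")
        if not seq_id or not clade:  # drop empty rows
            continue
        if seq_id in seen:  # drop duplicates
            continue
        seen.add(seq_id)
        by_clade[clade].append(row)

    rng = random.Random(seed)  # fixed seed for a reproducible selection
    available = {clade: len(items) for clade, items in by_clade.items()}
    selected = {
        clade: rng.sample(items, min(per_clade, len(items)))
        for clade, items in by_clade.items()
    }
    return available, selected
```

Under these assumptions, `available` corresponds to what available_by_clade.csv summarizes and the lengths of the lists in `selected` to what selected_by_clade.csv summarizes.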
A folder fcgr-<KMER>-mer/ will contain all the FCGRs created from the sequences in data/<SPECIE>.
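For readers unfamiliar with FCGR, the sketch below shows one common way to build the 2^k x 2^k k-mer count matrix from a sequence. It is an illustration only: the corner assignment (A, C, G, T), the bit ordering, and the choice to skip k-mers containing other symbols such as N are assumptions and may differ from the convention used in this repository.

```python
# Assumed corner coordinates (x, y) of the chaos game square.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Count k-mers of `sequence` into a 2**k x 2**k matrix (list of lists)."""
    size = 2 ** k
    matrix = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        # Skip k-mers with symbols outside A, C, G, T (for example N).
        if any(ch not in CORNERS for ch in kmer):
            continue
        x = y = 0
        for i, ch in enumerate(kmer):
            cx, cy = CORNERS[ch]
            # Later nucleotides set higher bits, mirroring the chaos game,
            # where the last move decides the coarsest quadrant.
            x |= cx << i
            y |= cy << i
        matrix[y][x] += 1
    return matrix
```

With k = 1 the matrix is simply the four nucleotide counts; larger k subdivides each quadrant recursively, which is what makes the result usable as an image-like input for a neural network.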