Revisiting the Connection Between Gut Microbiota and Depressive Disorder: An Evaluation and Reproduction of Machine Learning Approaches

Michael Wurth, CS:4980 at the University of Iowa

Abstract

Research into the human gut microbiome has increasingly revealed associations with various psychiatric disorders, suggesting a link between microbial composition and mental health outcomes. Angelova et al.² investigated these associations by training machine learning classifiers on metagenomic sequence read data, finding that the YOLOv8 (You Only Look Once) convolutional neural network (CNN) was most effective in classifying depressive states based on gut microbiome composition. Our study aims to revisit and extend the findings of Angelova et al.² by evaluating the performance of logistic regression, random forest, and support vector machines (SVMs). Additionally, we assess whether these models can achieve performance comparable to YOLOv8 under similar data conditions and parameter optimization. Using the datasets from the reference study, we re-examined model efficacy before and after parameter tuning and cross-validation. Logistic regression notably outperformed previous reports, achieving a cross-validated accuracy of 0.77 and a ROC AUC of 0.81. On their synthetically expanded dataset, random forest achieved near-perfect classification. SVM models attained perfect accuracy on this dataset, matching YOLOv8's performance as reported by Angelova et al.² These findings suggest that simpler models can achieve competitive results on metagenomic data when conditions are controlled and classifier optimization is appropriately applied.

Key Words: [Depression, gut microbiome, machine learning, logistic regression, psychiatric disorder]

Testing This Project

Dependencies

This project was developed in Python 3.11.0.
From the project root folder, run the following through your command line to install all dependencies:

pip install -r requirements.txt

Run Experiments

From the project root, you can execute this in the terminal to start running experiments:

Python scripts/ml_interface.py

The above will prompt the user to run study design experiments 1-4 by default.
To selectively run specific experiments from the study design, you can execute the following in the terminal, assuming the current working directory is the project root:

cd scripts
Python

from ml_interface import MLInterface
interface=MLInterface()

interface.perform_experiment_1() # perform functions for part 1 of study design
interface.perform_experiment_2() # perform functions for part 2
interface.perform_experiment_3() # you can probably guess what this one does
interface.perform_experiment_4()

interface.plot_SVM_decision_boundary() # Make SVM decision boundary figures for RBF & poly kernels

Build Your Own Experiment

I have not directly implemented functions to perform the experiments outlined in section 5: Other Experiments.
However, you can still perform these tests (and many others) through MLInterface.
Here's an example of how to train and test a logistic regression model on a dataset with demographic data (sex, age) included:

Python

from ml_interface import MLInterface
interface=MLInterface()

interface.select_model('lg')
interface.load_experiment_set(2, 'demo')
interface.train_model()
results = interface.evaluate_model()

interface.start_new_result_log('My experiment')
interface.write_to_result_log(results, 'Logistic Regression on Demographic Dataset')
interface.dump_result_log()

Project Aims

Determine if logistic regression or random forest models can be more performant than was reported in study #2
Investigate whether study #2's YOLOv8 results are reliable
Find out if other classifiers (like support vector machines) can predict competitively on metagenomic signature data
Compare how these classifiers define their respective decision boundaries and feature importances

Study Design

Comparative Analysis: Classical Models
[ Log Reg Random Forest ]
- Original 70:30 split with original parameters
- 10-Fold cross-validation with original parameters
- 10-Fold CV with grid search-optimized parameters
- LogReg Only 10-Fold CV + bag grid search-optimized model
Support Vector Machine (SVM) Performance on Metagenomic Signature Data
[ SVM(RBF) SVM(polynomial) LinearSVM ]
- Original 70:30 split
- 10-fold cross validation
- 10-fold CV + grid search-optimized parameters
- bag models with optimized parameters
YOLO Comparison: Using Synthetic Data
[ SVM Log Reg Random Forest ]
- Original dataset split
- Grid search optimal parameters
- Statistical comparison
Feature Importance Comparison
[ SVM Log Reg Random Forest ]
- Compare respective assessments of feature importance between models
Other Experiments
[ SVM Log Reg Random Forest ]
- Comparing with datasets that include demographic data
- Comparing with datasets sourced from study 1

Foundational Works

These two studies form the foundation for my project:

1. Kovtun AS, Averina OV, Angelova IY, et al. Alterations of the Composition and Neurometabolic Profile of Human Gut Microbiota in Major Depressive Disorder. Biomedicines. 2022;10(9 :2162. Published 2022 Sep 2. https://doi.org/10.3390/biomedicines10092162

Collected gut microbiome WGS samples from 74 individuals: 36 afflicated with MDD, 38 healthy controls
Formulated list of key enzymes known to be involved in production/degradation of metabolites associated with depression pathology (eg. Serotonin: Aromatic amino acid decarboxylase)
For 46 common human gut microbiome genera, WGS samples were probed for orthologous amino acid sequences of key enzymes
For each sample, ortholog matches were processed into metagenomic signatures: species-enzyme pairs and their corresponding relative abundance

2. Angelova IY, Kovtun AS, Averina OV, Koshenko TA, Danilenko VN. Unveiling the Connection between Microbiota and Depressive Disorder through Machine Learning. International Journal of Molecular Sciences. 2023; 24(22):16459. https://doi.org/10.3390/ijms242216459

Attempted to predict depression using models trained on the metagenomic signatures collected above
Tested logistic regression, random forest, and YOLOv8 CNN

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
data		data
scripts		scripts
.gitignore		.gitignore
README.md		README.md
Revisiting the Connection between Microbiota and Depressive Disorder- A Comparative Evaluation and Reproduction of Machine Learning Approaches.pdf		Revisiting the Connection between Microbiota and Depressive Disorder- A Comparative Evaluation and Reproduction of Machine Learning Approaches.pdf
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Revisiting the Connection Between Gut Microbiota and Depressive Disorder: An Evaluation and Reproduction of Machine Learning Approaches

Abstract

Testing This Project

Dependencies

Run Experiments

Build Your Own Experiment

Project Aims

Study Design

Foundational Works

1. Kovtun AS, Averina OV, Angelova IY, et al. Alterations of the Composition and Neurometabolic Profile of Human Gut Microbiota in Major Depressive Disorder. Biomedicines. 2022;10(9 :2162. Published 2022 Sep 2. https://doi.org/10.3390/biomedicines10092162

2. Angelova IY, Kovtun AS, Averina OV, Koshenko TA, Danilenko VN. Unveiling the Connection between Microbiota and Depressive Disorder through Machine Learning. International Journal of Molecular Sciences. 2023; 24(22):16459. https://doi.org/10.3390/ijms242216459

About

Releases

Packages

Languages

mike-coding/CS4980-GutMetagenome-ML-Project

Folders and files

Latest commit

History

Repository files navigation

Revisiting the Connection Between Gut Microbiota and Depressive Disorder: An Evaluation and Reproduction of Machine Learning Approaches

Abstract

Testing This Project

Dependencies

Run Experiments

Build Your Own Experiment

Project Aims

Study Design

Foundational Works

1. Kovtun AS, Averina OV, Angelova IY, et al. Alterations of the Composition and Neurometabolic Profile of Human Gut Microbiota in Major Depressive Disorder. Biomedicines. 2022;10(9 :2162. Published 2022 Sep 2. https://doi.org/10.3390/biomedicines10092162

2. Angelova IY, Kovtun AS, Averina OV, Koshenko TA, Danilenko VN. Unveiling the Connection between Microbiota and Depressive Disorder through Machine Learning. International Journal of Molecular Sciences. 2023; 24(22):16459. https://doi.org/10.3390/ijms242216459

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages