This repo contains an implementation of straSplit comprising 8 algorithms to split multi-label data into two sets: training and test.
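To give a feel for what stratified multi-label splitting involves, the following is a minimal sketch of a greedy, deterministic simplification of iterative stratification (the idea described by Sechidis et al., 2011, cited below). It is not straSplit's own code or API: the function name `iterative_split`, its parameters, and the tie-breaking rules are assumptions made purely for illustration, and it uses only the Python standard library.

```python
from collections import Counter


def iterative_split(label_sets, test_size=0.25):
    """Illustrative sketch only (hypothetical helper, not part of straSplit).

    label_sets: list of sets, one set of labels per sample.
    Returns (train_indices, test_indices) as sorted lists.
    """
    n = len(label_sets)
    ratios = [1.0 - test_size, test_size]  # subset 0 = train, subset 1 = test
    # Number of samples each subset still "wants" overall.
    want_total = [n * r for r in ratios]
    # Number of samples each subset still "wants" for every label.
    label_counts = Counter(l for labels in label_sets for l in labels)
    want_label = [{l: c * r for l, c in label_counts.items()} for r in ratios]

    remaining = set(range(n))
    subsets = [[], []]

    def assign(i, j):
        # Put sample i into subset j and reduce that subset's demands.
        subsets[j].append(i)
        remaining.discard(i)
        want_total[j] -= 1
        for l in label_sets[i]:
            want_label[j][l] -= 1

    while remaining:
        # Count labels among the samples that are still unassigned.
        left = Counter(l for i in remaining for l in label_sets[i])
        if not left:
            # Only unlabeled samples remain: place them by overall demand.
            for i in sorted(remaining):
                assign(i, max((0, 1), key=lambda j: want_total[j]))
            break
        # Handle the rarest remaining label first (ties broken by label name).
        rare = min(left, key=lambda l: (left[l], str(l)))
        for i in sorted(i for i in remaining if rare in label_sets[i]):
            # Pick the subset that most needs this label;
            # ties go to the subset with the larger overall demand, then to train.
            j = max((0, 1), key=lambda j: (want_label[j][rare], want_total[j]))
            assign(i, j)

    return sorted(subsets[0]), sorted(subsets[1])


if __name__ == "__main__":
    # Toy data: six samples with label "a", two with label "b".
    labels = [{"a"}] * 6 + [{"b"}] * 2
    train_idx, test_idx = iterative_split(labels, test_size=0.5)
    print("train:", train_idx)
    print("test:", test_idx)
```

Processing the rarest label first is what keeps infrequent labels represented in both the training and the test set; a purely random split gives no such guarantee.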
The codebase has been tested with Python 3.8. To install the required packages, run the following command from the root of the repository:
pip install -r requirements.txt
straSplit requires the following packages:
- Anaconda
- NumPy (>= 1.18)
- scikit-learn (>= 0.24)
- pandas (>= 1.2)
- NetworkX (== 2.5)
- SciPy (>= 1.6)
- Altair (== 4.1)
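As a quick sanity check after installation, a short script along these lines can report which of the listed packages are present and in which version. This is only a sketch: the helper name `installed_versions` is hypothetical, the names in `REQUIRED` are assumed to be the PyPI distribution names of the packages above, and the script only reports versions rather than enforcing the version constraints.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the packages listed above.
REQUIRED = ["numpy", "scikit-learn", "pandas", "networkx", "scipy", "altair"]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```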
Run the following command to clone the repository to an appropriate location:
git clone https://github.com/arbasher/straSplit
Please follow this notebook for tutorials.
If you find straSplit useful in your research, please consider citing the following papers:
- M. A. Basher, Abdur Rahman, McLaughlin, Ryan J., and Hallam, Steven J. "Metabolic pathway inference using multi-label classification with rich pathway features." PLoS Comput Biol (2020).
- M. A. Basher, Abdur Rahman and Hallam, Steven J. "Multi-label pathway prediction based on active dataset subsampling." bioRxiv (2020).
- M. A. Basher, Abdur Rahman. "Machine learning methods for metabolic pathway inference from genomic sequence information." Doctoral dissertation, University of British Columbia (2020).
- Moyano, J.M., Gibaja, E.L. and Ventura, S. "MLDA: A tool for analyzing multi-label datasets." Knowledge-Based Systems (2017).
- Merrillees, M. and Du, L. "Stratified sampling for extreme multi-label data." arXiv preprint (2021).
- Sechidis, K., Tsoumakas, G. and Vlahavas, I. "On the stratification of multi-label data." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 145-158). Springer, Berlin, Heidelberg (2011).
For any inquiries, please contact: arbasher@student.ubc.ca