Stratification of multi-label datasets

Basic Description

This repository contains straSplit, an implementation of eight algorithms for splitting multi-label data into two sets: training and test.
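To make the idea of a stratified multi-label split concrete, here is a minimal sketch in the spirit of the iterative stratification of Sechidis et al. (2011), listed under Citing. It is not straSplit's API and is not claimed to match any of the eight algorithms in this repository: the function name `iterative_split`, its parameters, and the simplified tie-breaking are assumptions made for illustration only.

```python
import numpy as np


def iterative_split(Y, test_size=0.25, seed=0):
    """Greedy label-wise split of a binary indicator matrix Y (n_samples x n_labels).

    Hypothetical helper for illustration; returns (train_idx, test_idx).
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=bool)
    n_samples = Y.shape[0]
    ratios = np.array([1.0 - test_size, test_size])  # row 0: train, row 1: test

    # How many examples each subset still "wants", overall and per label.
    want_total = ratios * n_samples
    want_label = np.outer(ratios, Y.sum(axis=0)).astype(float)

    unassigned = np.ones(n_samples, dtype=bool)
    subsets = ([], [])

    while unassigned.any():
        counts = Y[unassigned].sum(axis=0)
        if counts.max() == 0:
            # Remaining samples carry no labels: fill by remaining total demand.
            for i in rng.permutation(np.flatnonzero(unassigned)):
                k = int(np.argmax(want_total))
                subsets[k].append(int(i))
                want_total[k] -= 1
            break

        # Pick the rarest label that still has unassigned examples.
        label = int(np.argmin(np.where(counts > 0, counts, np.inf)))
        candidates = np.flatnonzero(unassigned & Y[:, label])

        for i in rng.permutation(candidates):
            # Subset with the largest remaining demand for this label;
            # ties are broken by the largest remaining total demand.
            demand = want_label[:, label]
            best = np.flatnonzero(demand == demand.max())
            k = int(best[np.argmax(want_total[best])])

            subsets[k].append(int(i))
            unassigned[i] = False
            want_label[k, Y[i]] -= 1  # update demand for every label of sample i
            want_total[k] -= 1

    train_idx = np.sort(np.array(subsets[0], dtype=int))
    test_idx = np.sort(np.array(subsets[1], dtype=int))
    return train_idx, test_idx


# Toy data: 40 samples, 4 labels, exactly one label per sample.
Y_demo = np.tile(np.eye(4, dtype=int), (10, 1))
train_idx, test_idx = iterative_split(Y_demo, test_size=0.25)
print("per-label counts in test:", Y_demo[test_idx].sum(axis=0))
```

Processing the rarest label first is the key design choice: rare labels have the fewest examples to distribute, so placing them before the common ones keeps either subset from ending up with none.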

Dependencies

The codebase is tested under Python 3.8. To install the required packages, run the following command:

pip install -r requirements.txt

straSplit requires the following packages:

Anaconda
NumPy (>= 1.18)
scikit-learn (>= 0.24)
pandas (>= 1.2)
NetworkX (== 2.5)
SciPy (>= 1.6)
Altair (== 4.1)
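As a quick sanity check after installation, the following sketch prints the installed version of each package so it can be compared by eye against the list above. The distribution names in `REQUIRED` are the usual PyPI names and are an assumption here, not copied from `requirements.txt`. `installed_versions` is a hypothetical helper, and Anaconda is omitted because it is a distribution rather than a pip package.

```python
from importlib import metadata

# PyPI distribution names (assumed; check requirements.txt for the exact pins).
REQUIRED = ["numpy", "scikit-learn", "pandas", "networkx", "scipy", "altair"]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The sketch uses only `importlib.metadata`, which is part of the standard library from Python 3.8 onward, so it runs before any of the dependencies are installed.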

Installation and Basic Usage

Run the following command to clone the repository to an appropriate location:

git clone https://github.com/arbasher/straSplit

Please follow this notebook for tutorials.

Citing

If you find straSplit useful in your research, please consider citing the following papers:

M. A. Basher, Abdur Rahman, McLaughlin, Ryan J., and Hallam, Steven J. "Metabolic pathway inference using multi-label classification with rich pathway features." PLoS Comput Biol (2020).
M. A. Basher, Abdur Rahman and Hallam, Steven J. "Multi-label pathway prediction based on active dataset subsampling." bioRxiv (2020).
M. A. Basher, Abdur Rahman. "Machine learning methods for metabolic pathway inference from genomic sequence information." Doctoral dissertation, University of British Columbia (2020).
Moyano, J.M., Gibaja, E.L. and Ventura, S. "MLDA: A tool for analyzing multi-label datasets." Knowledge-Based Systems (2017).
Merrillees, M. and Du, L. "Stratified sampling for extreme multi-label data." arXiv preprint (2021).
Sechidis, K., Tsoumakas, G. and Vlahavas, I. "On the stratification of multi-label data." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 145-158). Springer, Berlin, Heidelberg (2011).

Contact

For any inquiries, please contact: arbasher@student.ubc.ca
