yaml_ml streamlines machine learning workflows by letting you define data preprocessing, model training, and evaluation in a single YAML file, so you can automate your ML pipeline with minimal code.
Important: this is the very first version of the package. It is still under development, so use it at your own risk.
Create a virtual environment (e.g. with conda), activate it, and upgrade pip:
conda create --name yaml_ml_env python=3.11
conda activate yaml_ml_env
pip install --upgrade pip
Then install the package:
pip install yaml-ml
First, create a YAML configuration file (see the documentation).
Then, in the activated environment where yaml_ml is installed, run:
python -m yaml_ml --cfg path/to/your/config/yaml/file
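If you want to launch runs from a Python script (for example a notebook or a larger automation script), one option is to call the same command through the standard library's subprocess module. This is a sketch that assumes yaml_ml is installed in the interpreter running the script (`sys.executable`); `build_yaml_ml_command` and `run_yaml_ml` are hypothetical helper names, not part of yaml_ml, and the only flags used are the `--cfg` and `--n_processes` options documented in this README.

```python
import subprocess
import sys
from typing import List, Optional


def build_yaml_ml_command(cfg: str, n_processes: Optional[int] = None) -> List[str]:
    """Build the `python -m yaml_ml --cfg ...` command line (hypothetical helper).

    `cfg` can be a single YAML file or a folder of YAML files.
    `--n_processes` is only appended when a worker count is given.
    """
    # sys.executable is the interpreter running this script, i.e. the active environment.
    cmd = [sys.executable, "-m", "yaml_ml", "--cfg", cfg]
    if n_processes is not None:
        cmd += ["--n_processes", str(n_processes)]
    return cmd


def run_yaml_ml(cfg: str, n_processes: Optional[int] = None) -> int:
    """Run yaml_ml in a child process and return its exit code (hypothetical helper)."""
    completed = subprocess.run(build_yaml_ml_command(cfg, n_processes), check=False)
    return completed.returncode
```

For example, `run_yaml_ml("path/to/your/config/yaml/file")` should behave like the shell command above, and a non-zero return value signals that the run failed.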
To test several configurations, create one YAML file per configuration and put them all in a single folder.
To launch all the corresponding pipelines in parallel, using multiprocessing with N worker processes, run:
python -m yaml_ml --cfg path/to/your/configs/folder --n_processes N
Note: if the --n_processes argument is not provided, the pipelines are launched sequentially.
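The difference between sequential and parallel execution can be pictured with the standard library's `multiprocessing.Pool`. This is only a conceptual sketch, not yaml_ml's actual implementation: `run_pipeline` is a hypothetical placeholder for running one configuration file, and `find_configs` assumes the configuration files use the `.yaml` or `.yml` extension.

```python
from multiprocessing import Pool
from pathlib import Path
from typing import List, Optional


def run_pipeline(cfg_path: str) -> str:
    """Hypothetical stand-in for running one pipeline from a YAML config file."""
    # A real pipeline would parse the YAML file, then preprocess, train and evaluate.
    return f"done: {Path(cfg_path).name}"


def find_configs(folder: str) -> List[str]:
    """Collect the YAML configuration files of a folder, in a stable order."""
    root = Path(folder)
    paths = sorted(root.glob("*.yaml")) + sorted(root.glob("*.yml"))
    return [str(p) for p in paths]


def run_all(cfg_paths: List[str], n_processes: Optional[int] = None) -> List[str]:
    """Run every pipeline sequentially, or with a pool of N worker processes."""
    if n_processes is None:
        # No worker count given: run the pipelines one after the other.
        return [run_pipeline(p) for p in cfg_paths]
    with Pool(processes=n_processes) as pool:
        # Pool.map distributes the configs over the workers and keeps the input order.
        return pool.map(run_pipeline, cfg_paths)
```

A call such as `run_all(find_configs("path/to/your/configs/folder"), n_processes=4)` should be placed under an `if __name__ == "__main__":` guard, since platforms that start worker processes with "spawn" (Windows, macOS) re-import the main module in each worker.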
Some guidelines about how to define a configuration file are given in the Configuration File Documentation.
All available options are consolidated in the Modules File.
You can also find examples of yaml_ml configuration files in the Examples Folder, as well as a template file, template_cfg.yaml.
Check out explanations of a complete usage example here.
yaml_ml is mainly based on Scikit-learn tools (https://scikit-learn.org/stable/).
By default, installing yaml_ml also installs:
- lightgbm (see https://lightgbm.readthedocs.io/en/stable/), to train LightGBM gradient boosting models
- catboost (see https://catboost.ai/), to train CatBoost models
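To check which of these optional backends can be imported in the current environment (for instance after installing from sources without them), a minimal sketch using the standard library's `importlib.util.find_spec` could look like this. The function name `available_backends` is hypothetical and not part of yaml_ml; it only checks importability, not versions.

```python
import importlib.util
from typing import Dict, Iterable

# Optional gradient boosting libraries installed by default with yaml_ml.
OPTIONAL_BACKENDS = ("lightgbm", "catboost")


def available_backends(names: Iterable[str] = OPTIONAL_BACKENDS) -> Dict[str, bool]:
    """Map each top-level package name to whether it can be imported here."""
    # find_spec returns None for a top-level module that cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name}: {'installed' if ok else 'not installed'}")
```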
If you do not want to use them, you can install yaml_ml from sources after commenting out the lines of requirements.txt that correspond to these libraries. To do so, first clone the repo:
git clone https://github.com/GFaure9/yaml-ML.git
Then comment out the unwanted packages in requirements.txt and run, in your virtual environment:
cd ./yaml-ML
pip install -e .
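If you prefer to script the commenting-out step, the hypothetical helper below (not part of yaml_ml) prefixes the matching lines of a requirements file with `# `. It assumes each package name appears at the start of its line, possibly followed by a version specifier, extra or environment marker; it does not normalize names (e.g. `_` vs `-`), so pass them exactly as written in the file.

```python
import re
from pathlib import Path
from typing import Iterable


def comment_out_requirements(text: str, packages: Iterable[str]) -> str:
    """Return `text` with the requirement lines of `packages` commented out."""
    targets = {p.lower() for p in packages}
    out_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Package name = everything before a version specifier, extra, marker or space.
        name = re.split(r"[<>=!~;\[\s]", stripped, maxsplit=1)[0].lower()
        # Skip blank lines and lines that are already comments.
        if stripped and not stripped.startswith("#") and name in targets:
            line = "# " + line
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


def comment_out_in_file(path: str, packages: Iterable[str]) -> None:
    """Apply comment_out_requirements to a requirements file in place."""
    req = Path(path)
    req.write_text(comment_out_requirements(req.read_text(), packages))
```

Run it from the repository root before `pip install -e .`, e.g. `comment_out_in_file("requirements.txt", ["lightgbm", "catboost"])`. Already commented lines are left untouched, so running it twice is harmless.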
If you cloned the repo and installed the package from sources (pip install -e .), you can check that everything works before using it by running:
cd ./tests
python test_yaml_ml.py
At the end, you should get something like:
Ran 4 tests in 120.840s
OK
yaml_ml was designed with a modular architecture, to make it easy to integrate new models and data preprocessing techniques as needed.
Feel free to fork the project and extend the list of available ML models or preprocessing methods by "plugging in" your favorite ones, following the package's architecture.
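One common way to make models "pluggable" from a configuration file is a name-to-class registry. The sketch below illustrates that general pattern only; it is not yaml_ml's actual architecture. `MODEL_REGISTRY`, `register_model`, `build_model`, the config keys `name` and `params`, and the toy `MeanRegressor` are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry mapping the model names used in a config to classes.
MODEL_REGISTRY: Dict[str, type] = {}


def register_model(name: str) -> Callable[[type], type]:
    """Class decorator that makes a model available under `name`."""
    def decorator(cls: type) -> type:
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def build_model(config: dict):
    """Instantiate the model described by a parsed config section."""
    name = config["name"]
    if name not in MODEL_REGISTRY:
        raise ValueError(f"unknown model {name!r}; available: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**config.get("params", {}))


@register_model("mean_regressor")
class MeanRegressor:
    """Toy model: predicts the (scaled) mean of the training targets."""

    def __init__(self, scale: float = 1.0):
        self.scale = scale
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.scale * self.mean_ for _ in X]
```

With this design, adding a model only requires decorating a new class; the code that reads the configuration does not change.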
- [PyPI] v1.0.0