This Python script compares several machine learning models on a dataset and reports, for each model, its accuracy, its training time and the CO2 emissions generated by the process. CO2 emissions are calculated using CodeCarbon.
An example of the measurements obtained for the included datasets can be seen in the output file.
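The three measurements described above can be collected around a single training call. The following is a minimal sketch, not the code MLCost itself uses: the `measure` function, the `train_and_score` callable and the result keys are hypothetical names chosen for illustration. The only assumption about the tracker is that it exposes `start()` and `stop()`, as CodeCarbon's `EmissionsTracker` does.

```python
import time


def measure(train_and_score, tracker=None):
    """Run a training callable and collect accuracy, time and emissions.

    train_and_score: hypothetical callable that trains a model and
        returns its accuracy as a float.
    tracker: optional object with start()/stop() methods, for example
        codecarbon.EmissionsTracker(); stop() is expected to return the
        estimated emissions.
    """
    if tracker is not None:
        tracker.start()
    started = time.perf_counter()
    try:
        accuracy = train_and_score()
    finally:
        # Always stop the clock and the tracker, even if training fails.
        seconds = time.perf_counter() - started
        emissions = tracker.stop() if tracker is not None else None
    return {"accuracy": accuracy, "seconds": seconds, "emissions": emissions}


if __name__ == "__main__":
    # Without a tracker only accuracy and time are reported.
    print(measure(lambda: 0.5))
```

With CodeCarbon installed, passing `EmissionsTracker()` from the `codecarbon` package as the second argument would fill in the `emissions` field.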
The program has been tested on Ubuntu 20.04 with Python 3.8. Required system packages:
- python3.8-tk (GUI for matplotlib graphs)
- stress (simulated stress conditions for CPU tests)
- Clone the repository and change directory to the created folder
- Create a virtual environment for managing dependencies
python3 -m venv venv
- Activate the environment
source venv/bin/activate
- Install the required packages with pip
pip install -r requirements.txt
- Install the MLCost package in editable mode (so that local changes to the code take effect)
pip install -e .
Run the program without any options to use the default Iris dataset.
Several larger datasets are available for testing in the data folder.
- To run the program with a specific dataset, use the -d option.
- To specify a separate file with the test data, use the -t option.
- If the dataset uses a separator other than a plain comma, use the -s option to specify it.
mlcost measure -d data/adult/adult.data -t data/adult/adult.test -s ", "
mlcost measure -d data/hepatitis/hepatitis.data
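The first command above passes ", " (a comma followed by a space) as the separator. The sketch below illustrates why that matters when splitting rows. It is an illustration only, not MLCost's actual parser: `split_rows` is a hypothetical helper, and the sample row is made up for the example.

```python
def split_rows(lines, separator=","):
    """Split each non-empty line into fields on the given separator.

    The separator may be longer than one character, such as ", ".
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(line.split(separator))
    return rows


if __name__ == "__main__":
    sample = ["39, State-gov, <=50K\n"]  # made-up row for illustration
    # Splitting on "," alone leaves a leading space in every later field.
    print(split_rows(sample))
    # Splitting on ", " yields clean field values.
    print(split_rows(sample, ", "))
```

Splitting on the exact separator keeps categorical values such as class labels free of stray whitespace, which would otherwise be treated as distinct values.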