sklearn2excel

Bringing Scikit-learn decision trees to Excel

With this Python package, one can make a trained machine learning model accessible to others without having to deploy it as a service. More specifically, one can export a Scikit-learn decision tree or random forest model to an Excel workbook. All decision chains in the model are represented within a single table, and feature values can be entered in the workbook to test the model and obtain an average prediction.
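
To make the idea of a single table of decision chains concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the row layout, the feature thresholds and the names `CHAINS`, `chain_matches` and `average_prediction` are assumptions made for illustration. Each row holds the tree it came from, the threshold tests along one root-to-leaf path, and the leaf result. The prediction is the mean of the results of the rows whose tests all hold, which gives one row per tree.

```python
from statistics import mean

# Hypothetical table: (tree_id, [(feature, operator, threshold), ...], result)
CHAINS = [
    (0, [("alcohol", "<=", 12.5)], 0.0),
    (0, [("alcohol", ">", 12.5)], 1.0),
    (1, [("ash", "<=", 2.0), ("alcohol", "<=", 13.0)], 0.0),
    (1, [("ash", "<=", 2.0), ("alcohol", ">", 13.0)], 1.0),
    (1, [("ash", ">", 2.0)], 2.0),
]


def chain_matches(tests, values):
    """Return True when every test in the chain holds for the feature values."""
    for feature, op, threshold in tests:
        x = values[feature]
        if op == "<=" and not x <= threshold:
            return False
        if op == ">" and not x > threshold:
            return False
    return True


def average_prediction(chains, values):
    """Average the results of all matching chains (one chain per tree)."""
    results = [
        result for _, tests, result in chains if chain_matches(tests, values)
    ]
    return mean(results)


print(average_prediction(CHAINS, {"alcohol": 13.2, "ash": 1.8}))
```

In a workbook, the same logic would presumably be expressed with cell formulas rather than Python functions; the sketch only shows why a flat table is enough to reproduce an ensemble's averaged output.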

Project overview

Version: 0.1.1

  • package level
    • export_to_xlsx() (main access point)
    • export_to_textfile() (alternative use)
      • detects maximum tree depth and applies this parameter
  • helpers module
    • create_xlfile (project internal)
      • writes a DecisionTreeTable object to an Excel sheet
      • writes features and an initial value of 1 to front sheet
      • writes decision trees to 2nd sheet
  • core module
    • class DecisionTreeTable (project internal)
      • a class that can be instantiated with a parsed text file
      • transforms and represents decision trees in a data structure
      • exposes properties to access info about the structure
      • exposes methods to get tests and results as indexed rows
      • handles classifier- and regressor-type decision trees
  • TODO:
    • thorough testing (75%)
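
The overview above says that a text export is parsed into a structure of tests and results. The following is a rough sketch of that step, assuming the text is in the style produced by `sklearn.tree.export_text`. It does not reproduce `DecisionTreeTable`. The names `SAMPLE`, `LINE` and `parse_chains` are hypothetical, and the sample tree is made up. Note that `export_text` truncates branches deeper than its `max_depth` parameter (default 10), which is probably why the package detects the maximum tree depth before exporting.

```python
import re

# Made-up sample in the export_text style (default spacing of 3)
SAMPLE = """\
|--- alcohol <= 12.50
|   |--- class: 0
|--- alcohol >  12.50
|   |--- ash <= 2.00
|   |   |--- class: 1
|   |--- ash >  2.00
|   |   |--- class: 2
"""

# One "|   " group per nesting level, then the "|--- " marker and the node text
LINE = re.compile(r"^(?P<indent>(?:\|   )*)\|--- (?P<body>.*)$")


def parse_chains(text):
    """Turn an indented tree dump into (tests, result) rows, one per leaf."""
    chains, path = [], []
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip blank or unrelated lines
        depth = len(match["indent"]) // 4
        body = match["body"].strip()
        del path[depth:]  # forget tests that belong to finished branches
        if body.startswith(("class:", "value:")):
            # Leaf node: classifiers print "class:", regressors print "value:"
            result = body.split(":", 1)[1].strip()
            chains.append((list(path), result))
        else:
            path.append(" ".join(body.split()))  # normalise ">  12.50" spacing
    return chains


for tests, result in parse_chains(SAMPLE):
    print(tests, "->", result)
```

Each returned row is one decision chain, so the longest list of tests also gives the depth of the tree.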

Installation

pip install sklearn2excel

Installing the package also installs scikit-learn and XlsxWriter as dependencies.

Usage example

from pathlib import Path
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import sklearn2excel as s2e


# Fetch the Scikit-learn wine example data as a
# sklearn.utils.Bunch object and prepare an example
# model with sklearn.ensemble.RandomForestClassifier.
# RandomForestRegressor, or any classifier/regressor
# subtype of BaseDecisionTree, could be used instead.
bunch = s2e.get_data_target_and_features()
wine_data = bunch.data
wine_target = bunch.target
wine_features = bunch.feature_names[:4]
X = wine_data[wine_features]
y = LabelEncoder().fit_transform(wine_target)
clf_model = RandomForestClassifier(
  n_estimators=10, 
  min_samples_leaf=2
).fit(X, y)

path_xlsx = Path.cwd() / "excel_output.xlsx"
path_txt = Path.cwd() / "text_output.txt"

# Export the model as a text file using the
# sklearn export function.
# The first parameter is a single decision tree
# or an ensemble of decision trees.
s2e.export_to_textfile(
  clf_model.estimators_,  # ensemble of decision trees
  path_txt,
  wine_features
)

# Export the model as an Excel file.
# Features are written to the front sheet with an initial value of 1.0;
# decision trees are written to the 2nd sheet.
s2e.export_to_xlsx(
  clf_model.estimators_,
  wine_features,
  path_xlsx
)

Development setup

  • Flit ~3.4

Release History

  • 0.1.1
    • FIX: XlsxWriter dependency corrected
  • 0.1.0
    • First proper release
    • NEW: direct function export_to_xlsx()
    • CHANGE: functions and class available at package-level
  • 0.0.1
    • Work in progress

Meta

Torbjørn Wikestad – @TWikestad – torbjorn.wikestad@gmail.com

Distributed under the MIT license. See LICENSE for more information.

https://github.com/tobisan5/github-link

Contributing

  1. Fork it (https://github.com/tobisan5/sklearn2excel/fork)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request
