sklearn2sql-demo

Note: a final presentation (PDF slides) is available here: https://github.com/antoinecarme/presentations_slides/blob/main/sklearn2sql_presentation_2022-08.pdf

This repository contains some demos of the usage of sklearn2sql.

sklearn2sql is a tool, still under development, for generating deployment SQL code from scikit-learn objects.

Using sklearn2sql, it is possible to predict values from an already-fitted classifier or regressor simply by executing some SQL code. It can be seen as an alternative to PMML-based methods for in-database processing.
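To make the idea of predicting by executing SQL concrete, here is a minimal hand-written sketch. It is not the output of sklearn2sql and does not use its API. The coefficients, the table name `input_data`, the column names and the helper functions are all hypothetical, and SQLite is used in-memory only because it needs no setup.

```python
import sqlite3

# Hypothetical parameters of an already-fitted linear regression.
# With scikit-learn they would come from model.coef_ and model.intercept_.
FEATURES = ["x1", "x2"]
COEFFICIENTS = [2.0, -1.0]
INTERCEPT = 0.5


def linear_model_to_sql(table, key, features, coefficients, intercept):
    """Render the linear model as a single SELECT statement (illustrative only)."""
    terms = " + ".join(f"({c!r} * {f})" for f, c in zip(features, coefficients))
    return (
        f"SELECT {key}, {intercept!r} + {terms} AS estimator "
        f"FROM {table} ORDER BY {key}"
    )


def predict_in_database(rows):
    """Load rows into an in-memory SQLite table and score them with SQL only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE input_data (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)"
    )
    conn.executemany("INSERT INTO input_data VALUES (?, ?, ?)", rows)
    sql = linear_model_to_sql("input_data", "id", FEATURES, COEFFICIENTS, INTERCEPT)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

Once the SQL text exists, the database does all the scoring: no Python model object is needed at prediction time.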

(NEW) sklearn2sql is available as a RESTful web service on Heroku. A sample Python client allows you to generate SQL from your own models. Your feedback is welcome.

The SQL code is produced in an agnostic way (the mechanism used does not depend on the database) and supports the most widely used relational databases.

It is designed to support all classification and regression methods in scikit-learn (SVMs, linear models, naive Bayes, decision trees, MLP, etc.), as well as transformations (PCA, imputers, scalers), feature selection, outlier detection and their derived objects (random forests, meta-estimators, pipelines, feature unions, ensembles, etc.).
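As an illustration of how a pipeline could map to SQL, the sketch below chains a scaler and a linear model through a common table expression (CTE). This is a hand-written assumption about what such a query might look like, not the SQL that sklearn2sql actually generates. The scaler statistics, coefficients, table name `input_data` and helper names are hypothetical.

```python
import sqlite3

# Hypothetical statistics of a fitted standard scaler and a linear model.
MEANS = {"x1": 10.0, "x2": 4.0}
SCALES = {"x1": 2.0, "x2": 0.5}
COEFS = {"x1": 1.5, "x2": -2.0}
INTERCEPT = 1.0


def pipeline_to_sql(table, key):
    """Render scaler -> linear model as one query with a CTE per pipeline step."""
    scaled = ", ".join(
        f"({f} - {MEANS[f]!r}) / {SCALES[f]!r} AS {f}_scaled" for f in MEANS
    )
    linear = " + ".join(f"({COEFS[f]!r} * {f}_scaled)" for f in COEFS)
    return (
        f"WITH scaler AS (SELECT {key}, {scaled} FROM {table}) "
        f"SELECT {key}, {INTERCEPT!r} + {linear} AS estimator "
        f"FROM scaler ORDER BY {key}"
    )


def run_pipeline(rows):
    """Score rows entirely inside an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE input_data (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)"
    )
    conn.executemany("INSERT INTO input_data VALUES (?, ?, ?)", rows)
    result = conn.execute(pipeline_to_sql("input_data", "id")).fetchall()
    conn.close()
    return result
```

Each pipeline step reads from the previous step's CTE, so adding a transformation means adding one more named subquery rather than rewriting the final SELECT.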

Roughly speaking, sklearn2sql translates a scikit-learn model into large, machine-friendly ;) SQL code that can later be executed on your favorite database. For example, this is a multilayer perceptron on Oracle, and this is a random forest on PostgreSQL ...
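To give a feel for what "a model as SQL code" can mean, here is a small hand-written sketch that turns a decision tree into a nested CASE WHEN expression. It is only an illustration under stated assumptions: the tree, its thresholds, the table `iris` and the function names are hypothetical, and the real generated code is typically much larger and structured differently.

```python
import sqlite3

# Hypothetical fitted decision tree, written as nested dictionaries.
# Leaves carry a class label; inner nodes carry a feature and a threshold.
TREE = {
    "feature": "petal_length",
    "threshold": 2.5,
    "left": {"value": 0},
    "right": {
        "feature": "petal_width",
        "threshold": 1.7,
        "left": {"value": 1},
        "right": {"value": 2},
    },
}


def tree_to_case(node):
    """Recursively render a tree node as a SQL CASE WHEN expression."""
    if "value" in node:
        return str(node["value"])
    return (
        f"CASE WHEN {node['feature']} <= {node['threshold']!r} "
        f"THEN {tree_to_case(node['left'])} "
        f"ELSE {tree_to_case(node['right'])} END"
    )


def classify(rows):
    """Classify rows by running the generated expression in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE iris (id INTEGER PRIMARY KEY, petal_length REAL, petal_width REAL)"
    )
    conn.executemany("INSERT INTO iris VALUES (?, ?, ?)", rows)
    sql = f"SELECT id, {tree_to_case(TREE)} AS decision FROM iris ORDER BY id"
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

A random forest would repeat this pattern once per tree and then aggregate the per-tree results, which is one reason the generated SQL grows quickly with model size.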

Extensions

Since the beginning of this project, some extensions have been added to support machine learning models built using tools similar to scikit-learn. The goal is to be able to generate the deployment SQL code for any kind of classification and regression model on any kind of SQL-capable database. These extensions share the same SQL generation layer used for scikit-learn.

A caret2sql project has been added to support R caret models. Some R Jupyter notebook demos are available. It supports the most commonly used R machine learning models.
For deep learning models (neural network models), the keras2sql project has been added to support models built using the Keras framework with TensorFlow, Theano, and CNTK. Some demo Python Jupyter notebooks are available.
PyTorch deep learning models are also supported through pytorch2sql. Some demo Python Jupyter notebooks are available.
A similar generation process has been added for C++ backends through ml2cpp.
1. It generates simple, readable C++ code that maps easily onto the model structure, which facilitates debugging and integration.
2. The project uses the same low-level layers as sklearn2sql.
3. It supports all the models supported by the SQL backend.
4. It generates C++ code that can be executed on almost any hardware platform that has a serious C++ compiler (GCC welcome).
5. Some demo Python Jupyter notebooks are available.
6. The C++ code is even runnable on very small platforms (STM32, ESP32, Kendryte, etc.).
A Heroku-based web service can be used to generate SQL code for a given model. scikit-learn, Keras and caret models are supported, as are the SQL and C++ backends.
... (wip) ...

Supported Databases

Support for most popular relational databases has been added progressively. Now, sklearn2sql supports almost all the leading relational databases referenced on DB-Engines.

Open source databases : PostgreSQL (Just perfect !!!, most derived database), MariaDB (some CTE-related bugs were reported while working on this project. Very reactive team. All bugs were fixed !!!!)
Commercial databases : Oracle, MS SQL Server, IBM DB2, Teradata (to cover 95% of the market and get real-world tests)
Embedded databases : SQLite (even in-memory ;). Nice for prototyping, documentation and development. Zero config. Available everywhere (on Android and iOS devices and inside Jupyter notebooks ;).
Hadoop databases : Hive and Impala
Other : Firebird (low memory footprint. A stress test ;), MonetDB (columnar, a SQL quality reminder ;)

