scikit-learn_bench benchmarks various implementations of machine learning algorithms across data analytics frameworks. It can be extended to add new frameworks and algorithms, and it currently supports the scikit-learn, daal4py, cuML, and XGBoost frameworks for commonly used machine learning algorithms.
See benchmark results here.
- Prerequisites
- How to create conda environment for benchmarking
- How to enable daal4py patching for scikit-learn benchmarks
- Running Python benchmarks with runner script
- Supported algorithms
- Algorithms parameters
- Legacy automatic building and running
- `python` and `scikit-learn` to run python versions
- `pandas` when using its DataFrame as input data format
- `icc`, `ifort`, `mkl`, and `daal` to compile and run native benchmarks
- the machine learning frameworks that you want to test. Check this item for additional information on how to set up the environment.
Create a suitable conda environment for each framework to test. Each item in the list below links to instructions to create an appropriate conda environment for the framework.
To enable daal4py patching for the scikit-learn benchmarks, set the environment variable `FORCE_DAAL4PY_SKLEARN=YES` (for example, `export FORCE_DAAL4PY_SKLEARN=YES`).
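For reference, the sketch below shows roughly what this flag is meant to enable inside the Python benchmarks. It is a minimal sketch only: it assumes daal4py exposes `patch_sklearn()` under `daal4py.sklearn`, which may differ between daal4py releases.

```python
# Rough sketch only: mimic what FORCE_DAAL4PY_SKLEARN=YES is meant to enable.
# Assumes daal4py.sklearn.patch_sklearn() is available in your daal4py release.
import os

if os.environ.get("FORCE_DAAL4PY_SKLEARN", "NO").upper() in ("YES", "Y", "1"):
    from daal4py.sklearn import patch_sklearn
    patch_sklearn()  # swap supported scikit-learn estimators for DAAL-accelerated ones

# After patching, plain scikit-learn code runs on the accelerated backend.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100_000, 16)
KMeans(n_clusters=8).fit(X)
```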
Run `python runner.py --configs configs/config_example.json [--output-format json --verbose]` to launch the benchmarks.
Runner options:
- `configs`: paths to configuration files
- `dummy-run`: run the configuration parser and dataset generation without running the benchmarks
- `verbose`: print additional information while the benchmarks run
- `output-format`: `json` or `csv`; output type of benchmarks to use with their runner
Benchmarks currently support the following frameworks:
- scikit-learn
- daal4py
- cuml
- xgboost
The benchmark configuration lets you select which frameworks to run, which datasets to measure, and the parameters of the algorithms.
You can configure benchmarks by editing a config file. Check config.json schema for more details.
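As an illustration of the kind of configuration the runner consumes, the sketch below writes a small config file and launches the runner on it. The key names used here (`common`, `cases`, `algorithm`, `dataset`, and the dataset fields) are assumptions for illustration only; consult the config.json schema for the authoritative field names.

```python
# Illustrative only: the field names below are assumptions, not the official
# schema -- check the config.json schema referenced above before relying on them.
import json
import subprocess

config = {
    "common": {"lib": ["sklearn"], "data-format": ["pandas"], "dtype": ["float64"]},
    "cases": [
        {
            "algorithm": "kmeans",
            "dataset": [
                {"source": "synthetic", "type": "blobs",
                 "n_samples": 100000, "n_features": 50}
            ],
        }
    ],
}

with open("configs/my_config.json", "w") as f:
    json.dump(config, f, indent=4)

# Runner flags as documented above.
subprocess.run(
    ["python", "runner.py", "--configs", "configs/my_config.json",
     "--output-format", "json", "--verbose"],
    check=True,
)
```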
algorithm | benchmark name | sklearn | daal4py | cuml | xgboost |
---|---|---|---|---|---|
DBSCAN | dbscan | ✅ | ✅ | ✅ | ❌ |
RandomForestClassifier | df_clfs | ✅ | ✅ | ✅ | ❌ |
RandomForestRegressor | df_regr | ✅ | ✅ | ✅ | ❌ |
pairwise_distances | distances | ✅ | ✅ | ❌ | ❌ |
KMeans | kmeans | ✅ | ✅ | ✅ | ❌ |
KNeighborsClassifier | knn_clsf | ✅ | ❌ | ✅ | ❌ |
LinearRegression | linear | ✅ | ✅ | ✅ | ❌ |
LogisticRegression | log_reg | ✅ | ✅ | ✅ | ❌ |
PCA | pca | ✅ | ✅ | ✅ | ❌ |
Ridge | ridge | ✅ | ✅ | ✅ | ❌ |
SVM | svm | ✅ | ✅ | ✅ | ❌ |
train_test_split | train_test_split | ✅ | ❌ | ✅ | ❌ |
GradientBoostingClassifier | gbt | ❌ | ❌ | ❌ | ✅ |
GradientBoostingRegressor | gbt | ❌ | ❌ | ❌ | ✅ |
You can launch benchmarks for each algorithm separately. To do this, go to the directory with the benchmark:
cd <framework>
Run the following command:
python <benchmark_file> --dataset-name <path to the dataset> <other algorithm parameters>
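For example, a hypothetical invocation of the KMeans benchmark from inside the scikit-learn framework directory might look like the sketch below. Only `--dataset-name` comes from the command template above; the benchmark file name, dataset path, and extra parameter are placeholders rather than verified flags.

```python
# Hypothetical example of launching one benchmark file directly (run from
# inside the chosen framework directory). Only --dataset-name is taken from
# the command template above; the file name, dataset path, and --n-clusters
# flag are placeholders -- check the parameter reference for the real options.
import subprocess

subprocess.run(
    ["python", "kmeans.py",
     "--dataset-name", "data/my_dataset.npy",
     "--n-clusters", "10"],
    check=True,
)
```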
You can find the list of supported parameters for each algorithm here:
- Run `make`. This will generate data, compile benchmarks, and run them.
    - To run only scikit-learn benchmarks, use `make sklearn`.
    - To run only native benchmarks, use `make native`.
    - To run only daal4py benchmarks, use `make daal4py`.
    - To run a specific implementation of a specific benchmark, directly request the corresponding file: `make output/<impl>/<bench>.out`.
- If you have activated a conda environment, the build will use daal from the conda environment, if available.