SRBench: A Living Benchmark for Symbolic Regression

Methods for symbolic regression (SR) have come a long way since the days of Koza-style genetic programming (GP). Our goal with this project is to maintain a living benchmark of modern SR methods, in the context of state-of-the-art ML methods.

These are the current challenges, as we see them:

  • Lack of cross-pollination between the GP community and the ML community (different conferences, journals, societies, etc.)
  • Lack of strong benchmarks in SR literature (small problems, toy datasets, weak comparator methods)
  • Lack of a unified framework for SR, or GP

We are addressing the lack of cross-pollination by making these comparisons open source, reproducible, and public, and by sharing them widely with the entire ML research community. We are addressing the lack of strong benchmarks by providing open-source benchmarking of many SR methods on large sets of problems, with strong baselines for comparison. To handle the lack of a unified framework, we've specified minimal requirements for contributing a method to this benchmark: a scikit-learn compatible API.
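
For illustration, the sketch below shows roughly what that contract looks like: an estimator that exposes fit and predict, with hyperparameters set in __init__, is the core of what the benchmark needs. The class name and internals are placeholders, not an actual srbench method:

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

class MySymbolicRegressor(BaseEstimator, RegressorMixin):
    """Placeholder SR method illustrating the required interface."""

    def __init__(self, population_size=100, generations=50):
        # Hyperparameters are stored verbatim so that scikit-learn's
        # get_params/set_params (from BaseEstimator) work out of the box.
        self.population_size = population_size
        self.generations = generations

    def fit(self, X, y):
        # A real method would search for a symbolic model of y here;
        # this stub just memorizes the mean of the targets.
        self.mean_ = np.mean(y)
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)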

Results

Browse the Current Results

This benchmark currently consists of 14 symbolic regression methods, 7 other ML methods, and 252 datasets from PMLB, including real-world and synthetic datasets from processes with and without ground-truth models.

Methods currently benchmarked:

  • Age-Fitness Pareto Optimization (Schmidt and Lipson 2009) paper, code
  • Age-Fitness Pareto Optimization with Co-evolved Fitness Predictors (Schmidt and Lipson 2009) paper, code
  • AIFeynman 2.0 (Udrescu et al. 2020) paper, code
  • Bayesian Symbolic Regression (Jin et al. 2020) paper, code
  • Deep Symbolic Regression (Petersen et al. 2020) paper, code
  • Fast Function Extraction (McConaghy 2011) paper, code
  • Feature Engineering Automation Tool (La Cava et al. 2017) paper, code
  • epsilon-Lexicase Selection (La Cava et al. 2016) paper, code
  • GP-based Gene-pool Optimal Mixing Evolutionary Algorithm (Virgolin et al. 2017) paper, code
  • gplearn (Stephens) code
  • Interaction-Transformation Evolutionary Algorithm (de França and Aldeia 2020) paper, code
  • Multiple Regression GP (Arnaldo et al. 2014) paper, code
  • Operon (Burlacu et al. 2020) paper, code
  • Semantic Backpropagation GP (Virgolin et al. 2019) paper, code

Contribute

We are actively updating and expanding this benchmark. Want to add your method? See our Contribution Guide.

How to run

Installation

We provide a conda environment, a configuration script, and an installation script that should make installation straightforward. We have currently tested this on Ubuntu and CentOS. Steps:

  1. Install and activate the conda environment:
conda env create -f environment.yml
conda activate srbench
  2. Install the benchmark methods:
bash install.sh
  3. Check out the feynman PMLB branch (once these new datasets are merged, you will be able to skip this step):
git clone -b feynman https://github.com/EpistasisLab/pmlb/ [/path/to/pmlb/]
cd /path/to/pmlb
git lfs fetch
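
As a quick sanity check after installation, one of the installed methods can be fit on a toy problem through its scikit-learn interface. The sketch below uses gplearn as an example; any benchmarked method with a compatible API would work the same way:

import numpy as np
from gplearn.genetic import SymbolicRegressor

# Toy data: y = x0**2 + x1, a target gplearn can recover quickly.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (200, 2))
y = X[:, 0] ** 2 + X[:, 1]

est = SymbolicRegressor(population_size=500, generations=10, random_state=0)
est.fit(X, y)
print(est._program)     # best evolved expression
print(est.score(X, y))  # R^2 on the training data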

Start the benchmark

Experiments are launched from the experiments/ folder via the script analyze.py. The script can be configured to run the experiment in parallel locally, on an LSF job scheduler, or on a SLURM job scheduler. To see the full set of options, run python analyze.py -h.

After installing and configuring the conda environment, the complete black-box experiment can be started via the command:

python analyze.py /path/to/pmlb/datasets -n_trials 10 -results ../results -time_limit 48:00

Similarly, the ground-truth regression experiment on the strogatz datasets with a target noise of 0.0 is run by the command:

python analyze.py -results ../results_sym_data -target_noise 0.0 "/path/to/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned
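
For orientation, a single trial of either experiment boils down to fetching a PMLB dataset and fitting an estimator through the scikit-learn API; analyze.py wraps this core loop with tuning, time limits, and result logging. A minimal sketch of that loop, using the pmlb Python package and an off-the-shelf regressor as a stand-in for the SR methods (the dataset name is just an example):

from pmlb import fetch_data
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Fetch a PMLB regression dataset by name ('529_pollen' is one example).
X, y = fetch_data('529_pollen', return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Any estimator with fit/predict works here; a random forest stands in
# for the SR methods that analyze.py actually benchmarks.
est = RandomForestRegressor(random_state=0)
est.fit(X_train, y_train)
print('test R^2:', est.score(X_test, y_test))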

Cite

A pre-print of the current version of the benchmark is available:

  • La Cava, W., Orzechowski, P., Burlacu, B., de França, F. O., Virgolin, M., Jin, Y., Kommenda, M., & Moore, J. H. (2021). Contemporary Symbolic Regression Methods and their Relative Performance. Preprint

v1.0 was reported in our GECCO 2018 paper:

  • Orzechowski, P., La Cava, W., & Moore, J. H. (2018). Where are we now? A large benchmark study of recent symbolic regression methods. GECCO 2018. DOI, Preprint

Contact

William La Cava (@lacava), lacava at upenn dot edu

Patryk Orzechowski (@athril), patryk dot orzechowski at gmail dot com