Symbolic regression solver, based on genetic programming methodology.
- Description
1.1 Crossover
1.2 Mutation
1.3 Optimization of coefficients
1.4 Reducing complexity of syntax trees - Example
2.1 f(x,y,z) - ?
2.2 Try it in command line - Quick start
3.1 Just download jar
3.2 Try it with Maven
3.3 Hello world
Each mathematical expression can be represented in form of syntax tree:
Actually, it worth to keep in mind, that there exists infinite number of different syntax trees, which corresponds to semantically equivalent expressions. For example:
In practice, on of the most generic problems - is reconstruction of original function, having the information about its values in some specific points.
It is possible to apply genetic algorithm - for solving of given problem:
-
In terms of Genetic Algorithm - each syntax tree can be treated as a "chromosome" (an entity, which can "mutate" and change by "crossover" with other "chromosome")
-
It is needed to define fitness function: the function, which will calculate, how good each formula (which was encoded by syntax tree) - can represent existing data (e.g.: using mean squared error value).
During "crossover" - syntax tree is modified by substituion of its subtree, with some subtree from other syntax tree.
Following image explains implementation of "crossover" operation over syntax trees:
Currently implemented following "mutation" operations:
-
Substituion of some node of syntax tree - with node, which corresponds to different arithmetical operation:
-
Substituion of some subtree with randomly generated subtree:
Actually, some syntax tree might represent correct structure of searchable function, but due to some non-optimal values of coefficients - given syntax tree can be considered as non-optimal by fitness function.
For example, following image displays target values of searchable function (red crosses) - and two functions-candidates (green and blue):
Blue line has smaller value of mean squared error, but, actually - green parabola - would be a better candidate for the final solution.
By this reason, current implementation of Symbolic Regression Solver - uses second pass of Genetic Algorithm - for optimizing of coefficients of each syntax tree. On the picture below - represented the way, how coefficients of each syntax tree - could be transformed to "chromosome":
Lets try to reconstruct original function, by following target values.
x | y | z | f(x,y,z) |
---|---|---|---|
26 | 35 | 1 | 830 |
8 | 24 | -11 | 130 |
20 | 1 | 10 | 477 |
33 | 11 | 2 | 1217 |
37 | 16 | 7 | 1524 |
Below presented picture, which shows dynamics of changes of mean squared error, according to the best "evolved" syntax trees (the axis "x" - represents number of iteration):
- Download http://github.com/lagodiuk/genetic-programming/tree/master/bin/symbolic_regression_1.0.jar
- Create file xyz.txt with following content:
(this file can be downloaded from: https://github.com/lagodiuk/genetic-programming/blob/master/bin/xyz.txt)
# allowed functions are: ADD SUB MUL DIV SQRT POW LN SIN COS # set which functions to use: ADD MUL SUB # looking for: f(x, y, z) — ? # define training set: f(26, 35, 1) = 830 f(8, 24, -11) = 130 f(20, 1, 10) = 477 f(33, 11, 2) = 1217 f(37, 16, 7) = 1524
- Launch:
java -jar symbolic_regression_1.0.jar xyz.txt
- and observe output for each iteration (output will be in format: iteration numbre, value of mean squared error, and evolved formula).
The most simple way is download http://github.com/lagodiuk/genetic-programming/tree/master/bin/symbolic_regression_1.0.jar and add it to your classpath.
This project depends on Generic Genetic Algorithm project (as a maven dependency).
- git clone https://github.com/lagodiuk/genetic-algorithm.git
- git clone https://github.com/lagodiuk/genetic-programming.git
- mvn -f genetic-algorithm/pom.xml install
- mvn -f genetic-programming/pom.xml install
Now you can add following maven dependencies to your project:
<dependency>
<groupId>com.lagodiuk</groupId>
<artifactId>ga</artifactId>
<version>1.0.1</version>
</dependency>
<dependency>
<groupId>com.lagodiuk</groupId>
<artifactId>gp</artifactId>
<version>1.0</version>
</dependency>
import java.util.LinkedList;
import java.util.List;
import com.lagodiuk.gp.symbolic.SymbolicRegressionEngine;
import com.lagodiuk.gp.symbolic.SymbolicRegressionIterationListener;
import com.lagodiuk.gp.symbolic.TabulatedFunctionFitness;
import com.lagodiuk.gp.symbolic.Target;
import com.lagodiuk.gp.symbolic.interpreter.Expression;
import com.lagodiuk.gp.symbolic.interpreter.Functions;
/**
* f(x) - ? <br/>
*
* f(0) = 0 <br/>
* f(1) = 11 <br/>
* f(2) = 24 <br/>
* f(3) = 39 <br/>
* f(4) = 56 <br/>
* f(5) = 75 <br/>
* f(6) = 96 <br/>
*
* (target function is f(x) = x^2 + 10*x)
*/
public class HelloSymbolicRegression {
public static void main(String[] args) {
// define training set
TabulatedFunctionFitness fitness =
new TabulatedFunctionFitness(
new Target().when("x", 0).targetIs(0),
new Target().when("x", 1).targetIs(11),
new Target().when("x", 2).targetIs(24),
new Target().when("x", 3).targetIs(39),
new Target().when("x", 4).targetIs(56),
new Target().when("x", 5).targetIs(75),
new Target().when("x", 6).targetIs(96));
SymbolicRegressionEngine engine =
new SymbolicRegressionEngine(
fitness,
// define variables
list("x"),
// define base functions
list(Functions.ADD, Functions.SUB, Functions.MUL, Functions.VARIABLE, Functions.CONSTANT));
addListener(engine);
// 200 iterations
engine.evolve(200);
}
/**
* Track each iteration
*/
private static void addListener(SymbolicRegressionEngine engine) {
engine.addIterationListener(new SymbolicRegressionIterationListener() {
@Override
public void update(SymbolicRegressionEngine engine) {
Expression bestSyntaxTree = engine.getBestSyntaxTree();
double currFitValue = engine.fitness(bestSyntaxTree);
// log to console
System.out.println(
String.format("iter = %s \t fit = %s \t func = %s",
engine.getIteration(), currFitValue, bestSyntaxTree.print()));
// halt condition
if (currFitValue < 5) {
engine.terminate();
}
}
});
}
private static <T> List<T> list(T... items) {
List<T> list = new LinkedList<T>();
for (T item : items) {
list.add(item);
}
return list;
}
}
Some additional information can be found in the following article (in Russian language).