
Commit b62d8d2: Update README.md

1 parent 77d8970


1 file changed: +55 -3 lines changed


README.md

genetic-programming
===================

[Symbolic regression](http://en.wikipedia.org/wiki/Symbolic_regression) solver based on the [genetic programming](http://en.wikipedia.org/wiki/Genetic_programming) methodology.

## Description ##

Each mathematical expression can be represented as a syntax tree: <br/>
![Syntax Tree Example](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/syntax_tree.png)

Keep in mind that there are infinitely many different syntax trees that correspond to semantically equivalent expressions. For example: <br/>
![Equivalent Syntax Trees](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/equiv_syntax_trees.png)
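
A minimal Java sketch of such a tree, for illustration only: the names `SyntaxTreeSketch`, `Node`, `Constant`, `Variable` and `BinaryOp` are hypothetical and are not taken from this project. Each inner node applies an arithmetic operator to the values of its two subtrees, so evaluating the root evaluates the whole expression. The `main` method also builds `x + x` and `2 * x`, two structurally different trees that return the same value for every `x`.

```java
/**
 * Hypothetical, minimal syntax-tree model for illustration only;
 * the names are not taken from this project.
 */
class SyntaxTreeSketch {

    /** A node of an arithmetic syntax tree over a single variable x. */
    interface Node {
        double eval(double x);
    }

    /** Leaf holding a numeric constant. */
    static final class Constant implements Node {
        final double value;

        Constant(double value) { this.value = value; }

        public double eval(double x) { return value; }

        @Override public String toString() { return Double.toString(value); }
    }

    /** Leaf standing for the input variable x. */
    static final class Variable implements Node {
        public double eval(double x) { return x; }

        @Override public String toString() { return "x"; }
    }

    /** Inner node: an arithmetic operator applied to the values of two subtrees. */
    static final class BinaryOp implements Node {
        final char op;
        final Node left, right;

        BinaryOp(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        public double eval(double x) {
            double a = left.eval(x), b = right.eval(x);
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                case '/': return a / b;
                default: throw new IllegalStateException("Unknown operator: " + op);
            }
        }

        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    public static void main(String[] args) {
        Node x = new Variable();

        // (x + 3) * (x - 1), evaluated at x = 2
        Node tree = new BinaryOp('*',
                new BinaryOp('+', x, new Constant(3)),
                new BinaryOp('-', x, new Constant(1)));
        System.out.println(tree + " at x = 2: " + tree.eval(2));

        // Two structurally different trees for the same function 2x
        Node sum = new BinaryOp('+', x, x);
        Node product = new BinaryOp('*', new Constant(2), x);
        for (double v : new double[] {-1.5, 0, 4}) {
            System.out.println(sum + " = " + sum.eval(v) + ",  " + product + " = " + product.eval(v));
        }
    }
}
```

The two trees print differently through `toString()` while `eval` agrees at every point, which is the structural versus semantic distinction described above.
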

In practice, one of the most common problems is reconstructing the original function from its known values at a set of specific points.

A [genetic algorithm](http://en.wikipedia.org/wiki/Genetic_algorithm) can be applied to solve this problem:

1. In genetic-algorithm terms, each syntax tree can be treated as a "chromosome": an entity that can "mutate" and can change through "crossover" with another "chromosome".

2. A [fitness function](http://en.wikipedia.org/wiki/Fitness_function) must be defined: a function that measures how well each formula (encoded as a syntax tree) represents the existing data (e.g. using the [mean squared error](http://en.wikipedia.org/wiki/Mean_squared_error) value).
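
A minimal sketch of such a fitness measure, assuming the known data is given as two arrays: sample points `xs` and target values `ys`. The mean squared error is the average of `(f(xs[i]) - ys[i])^2` over all points, so a lower value means a better fit. To keep the block short, a candidate formula here is any `java.util.function.DoubleUnaryOperator` rather than a syntax tree; `FitnessSketch` and `meanSquaredError` are hypothetical names.

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Hypothetical sketch of an MSE-based fitness measure. For brevity a candidate
 * formula is any DoubleUnaryOperator; in the solver it would be a syntax tree.
 */
class FitnessSketch {

    /** Mean squared error of the candidate over the known points (xs[i], ys[i]); lower is better. */
    static double meanSquaredError(DoubleUnaryOperator candidate, double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length == 0) {
            throw new IllegalArgumentException("need the same, non-zero number of xs and ys");
        }
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            double diff = candidate.applyAsDouble(xs[i]) - ys[i];
            sum += diff * diff;
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        // Known samples of an "unknown" target function, here y = x^2 + 1
        double[] xs = {-2, -1, 0, 1, 2};
        double[] ys = new double[xs.length];
        for (int i = 0; i < xs.length; i++) ys[i] = xs[i] * xs[i] + 1;

        System.out.println(meanSquaredError(x -> x * x + 1, xs, ys)); // exact formula
        System.out.println(meanSquaredError(x -> x * x, xs, ys));     // right shape, off by a constant
        System.out.println(meanSquaredError(x -> 2 * x, xs, ys));     // wrong shape
    }
}
```
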

### Crossover ###
During "crossover", a syntax tree is modified by replacing one of its subtrees with a subtree taken from another syntax tree.

The following image shows how the "crossover" operation is implemented on syntax trees: <br/>
![Crossover](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/crossover.png)
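
A minimal Java sketch of subtree crossover as described above, assuming a simple mutable node class (`CrossoverSketch`, `Node` and `crossover` are hypothetical names, not this project's API). The child starts as a deep copy of the first parent; one randomly chosen node of that copy is then overwritten with a copy of a randomly chosen subtree of the second parent, so neither parent is modified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of subtree crossover; the names are illustrative, not this project's API. */
class CrossoverSketch {

    /** Mutable node: op is '+', '-', '*', '/' for inner nodes, 'x' for the variable, 'c' for a constant. */
    static final class Node {
        char op;
        double value;      // used only when op == 'c'
        Node left, right;  // null for leaves

        static Node x() { Node n = new Node(); n.op = 'x'; return n; }
        static Node c(double v) { Node n = new Node(); n.op = 'c'; n.value = v; return n; }
        static Node bin(char op, Node l, Node r) { Node n = new Node(); n.op = op; n.left = l; n.right = r; return n; }

        /** Deep copy, so that crossover never modifies a parent. */
        Node copy() {
            Node n = new Node();
            n.op = op;
            n.value = value;
            n.left = left == null ? null : left.copy();
            n.right = right == null ? null : right.copy();
            return n;
        }

        /** Adds this node and all of its descendants to out, in pre-order. */
        void collect(List<Node> out) {
            out.add(this);
            if (left != null) left.collect(out);
            if (right != null) right.collect(out);
        }

        /** Replaces the subtree rooted at this node by overwriting it with other's contents. */
        void replaceWith(Node other) {
            op = other.op;
            value = other.value;
            left = other.left;
            right = other.right;
        }

        @Override public String toString() {
            if (op == 'x') return "x";
            if (op == 'c') return Double.toString(value);
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** Child = copy of parentA with one random subtree replaced by a copy of a random subtree of parentB. */
    static Node crossover(Node parentA, Node parentB, Random rnd) {
        Node child = parentA.copy();
        List<Node> childNodes = new ArrayList<>();
        child.collect(childNodes);
        List<Node> donorNodes = new ArrayList<>();
        parentB.collect(donorNodes);

        Node target = childNodes.get(rnd.nextInt(childNodes.size()));
        Node donor = donorNodes.get(rnd.nextInt(donorNodes.size()));
        target.replaceWith(donor.copy());
        return child;
    }

    public static void main(String[] args) {
        Node a = Node.bin('+', Node.x(), Node.c(1));                          // x + 1
        Node b = Node.bin('*', Node.c(2), Node.bin('-', Node.x(), Node.c(3))); // 2 * (x - 3)
        Random rnd = new Random(7);
        for (int i = 0; i < 3; i++) {
            System.out.println(crossover(a, b, rnd));
        }
        System.out.println(a + " and " + b + " are unchanged");
    }
}
```

Overwriting the chosen node in place (`replaceWith`) avoids the need for parent pointers: the node object keeps its position in the child's tree and only its contents change, which also covers the case where the root itself is the chosen node.
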

### Mutation ###
The following "mutation" operations are currently implemented:

1. Substitution of a node in the syntax tree with a node that corresponds to a different arithmetic operation: <br/>
![Mutation - by substitution of arithmetical operation](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/mutation_1.png)

2. Substitution of a subtree with a randomly generated subtree: <br/>
![Mutation - by substitution of some subtree with randomly generated subtree](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/mutation_2.png)

3. Removal of an intermediate node from the syntax tree: <br/>
![Mutation - by removing of some intermediate node](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/mutation_3.png)

4. Expansion of the tree from its root: <br/>
![Mutation - by expanding tree from root](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/mutation_4.png)

5. Swapping of subtrees for non-commutative operations: <br/>
![Mutation - by swapping subtrees for non-commutative operations](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/mutation_5.png)
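
The five mutations listed above can be sketched in Java as follows, again on a hypothetical mutable `Node` class (none of these names come from this project). Each method works on a deep copy and returns the mutated tree. `randomTree` is an assumed simple generator in which each level randomly becomes a leaf or an operator node, and only `-` and `/` are treated as non-commutative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketches of the five mutations listed above; the names are not this project's. */
class MutationSketch {
    static final char[] OPS = {'+', '-', '*', '/'};

    /** Mutable node: op is '+', '-', '*', '/' for inner nodes, 'x' for the variable, 'c' for a constant. */
    static final class Node {
        char op;
        double value;      // used only when op == 'c'
        Node left, right;  // null for leaves
        static Node x() { Node n = new Node(); n.op = 'x'; return n; }
        static Node c(double v) { Node n = new Node(); n.op = 'c'; n.value = v; return n; }
        static Node bin(char op, Node l, Node r) { Node n = new Node(); n.op = op; n.left = l; n.right = r; return n; }
        /** Deep copy, so that mutations never touch the original tree. */
        Node copy() {
            Node n = bin(op, left == null ? null : left.copy(), right == null ? null : right.copy());
            n.value = value;
            return n;
        }
        /** Adds this node and all descendants to out, in pre-order. */
        void collect(List<Node> out) {
            out.add(this);
            if (left != null) left.collect(out);
            if (right != null) right.collect(out);
        }
        /** Overwrites this node with o, which replaces the whole subtree rooted here. */
        void replaceWith(Node o) { op = o.op; value = o.value; left = o.left; right = o.right; }
        @Override public String toString() {
            if (op == 'x') return "x";
            if (op == 'c') return Double.toString(value);
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** All nodes of a tree, or only its inner (operator) nodes. */
    static List<Node> nodes(Node root, boolean innerOnly) {
        List<Node> all = new ArrayList<>();
        root.collect(all);
        if (innerOnly) all.removeIf(m -> m.left == null);
        return all;
    }

    /** Random tree of depth at most `depth`: each level becomes a leaf or an operator node at random. */
    static Node randomTree(int depth, Random rnd) {
        if (depth == 0 || rnd.nextBoolean()) {
            return rnd.nextBoolean() ? Node.x() : Node.c(rnd.nextInt(10));
        }
        return Node.bin(OPS[rnd.nextInt(OPS.length)], randomTree(depth - 1, rnd), randomTree(depth - 1, rnd));
    }

    /** 1. Give a random inner node a different arithmetic operator. */
    static Node substituteOperator(Node tree, Random rnd) {
        Node t = tree.copy();
        List<Node> inner = nodes(t, true);
        if (inner.isEmpty()) return t;
        Node n = inner.get(rnd.nextInt(inner.size()));
        char newOp;
        do { newOp = OPS[rnd.nextInt(OPS.length)]; } while (newOp == n.op);
        n.op = newOp;
        return t;
    }

    /** 2. Replace a random subtree with a randomly generated one. */
    static Node replaceWithRandomSubtree(Node tree, Random rnd) {
        Node t = tree.copy();
        List<Node> all = nodes(t, false);
        all.get(rnd.nextInt(all.size())).replaceWith(randomTree(2, rnd));
        return t;
    }

    /** 3. Remove a random inner node, promoting one of its two children into its place. */
    static Node removeIntermediateNode(Node tree, Random rnd) {
        Node t = tree.copy();
        List<Node> inner = nodes(t, true);
        if (inner.isEmpty()) return t;
        Node n = inner.get(rnd.nextInt(inner.size()));
        n.replaceWith(rnd.nextBoolean() ? n.left : n.right);
        return t;
    }

    /** 4. Expand from the root: the old tree becomes the left operand of a new root node. */
    static Node expandFromRoot(Node tree, Random rnd) {
        return Node.bin(OPS[rnd.nextInt(OPS.length)], tree.copy(), randomTree(0, rnd));
    }

    /** 5. Swap the operands of a random '-' or '/' node, where operand order matters. */
    static Node swapOperands(Node tree, Random rnd) {
        Node t = tree.copy();
        List<Node> candidates = nodes(t, true);
        candidates.removeIf(m -> m.op != '-' && m.op != '/');
        if (candidates.isEmpty()) return t;
        Node n = candidates.get(rnd.nextInt(candidates.size()));
        Node tmp = n.left; n.left = n.right; n.right = tmp;
        return t;
    }
}
```

For example, `swapOperands` applied to the tree `(x - 1.0)`, built with `Node.bin('-', Node.x(), Node.c(1))`, always returns `(1.0 - x)`, because that tree contains exactly one non-commutative node; the original tree is left unchanged.
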

### Optimization of coefficients ###
A syntax tree may represent the correct structure of the target function, yet still be rated as non-optimal by the fitness function because its coefficients have non-optimal values.

For example, the following image shows the target values of the function being searched for (red crosses) and two candidate functions (green and blue): <br/>
![Why optimization of coefficients is needed](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/why_should_optimize_coefficients.png)

The blue line has a smaller mean squared error, but the green parabola would actually be a better candidate for the final solution: its structure matches the data, and only its coefficients need adjusting.

For this reason, the current implementation of the symbolic regression solver uses a second pass of the genetic algorithm to optimize the coefficients of each syntax tree. The picture below shows how the coefficients of a syntax tree can be encoded into a "chromosome": <br/>
![Encoding coefficients of syntax tree into chromosome](https://raw.githubusercontent.com/lagodiuk/genetic-programming/master/img/optimize_coefficients_ga.png)
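
A minimal sketch of the coefficient encoding, assuming the constants are collected in pre-order so that each constant leaf has a fixed gene position in a `double[]` chromosome. To stay short, the optimization loop is not a full genetic algorithm: it is a simple (1+1) evolution strategy, swapped in here as a stand-in, that perturbs one gene at a time with Gaussian noise and keeps the change only when the mean squared error decreases. All names (`CoefficientSketch`, `encode`, `decode`, `optimize`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch: a tree's constants are encoded as a double[] "chromosome",
 * optimized, and decoded back. A (1+1) evolution strategy stands in here for the
 * second genetic-algorithm pass; none of the names come from this project.
 */
class CoefficientSketch {
    /** Mutable node: op is '+', '-', '*', '/' for inner nodes, 'x' for the variable, 'c' for a constant. */
    static final class Node {
        char op;
        double value;      // the coefficient, used only when op == 'c'
        Node left, right;  // null for leaves
        static Node x() { Node n = new Node(); n.op = 'x'; return n; }
        static Node c(double v) { Node n = new Node(); n.op = 'c'; n.value = v; return n; }
        static Node bin(char op, Node l, Node r) { Node n = new Node(); n.op = op; n.left = l; n.right = r; return n; }

        double eval(double x) {
            switch (op) {
                case 'x': return x;
                case 'c': return value;
                case '+': return left.eval(x) + right.eval(x);
                case '-': return left.eval(x) - right.eval(x);
                case '*': return left.eval(x) * right.eval(x);
                case '/': return left.eval(x) / right.eval(x);
                default: throw new IllegalStateException("Unknown op: " + op);
            }
        }

        /** Collects constant leaves in pre-order; this order fixes each coefficient's gene position. */
        void collectConstants(List<Node> out) {
            if (op == 'c') out.add(this);
            if (left != null) left.collectConstants(out);
            if (right != null) right.collectConstants(out);
        }
    }

    static List<Node> constants(Node tree) {
        List<Node> out = new ArrayList<>();
        tree.collectConstants(out);
        return out;
    }

    /** Tree -> chromosome: one gene per constant leaf. */
    static double[] encode(Node tree) {
        List<Node> cs = constants(tree);
        double[] genes = new double[cs.size()];
        for (int i = 0; i < genes.length; i++) genes[i] = cs.get(i).value;
        return genes;
    }

    /** Chromosome -> tree: writes the genes back into the constant leaves, in the same order. */
    static void decode(Node tree, double[] genes) {
        List<Node> cs = constants(tree);
        for (int i = 0; i < genes.length; i++) cs.get(i).value = genes[i];
    }

    static double mse(Node tree, double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            double d = tree.eval(xs[i]) - ys[i];
            sum += d * d;
        }
        return sum / xs.length;
    }

    /** Perturbs one gene at a time and keeps the change only if the MSE drops; returns the final MSE. */
    static double optimize(Node tree, double[] xs, double[] ys, int iterations, Random rnd) {
        double[] best = encode(tree);
        double bestErr = mse(tree, xs, ys);
        if (best.length == 0) return bestErr;
        for (int i = 0; i < iterations; i++) {
            double[] candidate = best.clone();
            candidate[rnd.nextInt(candidate.length)] += 0.5 * rnd.nextGaussian();
            decode(tree, candidate);
            double err = mse(tree, xs, ys);
            if (err < bestErr) {
                best = candidate;
                bestErr = err;
            }
        }
        decode(tree, best); // leave the best coefficients found in the tree
        return bestErr;
    }

    public static void main(String[] args) {
        double[] xs = {-2, -1, 0, 1, 2};
        double[] ys = {9, 3, 1, 3, 9}; // samples of 2x^2 + 1
        // Correct structure c0 * (x * x) + c1, but poor coefficients c0 = 0.5, c1 = 0
        Node tree = Node.bin('+', Node.bin('*', Node.c(0.5), Node.bin('*', Node.x(), Node.x())), Node.c(0));
        System.out.println("before: " + Arrays.toString(encode(tree)) + ", MSE = " + mse(tree, xs, ys));
        double err = optimize(tree, xs, ys, 2000, new Random(42));
        System.out.println("after:  " + Arrays.toString(encode(tree)) + ", MSE = " + err);
    }
}
```

In `main`, the tree `c0 * (x * x) + c1` already has the right structure for the samples of `2x^2 + 1`; only the two genes `c0` and `c1` are tuned, which mirrors the green-parabola case in the image above.
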

More info in this [article](http://habrahabr.ru/post/163195/) (the article is in Russian).

This project depends on the [Generic Genetic Algorithm project](https://github.com/lagodiuk/genetic-algorithm) (declared as a Maven dependency).

```java
public class HelloSymbolicRegression {
    // ...
        return list;
    }
}
```
