symbolic_regression_part4

Manlio Morini edited this page Jun 26, 2024 · 28 revisions

Symbolic regression - Custom evaluator and teams

Evolving multiple programs at the same time is great but my problem requires multiple variables. How should I proceed?

Preliminary note

If you only need multiple variables (without multiple programs), src_search is enough: stop reading and go back to the wiki / source.

In general, prefer src_search, since it directly supports model metrics and validation strategies.

Complex problem

$$ \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{bmatrix} \cdot \begin{pmatrix} \boldsymbol{f_1}(x_1,x_2,x_3) \\ \boldsymbol{f_2}(x_1,x_2,x_3) \\ \vdots \\ \boldsymbol{f_n}(x_1,x_2,x_3) \end{pmatrix} $$

The combination of multiple variables and multiple programs cannot be handled in a single, general way, so the user has to customize the generic search class to match the requirements of the problem.
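To make the target concrete, the right-hand side of the equation can be sketched in plain C++ without any Vita types. Everything here is a hypothetical stand-in introduced for illustration: `program_t` plays the role of an evolved program, the matrix is a vector of rows, and `apply_model` is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an evolved program: maps (x1, x2, x3) to a real.
using program_t = std::function<double(const std::vector<double> &)>;

// Computes B * (f_1(x), ..., f_n(x))^T for a square matrix `b` stored as a
// vector of rows.
std::vector<double> apply_model(const std::vector<std::vector<double>> &b,
                                const std::vector<program_t> &f,
                                const std::vector<double> &x)
{
  const std::size_t n(f.size());
  assert(b.size() == n);

  // Evaluate every program once on the same input.
  std::vector<double> fx(n);
  for (std::size_t j(0); j < n; ++j)
    fx[j] = f[j](x);

  // Plain matrix-vector product.
  std::vector<double> a(n, 0.0);
  for (std::size_t i(0); i < n; ++i)
  {
    assert(b[i].size() == n);
    for (std::size_t j(0); j < n; ++j)
      a[i] += b[i][j] * fx[j];
  }

  return a;
}
```

The search has to find the functions f_i such that this product matches the known vector a for every training case.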

Setting up code

A painstaking extension of the previous example is technically viable but we have a better option.

Instead of a user-defined terminal (c), we can use the predefined vita::variable terminal. Variables are convenient placeholders filled at the beginning of program/individual execution with user-provided values.

In the main() function, the line:

prob.sset.insert<c>();

has been replaced with:

prob.sset.insert<vita::variable>("x1", 0);
prob.sset.insert<vita::variable>("x2", 1);
prob.sset.insert<vita::variable>("x3", 2);

The constructor of a variable takes two parameters:

  1. the name of the variable (e.g. "x1");
  2. an index used to retrieve the value of the variable at execution time (e.g. 0). More about this point follows below.
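The role of the index can be illustrated with a deliberately simplified model of a variable terminal. `toy_variable` below is a hypothetical class written for this page; it is not vita::variable and only mimics the "read slot i of the input vector" behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a variable terminal (NOT vita::variable).
class toy_variable
{
public:
  toy_variable(std::string name, std::size_t index)
    : name_(std::move(name)), index_(index) {}

  const std::string &name() const { return name_; }

  // At execution time the terminal simply reads the slot it was bound to.
  double eval(const std::vector<double> &inputs) const
  {
    assert(index_ < inputs.size());
    return inputs[index_];
  }

private:
  std::string name_;   // used only for display purposes
  std::size_t index_;  // position of the value in the input vector
};
```

With this model, a terminal built as `toy_variable("x3", 2)` returns the third element of whatever input vector is supplied for the current training case.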

A training case / example can be represented with a simple structure:

struct example
{
  example(const std::vector<double> &ex_a, const vita::matrix<double> &ex_b,
          const std::vector<double> &ex_x)
    : a(ex_a), b(ex_b), x()
  {
    std::copy(ex_x.begin(), ex_x.end(), std::back_inserter(x));
  }

  std::vector<double>        a;
  vita::matrix<double>       b;
  std::vector<vita::value_t> x;
};

x contains the values of the variables for a given example (x[i] is the value of the i-th variable).

Our problem crunches real numbers so the constructor takes vectors of doubles.

Vita, however, supports many use cases by adopting vita::value_t for storing and passing values. This forces a conversion from a vector of doubles (ex_x) to a vector of value_ts (x).

std::copy performs the conversion once and for all (delaying the conversion to parameter-passing time would be less efficient, since it would be repeated at every program execution).
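The conversion can be tried in isolation with a variant type. In this sketch `toy_value_t` is a hypothetical stand-in for vita::value_t (the alternatives of the real type may differ) and `to_values` is a helper invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for vita::value_t (the real alternatives may differ).
using toy_value_t = std::variant<std::monostate, int, double, std::string>;

// Converts a vector of doubles to a vector of variants in a single pass.
std::vector<toy_value_t> to_values(const std::vector<double> &in)
{
  std::vector<toy_value_t> out;
  out.reserve(in.size());

  // Each double is implicitly converted to a variant holding a double.
  std::copy(in.begin(), in.end(), std::back_inserter(out));

  assert(out.size() == in.size());
  return out;
}
```

Doing this once in the constructor of the example means that every later program execution receives a ready-made vector of values.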


The training set is a collection of examples:

using training_set = std::vector<example>;

Almost every iterable container could be used (e.g. std::list instead of std::vector).


Now we can take advantage of the existing sum_of_errors_evaluator class (see src/evaluator.h) to quickly write our evaluator.

sum_of_errors_evaluator is a template class that, given an error functor (ERRF) and a training set (DAT):

  • calculates the sum of the errors of a model/program over the training set;
  • converts the total error into a standardized fitness.

template<class T, class ERRF, class DAT>
class sum_of_errors_evaluator : public src_evaluator<T, DAT>
{
public:
  static_assert(std::is_class_v<ERRF>);
  static_assert(detail::is_iterable_v<DAT>);
  static_assert(detail::is_error_functor_v<ERRF, DAT>);

  explicit sum_of_errors_evaluator(DAT &);

  fitness_t operator()(const T &) override;

  // ...
};
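The division of labour between the evaluator and the error functor can be sketched with a stripped-down, library-free version. This is not Vita's implementation: `toy_sum_of_errors_evaluator`, `toy_linear`, `toy_point` and `toy_abs_error` are hypothetical names, and the fitness convention (negated total error, so that higher is better and 0 is a perfect fit) is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified evaluator: NOT Vita's implementation.
// ERRF must be constructible from a PROGRAM and callable on one example.
template<class PROGRAM, class ERRF, class DAT>
class toy_sum_of_errors_evaluator
{
public:
  explicit toy_sum_of_errors_evaluator(const DAT &d) : dat_(&d) {}

  // Returns a "higher is better" score; 0 means a perfect fit.
  double operator()(const PROGRAM &prg) const
  {
    const ERRF err_fctr(prg);

    double total(0.0);
    for (const auto &example : *dat_)
    {
      const double e(err_fctr(example));
      assert(e >= 0.0);  // errors are expected to be non-negative
      total += e;
    }

    // Assumption: fitness is the negated total error.
    return -total;
  }

private:
  const DAT *dat_;
};

// Toy program and training case used to exercise the evaluator.
struct toy_linear { double slope; };
struct toy_point { double x, y; };

// Toy error functor: absolute error of y = slope * x on one point.
class toy_abs_error
{
public:
  explicit toy_abs_error(const toy_linear &p) : p_(p) {}

  double operator()(const toy_point &pt) const
  {
    return std::fabs(p_.slope * pt.x - pt.y);
  }

private:
  toy_linear p_;
};
```

The evaluator knows nothing about how an error is computed; it only iterates over the data and aggregates, which is why writing the error functor is the only problem-specific step.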

The error functor object (ERRF) acquires a program via its constructor and calculates the error on a specific example:

class error_functor
{
public:
  error_functor(const PROGRAM &);

  double operator()(const EXAMPLE &) const;

  // ...
};

Implementing ERRF::operator() is straightforward, since the code from the previous example can be reused almost unchanged:

class error_functor
{
public:
  error_functor(const candidate_solution &s) : s_(s) {}

  double operator()(const example &ex) const
  {
    // Run every program of the team on the variables of this example.
    std::vector<double> f(N);
    std::transform(s_.begin(), s_.end(), f.begin(),
                   [&ex](const auto &i)
                   {
                     const auto ret(vita::run(i, ex.x));

                     // An empty result is treated as 0.
                     return vita::has_value(ret) ? std::get<vita::D_DOUBLE>(ret)
                                                 : 0.0;
                   });

    // model = b * f
    std::vector<double> model(N, 0.0);
    for (unsigned i(0); i < N; ++i)
      for (unsigned j(0); j < N; ++j)
        model[i] += ex.b(i, j) * f[j];

    // Sum of the absolute differences between target and model.
    double delta(std::inner_product(ex.a.begin(), ex.a.end(),
                                    model.begin(), 0.0,
                                    std::plus<>(),
                                    [](auto v1, auto v2)
                                    {
                                      return std::fabs(v1 - v2);
                                    }));

    return delta;
  }

private:
  candidate_solution s_;
};

Two important remarks are:

  • vita::run(i) has been replaced with vita::run(i, ex.x), which passes the values of the training case to the variables;
  • the functor returns delta directly, leaving the conversion to a standardized fitness to sum_of_errors_evaluator.
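The delta computed by the functor is the sum of the absolute differences between the target vector and the model output. The std::inner_product idiom used for it can be checked on its own; `l1_distance` is a hypothetical helper name introduced only for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Sum of absolute differences between two equally sized vectors.
double l1_distance(const std::vector<double> &target,
                   const std::vector<double> &model)
{
  assert(target.size() == model.size());

  // std::plus accumulates, the lambda replaces the usual multiplication.
  return std::inner_product(target.begin(), target.end(), model.begin(), 0.0,
                            std::plus<>(),
                            [](double v1, double v2)
                            {
                              return std::fabs(v1 - v2);
                            });
}
```

For the vectors (1, 2, 3) and (1.5, 2, 1) the result is 0.5 + 0 + 2 = 2.5.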

(For convenience, all the code is in the examples/symbolic_regression05.cc file.)
