Describe the problem or improvement suggested
Many simple reward functions have the following pattern, which gets redundant and could be automated.
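As an illustration only (not the issue's original snippet), the kind of pattern meant here, assuming Ecole-style reward functions with `before_reset`/`extract` methods; `SCIPsomething` is a hypothetical stand-in for any SCIP metric accessor:

```python
def SCIPsomething(model):
    # Hypothetical stub for a SCIP metric accessor (e.g. number of nodes).
    return model["nodes"]

class SomeReward:
    """Manual finite-difference pattern that many reward functions repeat."""

    def before_reset(self, model):
        # Reset the baseline metric at the start of each episode.
        self.last_metric = SCIPsomething(model)

    def extract(self, model, done=False):
        # Return the change in the metric since the last call.
        metric = SCIPsomething(model)
        reward = metric - self.last_metric
        self.last_metric = metric
        return reward
```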
Describe the solution you would like

A `LambdaDiffReward` function that automates the following, where the type of `metric` is computed from the return type of `SCIPsomething`.
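A minimal sketch of such a `LambdaDiffReward`, under the same assumptions as above (Ecole-style `before_reset`/`extract` interface; all names hypothetical, and type inference from the callable's return type is omitted):

```python
class LambdaDiffReward:
    """Wraps a metric-extracting callable and automates the
    finite-difference boilerplate."""

    def __init__(self, fn):
        # fn maps a model to a metric value, e.g. SCIPsomething.
        self.fn = fn

    def before_reset(self, model):
        # Reset the baseline metric at the start of each episode.
        self.last_metric = self.fn(model)

    def extract(self, model, done=False):
        # Return the change in the metric since the last call.
        metric = self.fn(model)
        reward = metric - self.last_metric
        self.last_metric = metric
        return reward
```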
Describe alternatives you have considered
Interestingly, when we have `SomeReward`, we can get back to `SCIPsomething` by using `reward.cumsum()`. In NumPy, there is also `np.diff`, which does the opposite of `np.cumsum`, i.e. it computes the finite difference. Perhaps we could have a `LambdaReward` that does not compute differences and simply returns the output, as well as a `reward.diff()` that builds the difference.

Additional context
We should be consistent on whether reward functions compute finite differences or not.
Using `.cumsum()` and `.diff()` we can easily switch from one to the other.
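The `cumsum`/`diff` relationship can be checked directly in NumPy; note that `np.diff` shortens the array by one, so recovering the full per-step values also needs the first cumulative entry:

```python
import numpy as np

# Per-step rewards (finite differences) and their running total.
steps = np.array([3, 1, 4, 1, 5])
total = np.cumsum(steps)   # cumulative metric: [3, 4, 8, 9, 14]
back = np.diff(total)      # recovers steps[1:]: [1, 4, 1, 5]
# Prepending the first cumulative value recovers the full step array.
recovered = np.concatenate(([total[0]], back))
```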