Add utility module for calibrating impact functions

This issue collects ideas for an impact function calibration module

## Input information

For calibrating any impact function, we need to specify four main things:

* The impact data. This is the "real world" data we want to use to calibrate the impact function. The computed ("modeled") impact should be as close to it as possible.
* The impact function. We have to specify what its overall shape is and how it is parameterized.
* The cost function. It relates the modeled impact with the impact data and returns a scalar that indicates how close they are. The task for calibrating an impact function is reduced to minimizing the cost function.
* The impact estimation task. As the impact function is varied by the optimization algorithm, we "only" need to supply the usual Hazard and Exposure.

## Calibration with Bayesian Optimization

This idea is based on the Petals `calib_opt` module by @timschmi95, see https://github.com/CLIMADA-project/climada_petals/blob/feature/calib_opt/climada_petals/engine/calib_opt.py

The module wants us to instantiate a `BayesianOptimization` instance. Its main parameter is a function that has the search space parameters as input and returns the target function value. In this case, the target function value is *maximized*, which simply means that we have to invert the cost function value.

With regard to the impact data I would take a simplistic approach. To accomodate multiple events and multiple regions, the input data should be a `pandas.DataFrame` with the `event_id` and index and arbitrary (!) columns that can (!) represent the target impact values. The cost function then takes this data frame and the impact object of each iteration and the user can define how the cost should then be computed from both objects. In this case, we assume that the columns in the impact data are impacts per region. We compute the difference for each region and event, square all results, take the log for the sum of all regions and then sum the result of all events (as an overly complicated example for what we can do with this):

```python
def my_cost_function(data: pd.DataFrame, impact: climada.engine.Impact) -> float:
    return np.sum(np.log10(((data - impact.impact_at_reg())**2).sum(axis=1)))
```

The other important input it the impact function. Its parameters are what the optimizer calibrates in order to minimize the cost function. The user has to supply a "generator" function that takes the set of calibrated parameters and returns a new impact function set. In this example, we take the TC impact function by Emanuel 2011 and intend to vary two of the three parameters:

```python
def my_impf_set_gen(v_thresh: float, v_half: float) -> climada.entity.ImpactFuncSet:
    return climada.engine.ImpactFuncSet([
        climada.entity.ImpfTropCyclone.from_emanuel_usa(
            v_thresh=v_thresh, v_half=v_half
        )
    ])
```

Now we only need a utility function to plug everything together:

```python
from bayes_opt import BayesianOptimization

def calibrate(
    hazard: Hazard,
    exposure: Exposures,
    impact_data: pd.DataFrame,
    impact_func_gen: Callable[..., ImpactFuncSet],
    cost_func: Callable[[pd.DataFrame, Impact], float],
    param_bounds: Mapping[str, Tuple[float, float]],
    optimizer_kwds: Mapping[str, Any],
    impact_calc_kwds: Mapping[str, Any],
):
    """Calibrate parameters for an impact function"""
    # Match event IDs in hazard and impact_data
    hazard = hazard.select(event_id=impact_data.index)

    # Assign the centroids
    exposure.assign_centroids(hazard)

    # Create the calibration function
    def func(**kwargs):
        impf_set = impact_func_gen(**kwargs)
        impact = ImpactCalc(exposure, impf_set, hazard).impact(
            assign_centroids=False, **impact_calc_kwds
        )
        return 1 / cost_func(impact_data, impact)

    # Create the optimizer (with some default settings)
    optimizer_opts = dict(random_state=1, allow_duplicate_points=True)
    optimizer_opts.update(**optimizer_kwds)
    optimizer = BayesianOptimization(f=func, pbounds=param_bounds, **optimizer_opts)

    return optimizer
```

The user can then call the optimizer like advertised by the BayesianOptimization docs. We assume we have a hazard, an exposure, and the impact data defined somwhere. Then we only need to define the bounds for the parameters we are looking for. The BayesianOptimization package automatically deduces the parameter names and initial values from these automatically.

```python
optimizer = calibrate(
    hazard=hazard,
    exposure=exposure,
    impact_data=impact_data,
    impact_func_gen=my_impf_set_gen,
    cost_func=my_cost_function,
    # NOTE: Bad example. These parameters must be constrained because v_thresh < v_half is required!
    param_bounds={"v_half": (0, 100), "v_thresh": (0, 50)},  
    optimizer_kwds=dict(verbose=2),
    impact_calc_kwds=dict(save_mat=True),  # To be able to call impact.imact_at_reg()
)
optimizer.maximize(init_points=100, n_iter=200)
result = optimizer.max["params"]

# Use generator to receive the function we want
result_impf_set = my_impf_set_gen(result)
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add utility module for calibrating impact functions #680

Input information

Calibration with Bayesian Optimization

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add utility module for calibrating impact functions #680

Description

Input information

Calibration with Bayesian Optimization

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions