Description
This issue collects ideas for an impact function calibration module
Input information
For calibrating any impact function, we need to specify four main things:
- The impact data. This is the "real world" data we want to use to calibrate the impact function. The computed ("modeled") impact should be as close to it as possible.
- The impact function. We have to specify what its overall shape is and how it is parameterized.
- The cost function. It relates the modeled impact with the impact data and returns a scalar that indicates how close they are. The task for calibrating an impact function is reduced to minimizing the cost function.
- The impact estimation task. As the impact function is varied by the optimization algorithm, we "only" need to supply the usual Hazard and Exposure.
Calibration with Bayesian Optimization
This idea is based on the Petals calib_opt
module by @timschmi95, see https://github.com/CLIMADA-project/climada_petals/blob/feature/calib_opt/climada_petals/engine/calib_opt.py
The module wants us to instantiate a BayesianOptimization
instance. Its main parameter is a function that has the search space parameters as input and returns the target function value. In this case, the target function value is maximized, which simply means that we have to invert the cost function value.
With regard to the impact data I would take a simplistic approach. To accomodate multiple events and multiple regions, the input data should be a pandas.DataFrame
with the event_id
and index and arbitrary (!) columns that can (!) represent the target impact values. The cost function then takes this data frame and the impact object of each iteration and the user can define how the cost should then be computed from both objects. In this case, we assume that the columns in the impact data are impacts per region. We compute the difference for each region and event, square all results, take the log for the sum of all regions and then sum the result of all events (as an overly complicated example for what we can do with this):
def my_cost_function(data: pd.DataFrame, impact: climada.engine.Impact) -> float:
return np.sum(np.log10(((data - impact.impact_at_reg())**2).sum(axis=1)))
The other important input it the impact function. Its parameters are what the optimizer calibrates in order to minimize the cost function. The user has to supply a "generator" function that takes the set of calibrated parameters and returns a new impact function set. In this example, we take the TC impact function by Emanuel 2011 and intend to vary two of the three parameters:
def my_impf_set_gen(v_thresh: float, v_half: float) -> climada.entity.ImpactFuncSet:
return climada.engine.ImpactFuncSet([
climada.entity.ImpfTropCyclone.from_emanuel_usa(
v_thresh=v_thresh, v_half=v_half
)
])
Now we only need a utility function to plug everything together:
from bayes_opt import BayesianOptimization
def calibrate(
hazard: Hazard,
exposure: Exposures,
impact_data: pd.DataFrame,
impact_func_gen: Callable[..., ImpactFuncSet],
cost_func: Callable[[pd.DataFrame, Impact], float],
param_bounds: Mapping[str, Tuple[float, float]],
optimizer_kwds: Mapping[str, Any],
impact_calc_kwds: Mapping[str, Any],
):
"""Calibrate parameters for an impact function"""
# Match event IDs in hazard and impact_data
hazard = hazard.select(event_id=impact_data.index)
# Assign the centroids
exposure.assign_centroids(hazard)
# Create the calibration function
def func(**kwargs):
impf_set = impact_func_gen(**kwargs)
impact = ImpactCalc(exposure, impf_set, hazard).impact(
assign_centroids=False, **impact_calc_kwds
)
return 1 / cost_func(impact_data, impact)
# Create the optimizer (with some default settings)
optimizer_opts = dict(random_state=1, allow_duplicate_points=True)
optimizer_opts.update(**optimizer_kwds)
optimizer = BayesianOptimization(f=func, pbounds=param_bounds, **optimizer_opts)
return optimizer
The user can then call the optimizer like advertised by the BayesianOptimization docs. We assume we have a hazard, an exposure, and the impact data defined somwhere. Then we only need to define the bounds for the parameters we are looking for. The BayesianOptimization package automatically deduces the parameter names and initial values from these automatically.
optimizer = calibrate(
hazard=hazard,
exposure=exposure,
impact_data=impact_data,
impact_func_gen=my_impf_set_gen,
cost_func=my_cost_function,
# NOTE: Bad example. These parameters must be constrained because v_thresh < v_half is required!
param_bounds={"v_half": (0, 100), "v_thresh": (0, 50)},
optimizer_kwds=dict(verbose=2),
impact_calc_kwds=dict(save_mat=True), # To be able to call impact.imact_at_reg()
)
optimizer.maximize(init_points=100, n_iter=200)
result = optimizer.max["params"]
# Use generator to receive the function we want
result_impf_set = my_impf_set_gen(result)