
Add module for calibrating impact functions #692

Merged: 87 commits merged on Jul 12, 2024


@peanutfun (Member) commented Apr 6, 2023

Add a generalized (!) module for calibrating impact functions.

  • Base module defining data structures from which you can derive your own impact function calibrator.
  • Calibration module based on scipy.optimize.minimize.
  • Calibration module based on the bayesian-optimization package.
  • Helper structures to evaluate and visualize the calibration results.
  • Calibration tutorial based on BayesianOptimizer.
  • Unit and integration tests.
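The layout described above, a base class to derive from plus a calibrator built on `scipy.optimize.minimize`, can be sketched in plain NumPy/SciPy. This is a hypothetical illustration, not the CLIMADA API: `CalibrationInput`, `Optimizer`, `ScipyMinimizeSketch`, `msle` and the toy impact model (hazard intensity matrix times exposed values) are names and simplifications chosen for this example. The impact function is the Emanuel-type sigmoid with `v_thresh`, `v_half` and `scale` parameters.

```python
# Hypothetical sketch of a calibration base class and a scipy-based calibrator.
# None of these classes are part of CLIMADA.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

import numpy as np
from scipy.optimize import minimize


def emanuel_mdr(wind, v_thresh=25.7, v_half=74.7, scale=1.0):
    """Emanuel-type sigmoid: mean damage ratio as a function of wind speed (m/s)."""
    v_n = np.clip(wind - v_thresh, 0.0, None) / (v_half - v_thresh)
    return scale * v_n**3 / (1.0 + v_n**3)


@dataclass
class CalibrationInput:
    """Hypothetical container for everything a calibration needs."""
    wind: np.ndarray      # hazard intensity, shape (n_events, n_exposures)
    values: np.ndarray    # exposed values, shape (n_exposures,)
    observed: np.ndarray  # reported damage per event, shape (n_events,)
    cost_func: Callable[[np.ndarray, np.ndarray], float]

    def modelled_impact(self, params: Dict[str, float]) -> np.ndarray:
        """Impact per event for a given set of impact function parameters."""
        return emanuel_mdr(self.wind, **params) @ self.values


class Optimizer(ABC):
    """Hypothetical base class: subclasses only have to implement `run`."""

    def __init__(self, data: CalibrationInput):
        self.data = data

    def cost(self, params: Dict[str, float]) -> float:
        """Cost of a parameter set; lower is better."""
        return self.data.cost_func(self.data.modelled_impact(params), self.data.observed)

    @abstractmethod
    def run(self) -> Dict[str, float]:
        """Return the best parameter set found."""


class ScipyMinimizeSketch(Optimizer):
    """Calibrator that minimizes the cost with scipy.optimize.minimize (L-BFGS-B)."""

    def __init__(self, data: CalibrationInput, bounds: Dict[str, Tuple[float, float]]):
        super().__init__(data)
        self.names: List[str] = list(bounds)
        self.bounds = [bounds[name] for name in self.names]

    def run(self, x0=None) -> Dict[str, float]:
        # Start in the middle of the bounds unless a start point is given
        if x0 is None:
            x0 = [0.5 * (lo + hi) for lo, hi in self.bounds]
        result = minimize(
            lambda x: self.cost(dict(zip(self.names, x))),
            x0=np.asarray(x0, dtype=float),
            bounds=self.bounds,
            method="L-BFGS-B",
        )
        return dict(zip(self.names, result.x))


def msle(modelled, observed):
    """Mean squared error of log(1 + damage)."""
    return float(np.mean((np.log1p(modelled) - np.log1p(observed)) ** 2))


# Synthetic check: damages generated with v_half = 60 m/s should be recovered
rng = np.random.default_rng(0)
wind = rng.uniform(20.0, 80.0, size=(10, 50))
values = rng.uniform(1e5, 1e6, size=50)
observed = emanuel_mdr(wind, v_half=60.0) @ values
data = CalibrationInput(wind, values, observed, msle)
optimizer = ScipyMinimizeSketch(data, bounds={"v_half": (30.0, 150.0)})
best = optimizer.run()
print(best)
```

Keeping the cost evaluation in the base class means a second calibrator (for example one wrapping a Bayesian optimizer) would only need its own `run` method.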

This PR fixes #680


Merge commit with conflicts in: script/jenkins/branches/Jenkinsfile, tests_runner.py
@peanutfun (Member, Author)

Personal conversation with @chahank:

  • We want functions for plotting and evaluating the output
  • Move module to climada/engine
  • Add default cost functions
  • Streamline Input: the inputs to ScipyMinimizeOptimizer and BayesianOptimizer should differ as little as possible
  • Aggregating the impact by region can be costly when impacts are computed for the entire world. Ideally, we would use the aggregation information beforehand and chunk hazard and exposures accordingly. However, this problem is not exclusive to calibration and can arise in any impact calculation. @chahank will try to find a solution in the ImpactCalc class. If he succeeds, we will try to adapt the calibration classes.
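To make the aggregation and cost-function points above concrete, here is a minimal NumPy sketch: an event-by-exposure impact matrix is summed into per-region impacts using one region ID per exposure point, and the result is compared with reported regional damages. The names `aggregate_by_region`, `mse` and `msle` are hypothetical and are not claimed to be the default cost functions of the module.

```python
import numpy as np


def aggregate_by_region(imp_mat, region_ids):
    """Sum an (n_events, n_exposures) impact matrix into (n_events, n_regions).

    Returns the aggregated matrix and the sorted unique region IDs,
    which give the column order of the result.
    """
    regions, col = np.unique(region_ids, return_inverse=True)
    # One-hot matrix: entry (i, j) is 1 if exposure point i lies in region j
    onehot = (col[:, None] == np.arange(regions.size)).astype(float)
    return imp_mat @ onehot, regions


def mse(modelled, observed):
    """Mean squared error; dominated by the largest damages."""
    return float(np.mean((modelled - observed) ** 2))


def msle(modelled, observed):
    """Mean squared error of log(1 + damage); compares ratios rather than differences."""
    return float(np.mean((np.log1p(modelled) - np.log1p(observed)) ** 2))


# Two events, three exposure points in two regions (ISO numeric codes: 840 = USA, 124 = Canada)
imp_mat = np.array([[1.0, 2.0, 3.0],
                    [0.0, 4.0, 5.0]])
region_ids = np.array([840, 124, 840])
by_region, regions = aggregate_by_region(imp_mat, region_ids)
print(regions)    # [124 840]
print(by_region)  # [[2. 4.]
                  #  [4. 5.]]
observed = np.array([[2.0, 4.0],
                     [4.0, 6.0]])
print(mse(by_region, observed))  # 0.25
```

For global exposures, a sparse matrix (e.g. from `scipy.sparse`) would probably be preferable to the dense one-hot matrix used here.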

@bguillod (Collaborator)

Hi,
Is there any plan to calibrate impact functions based on several hazards? Specifically, I am thinking that the sum of the impacts from both TC wind and surge should be calibrated against reported damages for TC events, instead of calibrating wind only and assuming that wind is a proxy for everything.

@chahank (Member) commented Jun 28, 2023

This might come in the future, but it is not trivial to integrate. A MultiImpactCalc module might have to come first. Please feel free to contribute if you would like to see more features.

Use negative cost function as target function in BayesianOptimizer

Co-authored-by: Thomas Vogt <57705593+tovogt@users.noreply.github.com>
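The bayesian-optimization package maximizes its target function, whereas a cost function is meant to be minimized, which is why the commit above passes the negative cost as the target. The sketch below shows the idea with a simple random-search maximizer standing in for the Bayesian optimizer; `random_search_maximize` and the quadratic `cost` are hypothetical stand-ins, not part of either library.

```python
import numpy as np


def cost(v_half):
    """Toy cost with its minimum at v_half = 60."""
    return (v_half - 60.0) ** 2


def target(v_half):
    """Target for a maximizer: the negative cost, so argmax(target) = argmin(cost)."""
    return -cost(v_half)


def random_search_maximize(func, bounds, n_samples=2001, seed=0):
    """Hypothetical stand-in for a maximizer such as BayesianOptimization.maximize."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(bounds[0], bounds[1], size=n_samples)
    values = np.array([func(s) for s in samples])
    return samples[np.argmax(values)]


best = random_search_maximize(target, bounds=(30.0, 150.0))
# Passing the cost itself to a maximizer finds the worst parameters instead
wrong = random_search_maximize(cost, bounds=(30.0, 150.0))
print(best, wrong)
```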
@peanutfun (Member, Author)

After another discussion with @chahank:

We will support lists of hazard and exposure objects as inputs. These lists must have the same length. If lists are given as input, the cost function will receive a list of the corresponding impact objects, and users will have to adapt their cost functions accordingly.
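With list inputs, a user-defined cost function has to handle a list of impacts instead of a single one. One possible adaptation, which would also cover the wind-plus-surge question raised above, is to sum the per-event impacts of all hazards before comparing them with reported damages. In this sketch, plain NumPy arrays stand in for the per-event impacts of CLIMADA `Impact` objects, the events are assumed to be aligned across hazards, and `combined_cost` and `pairwise_cost` are hypothetical names.

```python
from typing import Sequence

import numpy as np


def msle(modelled, observed):
    """Mean squared error of log(1 + damage)."""
    return float(np.mean((np.log1p(modelled) - np.log1p(observed)) ** 2))


def combined_cost(impacts: Sequence[np.ndarray], observed: np.ndarray) -> float:
    """Sum the impacts of all hazards per event, then compare with reported damage.

    Assumes every array lists the same events in the same order.
    """
    total = np.sum(np.stack(impacts), axis=0)
    return msle(total, observed)


def pairwise_cost(impacts: Sequence[np.ndarray], observed_list: Sequence[np.ndarray]) -> float:
    """Alternative: one set of observations per list entry, costs added up."""
    return sum(msle(imp, obs) for imp, obs in zip(impacts, observed_list))


wind_impact = np.array([3.0, 10.0, 0.0])
surge_impact = np.array([1.0, 5.0, 2.0])
reported = np.array([4.0, 15.0, 2.0])
print(combined_cost([wind_impact, surge_impact], reported))  # 0.0
print(combined_cost([wind_impact], reported) > 0)  # True
```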

@ChrisFairless (Collaborator)

Hey! Some useful context, just in case you've not looked at this already.

I ran the tutorial and plotted the calibrated impact function for the NA1 region against the out-of-the-box Emanuel function and the regionally calibrated Eberenz function:

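[Image: calibrated NA1 impact function plotted against the default Emanuel and the Eberenz NA1 functions]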

It's quite a change. Any idea why? The input data loaded at the start of the analysis hasn't changed for at least three years, so I assume it dates back to the Eberenz calibrations. Given the uncertainty shown in those original calibrations (see below and Fig. 5 of https://nhess.copernicus.org/articles/21/393/2021/ ), should I assume the difference is entirely due to the cost function?

[Image: uncertainty of the original Eberenz et al. calibrations]

Code to add these curves:

import numpy as np
from matplotlib.lines import Line2D
from climada.entity import ImpfTropCyclone, ImpfSetTropCyclone
from climada.util.calibrate import BayesianOptimizerOutputEvaluator, select_best

# `input`, `bayes_output` and `p_space_df` are the objects created in the calibration tutorial
output_eval = BayesianOptimizerOutputEvaluator(input, bayes_output)

# Wind speeds (m/s) at which the reference impact functions are evaluated
x = np.arange(0, 100, 5)

# Default Emanuel impact function for the USA, as mean damage ratio in percent
y1 = 100 * ImpfTropCyclone.from_emanuel_usa().calc_mdr(x)

# Eberenz et al. regional calibration for NA1: same shape, region-specific v_half
region_vhalf = ImpfSetTropCyclone.calibrated_regional_vhalf()['NA1']
y2 = 100 * ImpfTropCyclone.from_emanuel_usa(v_thresh=25.7, v_half=region_vhalf, scale=1).calc_mdr(x)

# Plot the impact function variability of the parameter sets kept by select_best;
# the returned matplotlib Axes is named `ax` so it does not shadow matplotlib.pyplot
ax = output_eval.plot_impf_variability(select_best(p_space_df, 0.03), plot_haz=False)
ax.plot(x, y1, color='darkgreen')
ax.plot(x, y2, color='lightgreen')
ax.legend(
    [
        Line2D([0], [0], color='blue', lw=2),
        Line2D([0], [0], color='lightblue', lw=2),
        Line2D([0], [0], color='darkgreen', lw=2),
        Line2D([0], [0], color='lightgreen', lw=2)],
    [
        'Calibrated here',
        'Uncertainty',
        'emanuel_usa default',
        'Eberenz calibrated NA1'
    ]
)

@peanutfun (Member, Author)

@ChrisFairless Thanks for reporting these findings! To be sure the difference is due to the cost function, we would have to select exactly the same events from the EM-DAT database, use exactly the same hazard footprints and exposures, and calibrate the same parameters as Eberenz et al. The tutorial uses only a subset of the EM-DAT cases in the NA basin, and it calibrates two parameters of the function. So there are plenty of reasons, apart from the cost function, why the results may differ from the previously calibrated functions.
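For reference, both reference curves in the snippet above come from `ImpfTropCyclone.from_emanuel_usa`, the sigmoidal impact function of Emanuel (2011) with the three parameters `v_thresh`, `v_half` and `scale`. In that snippet, the Eberenz curve varies only `v_half`, with `v_thresh = 25.7` m/s and `scale = 1` held fixed, so a calibration of two parameters explores a larger family of curves. The mean damage ratio at wind speed $v$ is

```latex
f(v) = \mathrm{scale} \cdot \frac{v_n^3}{1 + v_n^3},
\qquad
v_n = \frac{\max\left(v - v_{\mathrm{thresh}},\, 0\right)}{v_{\mathrm{half}} - v_{\mathrm{thresh}}}
```

At $v = v_{\mathrm{half}}$ we have $v_n = 1$, so $f(v_{\mathrm{half}}) = \mathrm{scale}/2$: `v_half` is the wind speed at which half of the maximum damage ratio is reached.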

@peanutfun merged commit 9538bb5 into develop on Jul 12, 2024 (18 checks passed)
@emanuel-schmid deleted the calibrate-impact-functions branch on July 18, 2024, 08:22