Add module for calibrating impact functions #692

Merged
merged 87 commits into from
Jul 12, 2024

Changes from 1 commit (of 87 commits)
5d8278e
Initial draft for calibration from scipy.optimize
peanutfun Mar 31, 2023
443545e
Draft for impact function calibration
peanutfun May 4, 2023
819aab5
Add first unit tests of calibration module
peanutfun May 5, 2023
96e3cb3
ci: Add bayesian-optimization during Jenkins build
peanutfun May 5, 2023
123c632
Add __init__.py for util/calibarte/test module
peanutfun May 8, 2023
107a836
Add climada.util.calibrate.test module to test discovery
peanutfun May 8, 2023
2af6f09
Add unit and integration tests, update code base
peanutfun May 8, 2023
0d6e80b
Start documenting new calibrate module
peanutfun May 8, 2023
23cae6c
Actually add the intregration test
peanutfun May 8, 2023
50f3fd9
Add some documentation
peanutfun May 9, 2023
d321832
commit PLEASE CLEAN UP
peanutfun May 17, 2023
24c0fbc
Add more docstrings and simplify imports through __init__
peanutfun May 24, 2023
096a8d4
Add separate Output classes for each optimizer
peanutfun Jun 9, 2023
37c65d9
Merge branch 'develop' into calibrate-impact-functions
peanutfun Jun 9, 2023
e8abb1a
Restructure calibration module
peanutfun Jun 12, 2023
3d94151
Add tutorial on impact function calibration
peanutfun Jun 12, 2023
ea0eb47
Update tutorial
peanutfun Jun 13, 2023
0e5a557
Remove hazard event selection from calibrate.Input
peanutfun Jun 13, 2023
e1fe68a
Update calibration tutorial
peanutfun Jun 13, 2023
68c421b
Merge branch 'develop' into calibrate-impact-functions
emanuel-schmid Jun 26, 2023
df03b0d
Update climada/util/calibrate/bayesian_optimizer.py
peanutfun Jun 28, 2023
5ef4a01
Separate computing cost from transforming impact objects
peanutfun Jun 29, 2023
91cfd83
Merge branch 'calibrate-impact-functions' of https://github.com/CLIMA…
peanutfun Jul 10, 2023
4e1f104
Add evaluator for calibration output
peanutfun Jul 10, 2023
43f40b3
Add TestBayesianOptimizer test to test loader
peanutfun Jul 13, 2023
97d763a
Update code, docs, and tutorial
peanutfun Aug 3, 2023
d43eb8a
Update tutorial
peanutfun Aug 3, 2023
dda079d
Add option to adjust data frame alignment
peanutfun Aug 18, 2023
185866f
Merge branch 'develop' into calibrate-impact-functions
peanutfun Aug 21, 2023
5fdbf4e
Merge branch 'develop' into calibrate-impact-functions
peanutfun Sep 6, 2023
c2ede47
Merge branch 'develop' into calibrate-impact-functions
peanutfun Sep 20, 2023
c40b85a
Merge branch 'develop' into calibrate-impact-functions
peanutfun Oct 2, 2023
645862a
Merge branch 'develop' into calibrate-impact-functions
peanutfun Oct 26, 2023
357541a
Merge branch 'develop' into calibrate-impact-functions
emanuel-schmid Nov 9, 2023
2e536ef
add seaborn
emanuel-schmid Nov 9, 2023
2ace55f
Add function to plot Impf variability of calibration (#791)
timschmi95 Nov 14, 2023
423b5b8
Improve alignment and handling of NaNs
peanutfun Nov 15, 2023
8349016
Add seaborn to dependencies
peanutfun Nov 15, 2023
67ef797
Merge branch 'calibrate-impact-functions' of https://github.com/CLIMA…
peanutfun Nov 20, 2023
24c1fc3
Split tests into multiple files, finish up
peanutfun Nov 23, 2023
1538e78
Move impact transform and align to Input
peanutfun Nov 23, 2023
832da6a
Use MultiIndex in parameter space dataframe
peanutfun Nov 23, 2023
af959a7
Update tutorial
peanutfun Nov 23, 2023
042e6c9
Fix requirements for calibration module
peanutfun Nov 24, 2023
677fba9
Remove plot_impf_set function and improve exception type
peanutfun Nov 24, 2023
3a046c7
Add tests for OutputEvaluator
peanutfun Nov 24, 2023
066afe5
Fix name of bayes_opt package on PyPI
peanutfun Nov 24, 2023
fbc1701
Make sure latest seaborn is installed on Jenkins
peanutfun Nov 24, 2023
2986b51
Remove unused function definition
peanutfun Nov 24, 2023
8ab8600
Fix linter issues and remove unused code
peanutfun Nov 24, 2023
85a5826
Add BayesianOptimizerOutputEvaluator
peanutfun Nov 24, 2023
472d0c5
Fix typo in tutorial
peanutfun Nov 24, 2023
6ef09e5
Add GNU license header to new files
peanutfun Nov 24, 2023
661f991
Update CHANGELOG.md
peanutfun Nov 24, 2023
90f2749
edit authors.md
Nov 27, 2023
53feef6
Fix a bug in parameter space plot
peanutfun Nov 28, 2023
6fa9315
Merge branch 'develop' into calibrate-impact-functions
peanutfun Jan 30, 2024
d4d6777
Add suggestions from code review
peanutfun Jan 30, 2024
a0bada5
Add iteration controller for BayesianOptimizer
peanutfun Feb 20, 2024
cab68c4
Fix calibrate module init and add first controller tests
peanutfun Feb 20, 2024
a9c45d7
Merge branch 'develop' into calibrate-impact-functions
peanutfun Feb 20, 2024
9bf050c
Fix handling of instance maximum for constrained optimization
peanutfun Feb 21, 2024
17046db
Merge branch 'develop' into calibrate-impact-functions
peanutfun Feb 23, 2024
72f6263
Add tests for BayesianOptimizerController and fix verbosity
peanutfun Feb 23, 2024
5a8ac9f
Add test for plotting parameter space
peanutfun Feb 23, 2024
01f0f1c
Update integration tests for BayesianOptimizer
peanutfun Feb 23, 2024
b8f1cac
Add explanation of BayesianOptimizerController to tutorial
peanutfun Feb 23, 2024
10e9679
Merge branch 'develop' into calibrate-impact-functions
emanuel-schmid Mar 14, 2024
2326db2
Add option to store and load calibration results
peanutfun Apr 26, 2024
e270e1a
Merge branch 'develop' into calibrate-impact-functions
peanutfun Apr 29, 2024
5e2dd75
Update plots and tutorial
peanutfun Apr 29, 2024
10e0014
Add JOSS paper and associated GitHub workflow (#876)
peanutfun May 2, 2024
1c412da
JOSS Paper: Fix typo
peanutfun May 2, 2024
2b1ecda
JOSS Paper: Fix DOIs
peanutfun May 3, 2024
e7e441a
JOSS Paper: Do not call it 'natural disaster'.
peanutfun May 3, 2024
60b3a6e
JOSS: Add missing URL for Rougier et al.
peanutfun May 21, 2024
ba9bb7f
JOSS: Remove unused reference
peanutfun May 21, 2024
1d3d014
Add overview section to tutorial and include review suggestions
peanutfun Jul 8, 2024
9949f3e
Add quickstart section to tutorial
peanutfun Jul 8, 2024
a446edd
Shorten quickstart
peanutfun Jul 8, 2024
07adc6f
Guide readers through quickstart section
peanutfun Jul 8, 2024
1f14859
Replace Riedel et al. 2024 preprint with publication
peanutfun Jul 11, 2024
25fb7c3
Merge branch 'develop' into calibrate-impact-functions
peanutfun Jul 11, 2024
0183307
ci: Remove JOSS paper build job
peanutfun Jul 11, 2024
1511cc1
Fix CHANGELOG.md and add entry in Citation guide
peanutfun Jul 12, 2024
102a335
Merge branch 'develop' into calibrate-impact-functions
peanutfun Jul 12, 2024
2232c8c
Revert changes to Jenkinsfile
peanutfun Jul 12, 2024
Move impact transform and align to Input
peanutfun committed Nov 23, 2023
commit 1538e78194af439df3350db37d81058500125f36
3 changes: 1 addition & 2 deletions climada/test/test_util_calibrate.py
@@ -12,7 +12,7 @@

from climada.util.calibrate import Input, ScipyMinimizeOptimizer, BayesianOptimizer

from climada.util.calibrate.test.test_calibrate import hazard, exposure
from climada.util.calibrate.test.test_base import hazard, exposure


class TestScipyMinimizeOptimizer(unittest.TestCase):
@@ -48,7 +48,6 @@ def setUp(self) -> None:
self.impact_func_creator,
self.impact_to_dataframe,
mean_squared_error,
# lambda x,y: mean_squared_error(x, y, squared=True),
)

def test_single(self):
243 changes: 135 additions & 108 deletions climada/util/calibrate/base.py
@@ -90,6 +90,59 @@ def __post_init__(self, assign_centroids):
if assign_centroids:
self.exposure.assign_centroids(self.hazard)

def impact_to_aligned_df(
self, impact: Impact, fillna: float = np.nan
) -> Tuple[pd.DataFrame, pd.DataFrame]:
"""Create a dataframe from an impact and align it with the data.

When aligning, two general cases might occur, which are not mutually exclusive:

1. There are data points for which no impact was computed. This will always be
treated as an impact of zero.
2. There are impacts for which no data points exist. For these points, the input
data will be filled with the value of :py:attr:`Input.missing_data_value`.

This method performs the following steps:

* Transform the impact into a dataframe using :py:attr:`impact_to_dataframe`.
* Align the :py:attr:`data` with the impact dataframe, using
:py:attr:`missing_data_value` as fill value.
* Align the impact dataframe with the data, using zeros as fill value.
* In the aligned impact, set all values to zero where the data is NaN.
* Fill remaining NaNs in data with ``fillna``.

Parameters
----------
impact : Impact
    The impact computed by the model. It is transformed into a dataframe by
    :py:attr:`Input.impact_to_dataframe`.
fillna : float, optional
    The value to fill remaining NaNs in the data with. Defaults to NaN.

Returns
-------
data_aligned : pd.DataFrame
The data aligned to the impact dataframe
impact_df_aligned : pd.DataFrame
The impact transformed to a dataframe and aligned with the data
"""
# Transform impact to dataframe
impact_df = self.impact_to_dataframe(impact)
if impact_df.isna().any(axis=None):
raise ValueError("NaN values computed in impact!")

# Align with different fill values
data_aligned, _ = self.data.align(
impact_df, axis=None, fill_value=self.missing_data_value, copy=True
)
impact_df_aligned, _ = impact_df.align(
data_aligned, join="right", axis=None, fill_value=0.0, copy=False
)

# Set all impacts to zero for which data is NaN
impact_df_aligned.where(data_aligned.notna(), 0.0, inplace=True)

# NOTE: impact_df_aligned should not contain any NaNs at this point
return data_aligned.fillna(fillna), impact_df_aligned.fillna(fillna)


@dataclass
class Output:
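The alignment steps in the new `impact_to_aligned_df` can be sketched with plain pandas; the toy frames and the `missing_data_value` stand-in below are hypothetical, not CLIMADA objects:

```python
import numpy as np
import pandas as pd

# Observed data covers regions A and B, the modeled impact covers A and C,
# and one data cell is NaN (treated as ignored impact).
data = pd.DataFrame({"A": [1.0, 2.0], "B": [np.nan, 3.0]})
impact = pd.DataFrame({"A": [1.1, 1.9], "C": [0.5, 0.4]})
missing_data_value = 0.0  # stand-in for Input.missing_data_value

# Step 1: align data to the impact; cells absent from the data (column C)
# are filled with missing_data_value. Pre-existing NaNs are left untouched.
data_aligned, _ = data.align(impact, axis=None, fill_value=missing_data_value)

# Step 2: align impact to the data; impacts absent from the model (column B)
# are treated as zero impact.
impact_aligned, _ = impact.align(
    data_aligned, join="right", axis=None, fill_value=0.0
)

# Step 3: zero the impact wherever the data is NaN, then fill the remaining
# NaNs in the data (the optimizer calls this with fillna=0).
impact_aligned = impact_aligned.where(data_aligned.notna(), 0.0)
data_aligned = data_aligned.fillna(0.0)
```

Both frames end up with the same shape and no NaNs, so a cost function can compare them element-wise.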
@@ -163,7 +216,6 @@ def plot_impf_variability(
plot_impf_kws: Optional[dict] = None,
plot_hist_kws: Optional[dict] = None,
):

"""Plot impact function variability with parameter combinations of
almost equal cost function values

@@ -190,7 +242,7 @@ def plot_impf_variability(
if p_space_df is None:
# Assert that self.output has the p_space_to_dataframe() method,
# which is defined for the BayesianOptimizerOutput class
if not hasattr(self.output,"p_space_to_dataframe"):
if not hasattr(self.output, "p_space_to_dataframe"):
raise TypeError(
"To derive the full impact function parameter space, "
"plot_impf_variability() requires BayesianOptimizerOutput "
@@ -203,74 +255,93 @@ def plot_impf_variability(
# and remove the dimension 'Cost Function'.
params = p_space_df.columns.tolist()
try:
params.remove('Cost Function')
params.remove("Cost Function")
except ValueError:
pass

# Retrieve parameters of impact functions with cost function values
# within 'cost_func_diff' % of the best estimate
params_within_range = p_space_df[params]
plot_space_label = 'Parameter space'
plot_space_label = "Parameter space"
if cost_func_diff is not None:
max_cost_func_val = (p_space_df['Cost Function'].min()*
(1+cost_func_diff))
max_cost_func_val = p_space_df["Cost Function"].min() * (1 + cost_func_diff)
params_within_range = p_space_df.loc[
p_space_df['Cost Function'] <=max_cost_func_val,params
p_space_df["Cost Function"] <= max_cost_func_val, params
]
plot_space_label = (f"within {int(cost_func_diff*100)} percent "
f"of best fit")
plot_space_label = (
f"within {int(cost_func_diff*100)} percent " f"of best fit"
)

# Set plot defaults
color = plot_impf_kws.pop('color','tab:blue')
lw = plot_impf_kws.pop('lw',2)
zorder = plot_impf_kws.pop('zorder',3)
label = plot_impf_kws.pop('label','best fit')
color = plot_impf_kws.pop("color", "tab:blue")
lw = plot_impf_kws.pop("lw", 2)
zorder = plot_impf_kws.pop("zorder", 3)
label = plot_impf_kws.pop("label", "best fit")

#get number of impact functions and create a plot for each
# get number of impact functions and create a plot for each
n_impf = len(self.impf_set.get_func(haz_type=haz_type))
axes=[]
axes = []

for impf_idx in range(n_impf):
_, ax = plt.subplots()

_,ax = plt.subplots()

#Plot best-fit impact function
# Plot best-fit impact function
best_impf = self.impf_set.get_func(haz_type=haz_type)[impf_idx]
ax.plot(best_impf.intensity,best_impf.mdd*best_impf.paa*100,
color=color,lw=lw,zorder=zorder,label=label,**plot_impf_kws)

#Plot all impact functions within 'cost_func_diff' % of best estimate
ax.plot(
best_impf.intensity,
best_impf.mdd * best_impf.paa * 100,
color=color,
lw=lw,
zorder=zorder,
label=label,
**plot_impf_kws,
)

# Plot all impact functions within 'cost_func_diff' % of best estimate
for row in range(params_within_range.shape[0]):
label_temp = plot_space_label if row == 0 else None

sel_params = params_within_range.iloc[row,:].to_dict()
sel_params = params_within_range.iloc[row, :].to_dict()
temp_impf_set = self.input.impact_func_creator(**sel_params)
temp_impf = temp_impf_set.get_func(haz_type=haz_type)[impf_idx]

ax.plot(temp_impf.intensity,temp_impf.mdd*temp_impf.paa*100,
color='grey',alpha=0.4,label=label_temp)
ax.plot(
temp_impf.intensity,
temp_impf.mdd * temp_impf.paa * 100,
color="grey",
alpha=0.4,
label=label_temp,
)

# Plot hazard intensity value distributions
if plot_haz:
haz_vals = self.input.hazard.intensity[
:, self.input.exposure.gdf[f"centr_{haz_type}"]
]

#Plot defaults
color_hist = plot_hist_kws.pop('color','tab:orange')
alpha_hist = plot_hist_kws.pop('alpha',0.3)
# Plot defaults
color_hist = plot_hist_kws.pop("color", "tab:orange")
alpha_hist = plot_hist_kws.pop("alpha", 0.3)

ax2 = ax.twinx()
ax2.hist(haz_vals.data,bins=40,color=color_hist,
alpha=alpha_hist,label='Hazard intensity\noccurence')
ax2.set(ylabel='Hazard intensity occurence (#Exposure points)')
ax.axvline(x=haz_vals.max(),label='Maximum hazard value',
color='tab:orange')
ax2.legend(loc='lower right')

ax.set(xlabel=f"Intensity ({self.input.hazard.units})",
ax2.hist(
haz_vals.data,
bins=40,
color=color_hist,
alpha=alpha_hist,
label="Hazard intensity\noccurrence",
)
ax2.set(ylabel="Hazard intensity occurrence (#Exposure points)")
ax.axvline(
x=haz_vals.max(), label="Maximum hazard value", color="tab:orange"
)
ax2.legend(loc="lower right")

ax.set(
xlabel=f"Intensity ({self.input.hazard.units})",
ylabel="Mean Damage Ratio (MDR) in %",
xlim=(min(best_impf.intensity),max(best_impf.intensity)))
xlim=(min(best_impf.intensity), max(best_impf.intensity)),
)
ax.legend()
axes.append(ax)

@@ -279,13 +350,12 @@ def plot_impf_variability(

return ax
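The `cost_func_diff` filter used above keeps only parameter combinations whose cost is within a relative margin of the best (lowest) value; a toy sketch with a hypothetical parameter space:

```python
import pandas as pd

# Hypothetical parameter space as produced by p_space_to_dataframe():
# one parameter column plus the cost function values.
p_space_df = pd.DataFrame(
    {"scale": [0.5, 0.6, 0.9], "Cost Function": [10.0, 10.4, 20.0]}
)

cost_func_diff = 0.1  # keep combinations within 10 % of the best cost
max_cost = p_space_df["Cost Function"].min() * (1 + cost_func_diff)
within = p_space_df.loc[p_space_df["Cost Function"] <= max_cost, ["scale"]]
```

Here `max_cost` is 11.0, so only the first two rows survive the filter.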


def plot_at_event(
self,
data_transf: Callable[[pd.DataFrame], pd.DataFrame] = lambda x: x,
**plot_kwargs,
):
"""Create a bar plot comparing estimated model output and data per event
"""Create a bar plot comparing estimated model output and data per event.

Every row of the :py:attr:`Input.data` is considered an event.
The data to be plotted can be transformed with a generic function
@@ -305,21 +375,23 @@ def plot_at_event(
-------
ax : matplotlib.axes.Axes
The plot axis returned by ``DataFrame.plot.bar``

Note
----
This plot does *not* include the ignored impact, see :py:attr:`Input.data`.
"""
data = pd.concat(
[
self.input.impact_to_dataframe(self.impact).sum(axis="columns"),
self.input.data.sum(axis="columns"),
],
data, impact = self.input.impact_to_aligned_df(self.impact)
values = pd.concat(
[impact.sum(axis="columns"), data.sum(axis="columns")],
axis=1,
).rename(columns={0: "Model", 1: "Data"})

# Transform data before plotting
data = data_transf(data)
values = data_transf(values)

# Now plot
ylabel = plot_kwargs.pop("ylabel", self._impact_label)
return data.plot.bar(ylabel=ylabel, **plot_kwargs)
return values.plot.bar(ylabel=ylabel, **plot_kwargs)

def plot_at_region(
self,
@@ -346,21 +418,23 @@ def plot_at_region(
-------
ax : matplotlib.axes.Axes
The plot axis returned by ``DataFrame.plot.bar``.

Note
----
This plot does *not* include the ignored impact, see :py:attr:`Input.data`.
"""
data = pd.concat(
[
self.input.impact_to_dataframe(self.impact).sum(axis="index"),
self.input.data.sum(axis="index"),
],
data, impact = self.input.impact_to_aligned_df(self.impact)
values = pd.concat(
[impact.sum(axis="index"), data.sum(axis="index")],
axis=1,
).rename(columns={0: "Model", 1: "Data"})

# Transform data before plotting
data = data_transf(data)
values = data_transf(values)

# Now plot
ylabel = plot_kwargs.pop("ylabel", self._impact_label)
return data.plot.bar(ylabel=ylabel, **plot_kwargs)
return values.plot.bar(ylabel=ylabel, **plot_kwargs)

def plot_event_region_heatmap(
self,
@@ -391,13 +465,12 @@ def plot_event_region_heatmap(

"""
# Data preparation
agg = self.input.impact_to_dataframe(self.impact)
data = (agg + 1) / (self.input.data + 1)
data = data.transform(np.log10)
data = data.where((agg > 0) | (self.input.data > 0))
data, impact = self.input.impact_to_aligned_df(self.impact)
values = (impact + 1) / (data + 1) # Avoid division by zero
values = values.transform(np.log10)

# Transform data
data = data_transf(data)
values = data_transf(values)

# Default plot settings
annot = plot_kwargs.pop("annot", True)
Expand All @@ -411,7 +484,7 @@ def plot_event_region_heatmap(
)

return sns.heatmap(
data,
values,
annot=annot,
vmin=vmin,
vmax=vmax,
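The heatmap values are a log ratio with a +1 shift on both sides so that zero impact or zero data cannot divide by zero; a toy sketch with hypothetical frames:

```python
import numpy as np
import pandas as pd

impact = pd.DataFrame({"A": [9.0, 0.0], "B": [99.0, 1.0]})
data = pd.DataFrame({"A": [0.0, 0.0], "B": [9.0, 3.0]})

# log10 of the shifted ratio: 0 means perfect agreement,
# +1 a tenfold overestimate, -1 a tenfold underestimate.
values = ((impact + 1) / (data + 1)).transform(np.log10)
```

These values are what ends up color-coded by `sns.heatmap`, centered at 0.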
@@ -482,53 +555,6 @@ def _kwargs_to_impact_func_creator(self, *_, **kwargs) -> Dict[str, Any]:
"""
return kwargs

def _align_impact_with_data(
self, impact_df: pd.DataFrame
) -> Tuple[pd.DataFrame, pd.DataFrame]:
"""Align the impact dataframe with the input data dataframe

When aligning, two general cases might occur, which are not mutually exclusive:

1. There are data points for which no impact was computed. This will always be
treated as an impact of zero.
2. There are impacts for which no data points exist. For these points, the input
data will be filled with the value of :py:attr:`Input.missing_data_value`.

Parameters
----------
impact_df : pandas.DataFrame
The impact computed by the model, transformed into a dataframe by
:py:attr:`Input.impact_to_dataframe`.

Returns
-------
data_aligned : pandas.DataFrame
The :py:attr:`Input.data` aligned with the impact.
impact_df_aligned : pandas.DataFrame
The ``impact_df`` aligned with the data.

Raises
------
ValueError
If ``impact_df`` contains NaNs before aligning.
"""
if impact_df.isna().any(axis=None):
raise ValueError("NaN values computed in impact!")

data_aligned, impact_df_aligned = self.input.data.align(
impact_df, axis=None, fill_value=None
)

# Add user-set value for non-aligned data
data_aligned[
impact_df_aligned.notna() & data_aligned.isna()
] = self.input.missing_data_value

# Set all impacts to zero for which data is NaN
impact_df_aligned.where(data_aligned.notna(), inplace=True)

return data_aligned.fillna(0), impact_df_aligned.fillna(0)

def _opt_func(self, *args, **kwargs) -> Number:
"""The optimization function iterated by the optimizer

@@ -557,8 +583,9 @@ def _opt_func(self, *args, **kwargs) -> Number:
).impact(**self.input.impact_calc_kwds)

# Transform to DataFrame, align, and compute target function
impact_df = self.input.impact_to_dataframe(impact)
data_aligned, impact_df_aligned = self._align_impact_with_data(impact_df)
data_aligned, impact_df_aligned = self.input.impact_to_aligned_df(
impact, fillna=0
)
return self._target_func(data_aligned, impact_df_aligned)

@abstractmethod
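The iteration step of `_opt_func` can be sketched generically: parameters in, impact functions built, impact computed, frames aligned, scalar cost out. All callables below are stand-ins for the CLIMADA pieces, not the real API:

```python
from typing import Callable


def make_opt_func(
    impact_func_creator: Callable,
    compute_impact: Callable,
    impact_to_aligned_df: Callable,
    target_func: Callable,
) -> Callable:
    """Build the function the optimizer iterates: parameters -> scalar cost."""

    def opt_func(**params) -> float:
        impf_set = impact_func_creator(**params)  # parameters -> impact functions
        impact = compute_impact(impf_set)  # impact functions -> modeled impact
        data, impact_df = impact_to_aligned_df(impact, fillna=0)
        return target_func(data, impact_df)  # aligned frames -> scalar cost

    return opt_func


# Toy usage: a one-parameter "model" scored with a squared-error cost.
opt = make_opt_func(
    impact_func_creator=lambda **params: params["scale"],
    compute_impact=lambda scale: [scale, 2 * scale],
    impact_to_aligned_df=lambda impact, fillna=0: ([1.0, 2.0], impact),
    target_func=lambda d, m: sum((a - b) ** 2 for a, b in zip(d, m)),
)
```

With `scale=1.0` the toy model matches the data exactly, so the cost is zero.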