
Commit dc5e735

Merge pull request #18 from CH-Earth/paper-revisions
Edits for FROSTBYTE paper revisions
2 parents 1cff260 + e02decd commit dc5e735

File tree: 10 files changed, +47 -23 lines changed

CITATION.cff (+2 -2)

@@ -14,6 +14,6 @@ authors:
     given-names: "A. N."
     orcid: "https://orcid.org/0000-0001-6583-0038"
 title: "FROSTBYTE: Forecasting River Outlooks from Snow Timeseries: Building Yearly Targeted Ensembles"
-version: 0.9.0
-date-released: 2023-12-11
+version: 1.0.0
+date-released: 2023-06-15
 url: "https://github.com/CH-Earth/FROSTBYTE"

README.md (+4 -2)

@@ -14,7 +14,7 @@ FROSTBYTE is a reproducible data-driven workflow for probabilistic seasonal stre
 
 ## Description
 
-This repository contains a reproducible data-driven workflow, organized as a collection of Jupyter Notebooks. The workflow leverages snow water equivalent (SWE) measurements as predictors and streamflow observations as predictands, drawn from reliable datasets like CanSWE, NRCS, SNOTEL, HYDAT, and USGS. Gap filling for SWE datasets is done using quantile mapping from nearby stations and Principal Component Analysis is used to identify independent predictor components. These components are employed in a regression model to generate ensemble hindcasts of seasonal streamflow volumes. This workflow was applied by Arnal et al. (manuscript in preparation for submission to HESS) to 75 river basins with a nival (i.e., snowmelt-driven) regime and with minimal regulation across Canada and the USA, for generating hindcasts from 1979 to 2021. This study presented a user-oriented hindcast evaluation, offering valuable insights for snow surveyors, forecasters, workflow developers, and decision-makers.
+This repository contains a reproducible data-driven workflow, organized as a collection of Jupyter Notebooks. The workflow leverages snow water equivalent (SWE) measurements as predictors and streamflow observations as predictands, drawn from reliable datasets like CanSWE, NRCS, SNOTEL, HYDAT, and USGS. Gap filling for SWE datasets is done using quantile mapping from nearby stations and Principal Component Analysis is used to identify independent predictor components. These components are employed in a regression model to generate ensemble hindcasts of seasonal streamflow volumes. This workflow was applied by Arnal et al. (2024) to 75 river basins with a nival (i.e., snowmelt-driven) regime and with minimal regulation across Canada and the USA, for generating hindcasts from 1979 to 2021. This study presented a user-oriented hindcast evaluation, offering valuable insights for snow surveyors, forecasters, workflow developers, and decision-makers.
 
 ## Repository Structure
 

@@ -31,7 +31,7 @@ The steps below will help you to have a fully set-up environment to explore and
 
 Begin by cloning the repository to your local machine. Use the command below in your terminal or command prompt:
 ```bash
-git clone https://github.com/lou-a/FROSTBYTE.git
+git clone https://github.com/CH-Earth/FROSTBYTE.git
 ```
 This command will create a copy of the repository in your current directory.
 2. **Set Up Virtual Environment (Optional)**

@@ -96,6 +96,8 @@ This project is licensed under the MIT License. See the [LICENSE](LICENSE.md) fi
 
 If you use this workflow, please consider citing it using the `Cite this repository` button.
 
+Arnal, L., Clark, M. P., Pietroniro, A., Vionnet, V., Casson, D. R., Whitfield, P. H., Fortin, V., Wood, A. W., Knoben, W. J. M., Newton, B. W., and Walford, C.: FROSTBYTE: A reproducible data-driven workflow for probabilistic seasonal streamflow forecasting in snow-fed river basins across North America, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2023-3040, 2024.
+
 ## Contact
 
 If you have any questions about using or running the workflow, or are willing to contribute, please contact louise.arnal[-at-]usask.ca
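The updated description summarizes the forecasting chain (gap-filled SWE predictors, principal component analysis, and a regression-based ensemble hindcast). As a rough orientation only, a minimal sketch of that chain is given below; the variable names and toy data are hypothetical, and the ensemble-generation step is one simple illustration rather than the method actually implemented in the repository's notebooks and `scripts/functions.py`.

```python
# Hedged sketch of the forecasting chain described above (hypothetical names, toy data):
# gap-filled SWE stations -> principal components -> regression -> residual-based ensemble.
import numpy as np

rng = np.random.default_rng(0)
n_years, n_stations = 40, 12

swe = rng.gamma(2.0, 60.0, size=(n_years, n_stations))              # Feb 1 SWE at nearby stations (toy)
volume = swe.mean(axis=1) * 1.5 + rng.normal(0, 20, n_years)        # Apr-Jul streamflow volume (toy)

# 1. PCA on standardized SWE to obtain a few independent predictor components
z = (swe - swe.mean(axis=0)) / swe.std(axis=0)
_, _, vt = np.linalg.svd(z, full_matrices=False)
pcs = z @ vt[:2].T                                                  # leading two principal components

# 2. Ordinary least squares regression of seasonal volume on the components
X = np.column_stack([np.ones(n_years), pcs])
coefs, *_ = np.linalg.lstsq(X, volume, rcond=None)
fitted = X @ coefs
residuals = volume - fitted

# 3. One simple way to form an ensemble hindcast for a target year:
#    add the historical regression residuals to that year's regression estimate
target_year = -1
ensemble = fitted[target_year] + residuals                          # ensemble of plausible volumes
print(f"Ensemble median: {np.median(ensemble):.1f}, observed: {volume[target_year]:.1f}")
```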

notebooks/5_HindcastVerification.ipynb (+2 -2)

@@ -4566,7 +4566,7 @@
 "source": [
 "# Calculate probabilistic verification metrics with bootstrapping (flag=1)\n",
 "# Note: this takes a little while to run for all the bootstrapping iterations\n",
-"crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da = prob_metrics_calculation(Qobs=obs_ds, Qfc_ens=ens_fc_ds, flag=1, niterations=niterations_default, perc_event_low=perc_event_low_default, perc_event_high=perc_event_high_default, min_obs=min_obs_default, bins_thresholds=bins_thresholds_default)"
+"crpss_bs_da, fair_crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da = prob_metrics_calculation(Qobs=obs_ds, Qfc_ens=ens_fc_ds, flag=1, niterations=niterations_default, perc_event_low=perc_event_low_default, perc_event_high=perc_event_high_default, min_obs=min_obs_default, bins_thresholds=bins_thresholds_default)"
 ]
 },
 {

@@ -9195,7 +9195,7 @@
 ],
 "source": [
 "# Save into a single xarray Dataset\n",
-"prob_verif_metrics_bs_basin_ds = xr.merge([crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da])\n",
+"prob_verif_metrics_bs_basin_ds = xr.merge([crpss_bs_da, fair_crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da])\n",
 "prob_verif_metrics_bs_basin_ds.attrs['info'] = 'Various probabilistic verification metrics calculated for basin '+test_basin_id+'.'\n",
 "\n",
 "display(prob_verif_metrics_bs_basin_ds)"

notebooks/NotebookMethods.png (binary image, 2.79 KB)

notebooks/README.md (+1 -1)

@@ -2,7 +2,7 @@
 
 To explore FROSTBYTE, the best way is to navigate the Jupyter Notebooks in this section! The image below shows the methods implemented in each notebook. Following that is a brief text description, but open the notebooks themselves to see all steps for yourself.
 
-For installation instructions, refer back to the [landing page](https://github.com/lou-a/FROSTBYTE). Test data has been included for a sample catchment in Canada and in the USA.
+For installation instructions, refer back to the [landing page](https://github.com/CH-Earth/FROSTBYTE). Test data has been included for a sample catchment in Canada and in the USA.
 
 
 <p align="center">

requirements.txt (+1)

@@ -1,4 +1,5 @@
 Bottleneck==1.3.2
+CRPS==2.0.4
 geopandas==0.10.2
 ipykernel==5.1.4
 matplotlib==3.1.3

scripts/functions.py (+34 -13)

@@ -4,6 +4,7 @@
 ############################################################################################################
 
 # Import required modules
+import CRPS.CRPS as CRPSscore
 import datetime
 from datetime import timedelta, date
 import geopandas as gpd
@@ -318,7 +319,7 @@ def continuous_rank_prob_score(Qobs, Qfc_ens, min_obs):
     CRPS range: 0 to +Inf. Perfect score: 0. Units: Same as variable measured.
     CRPSS range: -Inf to 1. Perfect score: 1. Units: Unitless.
     Characteristics: It is equivalent to the mean absolute error (MAE) for deterministic forecasts.
-    For more info, see the Python CRPS package documentation: https://pypi.org/project/properscoring/
+    For more info, see the relevant Python CRPS package documentation: https://pypi.org/project/properscoring/, https://pypi.org/project/CRPS/
 
     Keyword arguments:
     ------------------

@@ -328,8 +329,8 @@ def continuous_rank_prob_score(Qobs, Qfc_ens, min_obs):
 
     Returns:
     --------
-    - CRPS: Float of the CRPS value between the ensemble forecasts & observations.
     - CRPSS: Float of the CRPSS value between the ensemble forecasts & observations.
+    - fairCRPSS: Float of the fairCRPSS value between the ensemble forecasts & observations.
 
     """
 

@@ -344,11 +345,23 @@ def continuous_rank_prob_score(Qobs, Qfc_ens, min_obs):
         CRPS_baseline = ps.crps_ensemble(Qobs, baseline).mean()
         CRPSS = 1 - CRPS / CRPS_baseline
 
+        # Calculate the fairCRPS and fairCRPSS
+        fairCRPS = []
+        fairCRPS_baseline = []
+        for y in range(len(Qobs)):
+            fcrps = CRPSscore(Qfc_ens[y,:].values, Qobs[y].values).compute()[1]
+            fcrps_baseline = CRPSscore(baseline[y,:][~np.isnan(baseline[y,:])].tolist(), Qobs[y].values).compute()[1]
+            fairCRPS.append(fcrps)
+            fairCRPS_baseline.append(fcrps_baseline)
+        fairCRPS = np.mean(fcrps)
+        fairCRPS_baseline = np.mean(fairCRPS_baseline)
+        fairCRPSS = 1 - fairCRPS / fairCRPS_baseline
+
     else:
 
-        CRPS, CRPSS = np.nan, np.nan
+        CRPSS, fairCRPSS = np.nan, np.nan
 
-    return CRPS, CRPSS
+    return CRPSS, fairCRPSS
 
 ###
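For readers unfamiliar with the two scoring packages involved, the standalone sketch below contrasts the properscoring-based CRPS already used in this function with the fair (ensemble-size-adjusted) CRPS obtained from the CRPS package's `compute()` method, whose element `[1]` is the fair score as used in the hunk above. The toy data and variable names are hypothetical, and the sketch assumes the intent is to average the per-year fair CRPS values over all years before forming the skill score.

```python
# Hedged sketch: conventional CRPS vs fair CRPS for a toy ensemble hindcast.
# Assumes properscoring and the CRPS package (pinned as CRPS==2.0.4 above) are installed.
import numpy as np
import properscoring as ps
import CRPS.CRPS as CRPSscore

rng = np.random.default_rng(42)
n_years, n_members = 30, 20
obs = rng.gamma(shape=2.0, scale=50.0, size=n_years)                   # observed seasonal volumes (toy)
fcst = obs[:, None] + rng.normal(0, 20, size=(n_years, n_members))     # ensemble hindcasts (toy)

# Conventional CRPS, averaged over years (properscoring)
crps = ps.crps_ensemble(obs, fcst).mean()

# Fair CRPS per year, taken from element [1] of compute(), as in the diff above,
# then averaged over all years
fair_crps_per_year = [CRPSscore(fcst[y, :], obs[y]).compute()[1] for y in range(n_years)]
fair_crps = np.mean(fair_crps_per_year)

print(f"CRPS = {crps:.2f}, fair CRPS = {fair_crps:.2f}")
```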

@@ -1282,7 +1295,7 @@ def principal_component_analysis(stations_data, flag):
 
 def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, perc_event_high, min_obs, bins_thresholds):
 
-    """Calculates deterministic metrics for whole hindcast timeseries (1 value per hindcast start date & target period).
+    """Calculates probabilistic metrics for whole hindcast timeseries (1 value per hindcast start date & target period).
 
     Keyword arguments:
     ------------------

@@ -1297,8 +1310,8 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
 
     Returns:
     --------
-    - crps_da: xarray DataArray containing the CRPS for each hindcast start date & target period.
     - crpss_da: xarray DataArray containing the CRPSS for each hindcast start date & target period.
+    - fair_crpss_da: xarray DataArray containing the fairCRPSS for each hindcast start date & target period.
     - reli_da: xarray DataArray containing the reliability index for each hindcast start date & target period.
     - roc_auc_da: xarray DataArray containing the ROC area under the curve for each hindcast start date & target period.
     - roc_da: xarray DataArray containing the ROC curves for each hindcast start date & target period.
@@ -1312,11 +1325,13 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
     # Initialize the verification metrics' Numpy arrays
     if flag == 0:
         crpss_array = np.ones((len(initdates),len(targetperiods))) * np.nan
+        fair_crpss_array = np.ones((len(initdates),len(targetperiods))) * np.nan
         reli_array = np.ones((len(initdates),len(targetperiods))) * np.nan
         roc_auc_array = np.ones((len(initdates),len(targetperiods),2)) * np.nan
         roc_array = np.ones((len(initdates),len(targetperiods),2,11,2)) * np.nan
     elif flag == 1:
         crpss_array = np.ones((len(initdates),len(targetperiods),niterations)) * np.nan
+        fair_crpss_array = np.ones((len(initdates),len(targetperiods),niterations)) * np.nan
         reli_array = np.ones((len(initdates),len(targetperiods),niterations)) * np.nan
         roc_auc_array = np.ones((len(initdates),len(targetperiods),niterations,2)) * np.nan
         roc_array = np.ones((len(initdates),len(targetperiods),niterations,2,11,2)) * np.nan

@@ -1351,7 +1366,8 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
             if flag == 0:
                 # CRPS & CRPSS
                 crps_outputs = continuous_rank_prob_score(Qobs_data, Qfc_ens_data, min_obs)
-                crpss_array[row,column] = round(crps_outputs[1],2)
+                crpss_array[row,column] = round(crps_outputs[0],2)
+                fair_crpss_array[row,column] = round(crps_outputs[1],2)
                 # Reliability index
                 reli_array[row,column] = round(reli_index(Qobs_data, Qfc_ens_data, min_obs),2)
                 # ROC

@@ -1373,7 +1389,8 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
 
                     # CRPS & CRPSS
                     crps_outputs = continuous_rank_prob_score(Qobs_data_bs, Qfc_ens_data_bs, min_obs)
-                    crpss_array[row,column,b] = round(crps_outputs[1],2)
+                    crpss_array[row,column,b] = round(crps_outputs[0],2)
+                    fair_crpss_array[row,column,b] = round(crps_outputs[1],2)
                     # Reliability index
                     reli_array[row,column,b] = round(reli_index(Qobs_data_bs, Qfc_ens_data_bs, min_obs),2)
                     # ROC

@@ -1386,25 +1403,29 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
 
     # Save values to xarray DataArrays
     if flag == 0:
-        crpss_da = xr.DataArray(data=crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_month','target_period'], name='CRPSS')
-        reli_da = xr.DataArray(data=reli_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_month','target_period'], name='Reliability_index')
+        crpss_da = xr.DataArray(data=crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_date','target_period'], name='CRPSS')
+        fair_crpss_da = xr.DataArray(data=fair_crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_date','target_period'], name='fairCRPSS')
+        reli_da = xr.DataArray(data=reli_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_date','target_period'], name='Reliability_index')
         roc_auc_da = xr.DataArray(data=roc_auc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','event'], name='ROC_AUC')
-        roc_da = xr.DataArray(data=roc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'rate':['FAR','HR'],'bins':roc_outputs_high[0].bins,'event':[perc_event_low, perc_event_high]}, dims=['init_month','target_period','rate','bins','event'], name='ROC')
+        roc_da = xr.DataArray(data=roc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'rate':['FAR','HR'],'bins':roc_outputs_high[0].bins,'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','rate','bins','event'], name='ROC')
 
     elif flag == 1:
         crpss_da = xr.DataArray(data=crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1)}, dims=['init_date','target_period','iteration'], name='CRPSS')
+        fair_crpss_da = xr.DataArray(data=fair_crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1)}, dims=['init_date','target_period','iteration'], name='fairCRPSS')
         reli_da = xr.DataArray(data=reli_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1)}, dims=['init_date','target_period','iteration'], name='Reliability_index')
         roc_auc_da = xr.DataArray(data=roc_auc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1),'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','iteration','event'], name='ROC_AUC')
         roc_da = xr.DataArray(data=roc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1),'rate':['FAR','HR'],'bins':roc_outputs_high[0].bins,'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','iteration','rate','bins','event'], name='ROC')
 
 
     # Information for the output xarray DataArrays
-    da_dict = {'CRPSS':crpss_da,'reli':reli_da,'ROC_AUC':roc_auc_da,'ROC':roc_da}
+    da_dict = {'CRPSS':crpss_da,'fairCRPSS':fair_crpss_da,'reli':reli_da,'ROC_AUC':roc_auc_da,'ROC':roc_da}
     metrics_longnames_dict = {'CRPSS':'Continuous Rank Probability Skill Score',
+                              'fairCRPSS':'Fair Continuous Rank Probability Skill Score',
                               'reli':'Reliability index',
                               'ROC_AUC':'Relative Operating Characteristic (ROC) area under the curve (AUC)',
                               'ROC':'Relative Operating Characteristic (ROC)'}
     metrics_info_dict = {'CRPSS':'Measures the skill of the hindcast against a baseline (observations climatology). Range: -Inf to 1. Perfect score: 1. Units: Unitless.',
+                         'fairCRPSS':'Measures the skill of the hindcast against a baseline (observations climatology), using a fair method to account for differences in ensemble sizes. Range: -Inf to 1. Perfect score: 1. Units: Unitless.',
                          'reli':'Measures the closeness between the empirical CDF of the ensemble hindcast with the CDF of a uniform distribution (i.e., flat rank histogram). Range: 0 to 1. Perfect score: 1. Units: Unitless.',
                          'ROC_AUC':'Measures the ensemble hindcast resolution, its ability to discriminate between events (given percentile) & non-events. ROC AUC range: 0 to 1. Perfect score: 1. No skill: 0.5. Units: Unitless.',
                          'ROC':'Measures the ensemble hindcast resolution, its ability to discriminate between events (given percentile) & non-events. The ROC curve plots the hit rate (HR) vs the false alarm rate (FAR) using a set of increasing probability thresholds (i.e., 0.1, 0.2, ..., 1) to make the yes/no decision.'}

@@ -1430,7 +1451,7 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
         da_dict[keys].bins.attrs['info'] = 'Forecast probability thresholds used for the ROC calculations.'
         da_dict[keys].rate.attrs['info'] = 'The false alarm rate (FAR) captures when an event is forecast to occur, but did not occur. The hit rate (HR) captures when an event is forecast to occur, and did occur.'
 
-    return crpss_da, reli_da, roc_auc_da, roc_da
+    return crpss_da, fair_crpss_da, reli_da, roc_auc_da, roc_da
 
 ###
 

settings/README.md (+1 -1)

@@ -4,4 +4,4 @@ The settings for running the data-driven forecasting workflow are located in thi
 
 ## Instructions
 
-Copy an existing settings file, and update to match the directory paths for your own environment, for the basin of interest, input data and output paths.
+Copy an existing settings file, and update to match the directory paths for your own environment, for the basin of interest, input data and output paths. Data are provided for two river basins in the `test_case_data` folder, corresponding to the Bow River at Banff in Alberta, Canada (05BB001), and the Crystal River Above Avalanche Creek, Near Redstone in Colorado, USA (09081600).

settings/config_test_case.yaml (+1 -1)

@@ -1,7 +1,7 @@
 # Configuration file for data-driven forecasting
 # Set required data paths - these are relative paths to where this script is stored
 
-# Domain, note that the name needs to match the data paths that follow
+# Domain, note that the name needs to match the data paths that follow - current options are: "05BB001" for the Bow River at Banff in Alberta, Canada, or "09081600" for the Crystal River Above Avalanche Creek, Near Redstone in Colorado, USA
 domain: "05BB001"
 
 # Observational data path
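The workflow reads this YAML file to pick up the domain and the data paths that follow it. A minimal hedged sketch of loading such a settings file is shown below; it assumes the PyYAML package is available and uses only the `domain` key visible in this diff, since the remaining keys of the real file are not shown here.

```python
# Hedged sketch: load the test-case settings file and select the domain.
# Assumes PyYAML; only the 'domain' key appears in the diff above, so other
# keys used by the real notebooks are not reproduced here.
from pathlib import Path
import yaml

config_path = Path("settings/config_test_case.yaml")
with config_path.open() as f:
    cfg = yaml.safe_load(f)

domain = cfg["domain"]  # "05BB001" (Bow River at Banff) or "09081600" (Crystal River)
print(f"Running the workflow for basin {domain}")
```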

test_case_data/README.md (+1 -1)

@@ -1,6 +1,6 @@
 # Test Case Data
 
-Sample data for running the forecasting workflow for two single river basins: the Bow River at Banff in Alberta, Canada, and the Crystal River Abv Avalanche Crk, Near Redstone in Colorado, USA. The locations of both are shown in the image below.
+Sample data for running the forecasting workflow for two single river basins: the Bow River at Banff in Alberta, Canada (05BB001), and the Crystal River Above Avalanche Creek, Near Redstone in Colorado, USA (09081600). The locations of both are shown in the image below.
 
 <p align="center">
 <img src="TestBasins.png" alt="Test basins" width="500"/>
