Merge pull request #18 from CH-Earth/paper-revisions
Edits for FROSTBYTE paper revisions
DaveCasson authored Jun 16, 2024
2 parents 1cff260 + e02decd commit dc5e735
Showing 10 changed files with 47 additions and 23 deletions.
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -14,6 +14,6 @@ authors:
given-names: "A. N."
orcid: "https://orcid.org/0000-0001-6583-0038"
title: "FROSTBYTE: Forecasting River Outlooks from Snow Timeseries: Building Yearly Targeted Ensembles"
version: 0.9.0
date-released: 2023-12-11
version: 1.0.0
date-released: 2023-06-15
url: "https://github.com/CH-Earth/FROSTBYTE"
6 changes: 4 additions & 2 deletions README.md
@@ -14,7 +14,7 @@ FROSTBYTE is a reproducible data-driven workflow for probabilistic seasonal stre

## Description

This repository contains a reproducible data-driven workflow, organized as a collection of Jupyter Notebooks. The workflow leverages snow water equivalent (SWE) measurements as predictors and streamflow observations as predictands, drawn from reliable datasets like CanSWE, NRCS, SNOTEL, HYDAT, and USGS. Gap filling for SWE datasets is done using quantile mapping from nearby stations and Principal Component Analysis is used to identify independent predictor components. These components are employed in a regression model to generate ensemble hindcasts of seasonal streamflow volumes. This workflow was applied by Arnal et al. (manuscript in preparation for submission to HESS) to 75 river basins with a nival (i.e., snowmelt-driven) regime and with minimal regulation across Canada and the USA, for generating hindcasts from 1979 to 2021. This study presented a user-oriented hindcast evaluation, offering valuable insights for snow surveyors, forecasters, workflow developers, and decision-makers.
This repository contains a reproducible data-driven workflow, organized as a collection of Jupyter Notebooks. The workflow leverages snow water equivalent (SWE) measurements as predictors and streamflow observations as predictands, drawn from reliable datasets like CanSWE, NRCS, SNOTEL, HYDAT, and USGS. Gap filling for SWE datasets is done using quantile mapping from nearby stations and Principal Component Analysis is used to identify independent predictor components. These components are employed in a regression model to generate ensemble hindcasts of seasonal streamflow volumes. This workflow was applied by Arnal et al. (2024) to 75 river basins with a nival (i.e., snowmelt-driven) regime and with minimal regulation across Canada and the USA, for generating hindcasts from 1979 to 2021. This study presented a user-oriented hindcast evaluation, offering valuable insights for snow surveyors, forecasters, workflow developers, and decision-makers.
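The gap-filling / PCA / regression chain described in this paragraph can be sketched end-to-end. The snippet below is a minimal illustration on synthetic data (the variable names and the residual-resampling ensemble step are assumptions for illustration, not FROSTBYTE's actual API):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic SWE predictors: 40 years x 12 stations (hypothetical stand-in
# for gap-filled CanSWE/SNOTEL measurements on a forecast date)
swe = rng.normal(size=(40, 12))
flow = swe[:, :3].sum(axis=1) + 0.5 * rng.normal(size=40)  # predictand

# PCA via SVD on standardized predictors
z = (swe - swe.mean(axis=0)) / swe.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)
n_pc = 3
pcs = z @ vt[:n_pc].T  # leading (independent) principal components

# Regress seasonal volume on the PCs (ordinary least squares)
X = np.column_stack([np.ones(len(flow)), pcs])
coef, *_ = np.linalg.lstsq(X, flow, rcond=None)
resid = flow - X @ coef

# Ensemble hindcast for one year: regression mean + resampled residuals
x_new = X[0]
ensemble = x_new @ coef + rng.choice(resid, size=100, replace=True)
print(ensemble.mean(), ensemble.std())
```

Any standard PCA/regression implementation would do here; the point is the shape of the pipeline: stations in, a few orthogonal components out, and an ensemble built around the regression prediction.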

## Repository Structure

@@ -31,7 +31,7 @@ The steps below will help you to have a fully set-up environment to explore and

Begin by cloning the repository to your local machine. Use the command below in your terminal or command prompt:
```bash
git clone https://github.com/lou-a/FROSTBYTE.git
git clone https://github.com/CH-Earth/FROSTBYTE.git
```
This command will create a copy of the repository in your current directory.
2. **Set Up Virtual Environment (Optional)**
@@ -96,6 +96,8 @@ This project is licensed under the MIT License. See the [LICENSE](LICENSE.md) fi
If you use this workflow, please consider citing it using the `Cite this repository` button.
Arnal, L., Clark, M. P., Pietroniro, A., Vionnet, V., Casson, D. R., Whitfield, P. H., Fortin, V., Wood, A. W., Knoben, W. J. M., Newton, B. W., and Walford, C.: FROSTBYTE: A reproducible data-driven workflow for probabilistic seasonal streamflow forecasting in snow-fed river basins across North America, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2023-3040, 2024.
## Contact
If you have any questions about using or running the workflow, or are willing to contribute, please contact louise.arnal[-at-]usask.ca
4 changes: 2 additions & 2 deletions notebooks/5_HindcastVerification.ipynb
@@ -4566,7 +4566,7 @@
"source": [
"# Calculate probabilistic verification metrics with bootstrapping (flag=1)\n",
"# Note: this takes a little while to run for all the bootstrapping iterations\n",
"crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da = prob_metrics_calculation(Qobs=obs_ds, Qfc_ens=ens_fc_ds, flag=1, niterations=niterations_default, perc_event_low=perc_event_low_default, perc_event_high=perc_event_high_default, min_obs=min_obs_default, bins_thresholds=bins_thresholds_default)"
"crpss_bs_da, fair_crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da = prob_metrics_calculation(Qobs=obs_ds, Qfc_ens=ens_fc_ds, flag=1, niterations=niterations_default, perc_event_low=perc_event_low_default, perc_event_high=perc_event_high_default, min_obs=min_obs_default, bins_thresholds=bins_thresholds_default)"
]
},
{
@@ -9195,7 +9195,7 @@
],
"source": [
"# Save into a single xarray DataFrame\n",
"prob_verif_metrics_bs_basin_ds = xr.merge([crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da])\n",
"prob_verif_metrics_bs_basin_ds = xr.merge([crpss_bs_da, fair_crpss_bs_da, reli_bs_da, roc_auc_bs_da, roc_bs_da])\n",
"prob_verif_metrics_bs_basin_ds.attrs['info'] = 'Various probabilistic verification metrics calculated for basin '+test_basin_id+'.'\n",
"\n",
"display(prob_verif_metrics_bs_basin_ds)"
Binary file modified notebooks/NotebookMethods.png
2 changes: 1 addition & 1 deletion notebooks/README.md
@@ -2,7 +2,7 @@

To explore FROSTBYTE, the best way is to navigate the Jupyter Notebooks in this section! The image below shows the methods implemented in each notebook. Following that is a brief text description, but open the notebooks themselves to see all the steps for yourself.

For installation instructions, refer back to the [landing page](https://github.com/lou-a/FROSTBYTE). Test data has been included for a sample catchment in Canada and in the USA.
For installation instructions, refer back to the [landing page](https://github.com/CH-Earth/FROSTBYTE). Test data has been included for a sample catchment in Canada and in the USA.


<p align="center">
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,4 +1,5 @@
Bottleneck==1.3.2
CRPS==2.0.4
geopandas==0.10.2
ipykernel==5.1.4
matplotlib==3.1.3
47 changes: 34 additions & 13 deletions scripts/functions.py
@@ -4,6 +4,7 @@
############################################################################################################

# Import required modules
import CRPS.CRPS as CRPSscore
import datetime
from datetime import timedelta, date
import geopandas as gpd
@@ -318,7 +319,7 @@ def continuous_rank_prob_score(Qobs, Qfc_ens, min_obs):
CRPS range: 0 to +Inf. Perfect score: 0. Units: Same as variable measured.
CRPSS range: -Inf to 1. Perfect score: 1. Units: Unitless.
Characteristics: It is equivalent to the mean absolute error (MAE) for deterministic forecasts.
For more info, see the Python CRPS package documentation: https://pypi.org/project/properscoring/
For more info, see the relevant Python CRPS package documentation: https://pypi.org/project/properscoring/, https://pypi.org/project/CRPS/
Keyword arguments:
------------------
@@ -328,8 +329,8 @@ def continuous_rank_prob_score(Qobs, Qfc_ens, min_obs):
Returns:
--------
- CRPS: Float of the CRPS value between the ensemble forecasts & observations.
- CRPSS: Float of the CRPSS value between the ensemble forecasts & observations.
- fairCRPSS: Float of the fairCRPSS value between the ensemble forecasts & observations.
"""

@@ -344,11 +345,23 @@ def continuous_rank_prob_score(Qobs, Qfc_ens, min_obs):
CRPS_baseline = ps.crps_ensemble(Qobs, baseline).mean()
CRPSS = 1 - CRPS / CRPS_baseline

# Calculate the fairCRPS and fairCRPSS
fairCRPS = []
fairCRPS_baseline = []
for y in range(len(Qobs)):
fcrps = CRPSscore(Qfc_ens[y,:].values, Qobs[y].values).compute()[1]
fcrps_baseline = CRPSscore(baseline[y,:][~np.isnan(baseline[y,:])].tolist(), Qobs[y].values).compute()[1]
fairCRPS.append(fcrps)
fairCRPS_baseline.append(fcrps_baseline)
fairCRPS = np.mean(fairCRPS)
fairCRPS_baseline = np.mean(fairCRPS_baseline)
fairCRPSS = 1 - fairCRPS / fairCRPS_baseline

else:

CRPS, CRPSS = np.nan, np.nan
CRPSS, fairCRPSS = np.nan, np.nan

return CRPS, CRPSS
return CRPSS, fairCRPSS

###
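The fair (ensemble-size-adjusted) CRPS computed above via the `CRPS` package can also be written directly from its definition. Below is a minimal numpy sketch for a single observation, an assumption-labeled illustration rather than the package's implementation:

```python
import numpy as np

def fair_crps(ens, obs):
    """Fair CRPS for one observation and an m-member ensemble (m >= 2).

    fCRPS = mean_i |x_i - y| - sum_{i,j} |x_i - x_j| / (2 * m * (m - 1))
    The second term corrects the plain ensemble CRPS for finite ensemble
    size, so ensembles of different sizes can be compared fairly.
    """
    ens = np.asarray(ens, dtype=float)
    m = ens.size
    term1 = np.abs(ens - obs).mean()
    term2 = np.abs(ens[:, None] - ens[None, :]).sum() / (2 * m * (m - 1))
    return term1 - term2

# A sharp, perfectly centered ensemble scores the ideal 0
print(fair_crps([5.0, 5.0, 5.0], 5.0))  # -> 0.0
# The plain ensemble CRPS of [4, 6] vs obs 5 is 0.5; the fair version
# removes the finite-size penalty for this symmetric 2-member ensemble
print(fair_crps([4.0, 6.0], 5.0))  # -> 0.0
```

The skill score then follows the same pattern as above: `fairCRPSS = 1 - fairCRPS / fairCRPS_baseline`, with the climatology ensemble as the baseline.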

@@ -1282,7 +1295,7 @@ def principal_component_analysis(stations_data, flag):

def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, perc_event_high, min_obs, bins_thresholds):

"""Calculates deterministic metrics for whole hindcast timeseries (1 value per hindcast start date & target period).
"""Calculates probabilistic metrics for whole hindcast timeseries (1 value per hindcast start date & target period).
Keyword arguments:
------------------
@@ -1297,8 +1310,8 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
Returns:
--------
- crps_da: xarray DataArray containing the CRPS for each hindcast start date & target period.
- crpss_da: xarray DataArray containing the CRPSS for each hindcast start date & target period.
- fair_crpss_da: xarray DataArray containing the fairCRPSS for each hindcast start date & target period.
- reli_da: xarray DataArray containing the reliability index for each hindcast start date & target period.
- roc_auc_da: xarray DataArray containing the ROC area under the curve for each hindcast start date & target period.
- roc_da: xarray DataArray containing the ROC curves for each hindcast start date & target period.
@@ -1312,11 +1325,13 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
# Initialize the verification metrics' Numpy arrays
if flag == 0:
crpss_array = np.ones((len(initdates),len(targetperiods))) * np.nan
fair_crpss_array = np.ones((len(initdates),len(targetperiods))) * np.nan
reli_array = np.ones((len(initdates),len(targetperiods))) * np.nan
roc_auc_array = np.ones((len(initdates),len(targetperiods),2)) * np.nan
roc_array = np.ones((len(initdates),len(targetperiods),2,11,2)) * np.nan
elif flag == 1:
crpss_array = np.ones((len(initdates),len(targetperiods),niterations)) * np.nan
fair_crpss_array = np.ones((len(initdates),len(targetperiods),niterations)) * np.nan
reli_array = np.ones((len(initdates),len(targetperiods),niterations)) * np.nan
roc_auc_array = np.ones((len(initdates),len(targetperiods),niterations,2)) * np.nan
roc_array = np.ones((len(initdates),len(targetperiods),niterations,2,11,2)) * np.nan
@@ -1351,7 +1366,8 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
if flag == 0:
# CRPS & CRPSS
crps_outputs = continuous_rank_prob_score(Qobs_data, Qfc_ens_data, min_obs)
crpss_array[row,column] = round(crps_outputs[1],2)
crpss_array[row,column] = round(crps_outputs[0],2)
fair_crpss_array[row,column] = round(crps_outputs[1],2)
# Reliability index
reli_array[row,column] = round(reli_index(Qobs_data, Qfc_ens_data, min_obs),2)
# ROC
@@ -1373,7 +1389,8 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p

# CRPS & CRPSS
crps_outputs = continuous_rank_prob_score(Qobs_data_bs, Qfc_ens_data_bs, min_obs)
crpss_array[row,column,b] = round(crps_outputs[1],2)
crpss_array[row,column,b] = round(crps_outputs[0],2)
fair_crpss_array[row,column,b] = round(crps_outputs[1],2)
# Reliability index
reli_array[row,column,b] = round(reli_index(Qobs_data_bs, Qfc_ens_data_bs, min_obs),2)
# ROC
@@ -1386,25 +1403,29 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p

# Save values to xarray DataArrays
if flag == 0:
crpss_da = xr.DataArray(data=crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_month','target_period'], name='CRPSS')
reli_da = xr.DataArray(data=reli_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_month','target_period'], name='Reliability_index')
crpss_da = xr.DataArray(data=crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_date','target_period'], name='CRPSS')
fair_crpss_da = xr.DataArray(data=fair_crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_date','target_period'], name='fairCRPSS')
reli_da = xr.DataArray(data=reli_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods]}, dims=['init_date','target_period'], name='Reliability_index')
roc_auc_da = xr.DataArray(data=roc_auc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','event'], name='ROC_AUC')
roc_da = xr.DataArray(data=roc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'rate':['FAR','HR'],'bins':roc_outputs_high[0].bins,'event':[perc_event_low, perc_event_high]}, dims=['init_month','target_period','rate','bins','event'], name='ROC')
roc_da = xr.DataArray(data=roc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'rate':['FAR','HR'],'bins':roc_outputs_high[0].bins,'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','rate','bins','event'], name='ROC')

elif flag == 1:
crpss_da = xr.DataArray(data=crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1)}, dims=['init_date','target_period','iteration'], name='CRPSS')
fair_crpss_da = xr.DataArray(data=fair_crpss_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1)}, dims=['init_date','target_period','iteration'], name='fairCRPSS')
reli_da = xr.DataArray(data=reli_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1)}, dims=['init_date','target_period','iteration'], name='Reliability_index')
roc_auc_da = xr.DataArray(data=roc_auc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1),'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','iteration','event'], name='ROC_AUC')
roc_da = xr.DataArray(data=roc_array, coords={'init_date':initdates,'target_period':[x[4::] for x in targetperiods],'iteration':np.arange(1,niterations+1),'rate':['FAR','HR'],'bins':roc_outputs_high[0].bins,'event':[perc_event_low, perc_event_high]}, dims=['init_date','target_period','iteration','rate','bins','event'], name='ROC')


# Information for the output xarray DataArrays
da_dict = {'CRPSS':crpss_da,'reli':reli_da,'ROC_AUC':roc_auc_da,'ROC':roc_da}
da_dict = {'CRPSS':crpss_da,'fairCRPSS':fair_crpss_da,'reli':reli_da,'ROC_AUC':roc_auc_da,'ROC':roc_da}
metrics_longnames_dict = {'CRPSS':'Continuous Rank Probability Skill Score',
'fairCRPSS':'Fair Continuous Rank Probability Skill Score',
'reli':'Reliability index',
'ROC_AUC':'Relative Operating Characteristic (ROC) area under the curve (AUC)',
'ROC':'Relative Operating Characteristic (ROC)'}
metrics_info_dict = {'CRPSS':'Measures the skill of the hindcast against a baseline (observations climatology). Range: -Inf to 1. Perfect score: 1. Units: Unitless.',
'fairCRPSS':'Measures the skill of the hindcast against a baseline (observations climatology), using a fair method to account for differences in ensemble sizes. Range: -Inf to 1. Perfect score: 1. Units: Unitless.',
'reli':'Measures the closeness between the empirical CDF of the ensemble hindcast with the CDF of a uniform distribution (i.e., flat rank histogram). Range: 0 to 1. Perfect score: 1. Units: Unitless.',
'ROC_AUC':'Measures the ensemble hindcast resolution, its ability to discriminate between events (given percentile) & non-events. ROC AUC range: 0 to 1. Perfect score: 1. No skill: 0.5. Units: Unitless.',
'ROC':'Measures the ensemble hindcast resolution, its ability to discriminate between events (given percentile) & non-events. The ROC curve plots the hit rate (HR) vs the false alarm rate (FAR) using a set of increasing probability thresholds (i.e., 0.1, 0.2, ..., 1) to make the yes/no decision.'}
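The ROC construction these attributes describe can be sketched as follows: sweep the probability thresholds 0.1, 0.2, ..., 1, and at each threshold compute the hit rate and false alarm rate from the ensemble's event probabilities. The snippet below is a hedged illustration on synthetic data, not the workflow's actual ROC function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 years, 50-member ensembles; the "event" is obs
# below its 30th percentile (a stand-in for perc_event_low)
obs = rng.normal(size=200)
ens = obs[:, None] + rng.normal(scale=1.5, size=(200, 50))
threshold_q = np.quantile(obs, 0.3)
event = obs < threshold_q

# Forecast probability of the event = fraction of members below the threshold
fc_prob = (ens < threshold_q).mean(axis=1)

hr, far = [], []
for t in np.arange(0.1, 1.01, 0.1):
    yes = fc_prob >= t  # forecast a "yes" at this probability threshold
    hr.append((yes & event).sum() / event.sum())        # hit rate
    far.append((yes & ~event).sum() / (~event).sum())   # false alarm rate

# Area under the ROC curve (FAR on x, HR on y), with the (0,0)/(1,1) endpoints
x = np.concatenate([[0.0], np.sort(far), [1.0]])
y = np.concatenate([[0.0], np.sort(hr), [1.0]])
auc = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))
print(round(auc, 2))
```

Because the synthetic ensembles are centered on the observations, the resulting AUC lands well above the no-skill value of 0.5.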
@@ -1430,7 +1451,7 @@ def prob_metrics_calculation(Qobs, Qfc_ens, flag, niterations, perc_event_low, p
da_dict[keys].bins.attrs['info'] = 'Forecast probability thresholds used for the ROC calculations.'
da_dict[keys].rate.attrs['info'] = 'The false alarm rate (FAR) captures when an event is forecast to occur, but did not occur. The hit rate (HR) captures when an event is forecast to occur, and did occur.'

return crpss_da, reli_da, roc_auc_da, roc_da
return crpss_da, fair_crpss_da, reli_da, roc_auc_da, roc_da

###

2 changes: 1 addition & 1 deletion settings/README.md
@@ -4,4 +4,4 @@ The settings for running the data-driven forecasting workflow are located in thi

## Instructions

Copy an existing settings file, and update to match the directory paths for your own environment, for the basin of interest, input data and output paths.
Copy an existing settings file, and update to match the directory paths for your own environment, for the basin of interest, input data and output paths. Data are provided for two river basins in the `test_case_data` folder, corresponding to the Bow River at Banff in Alberta, Canada (05BB001), and the Crystal River Above Avalanche Creek, Near Redstone in Colorado, USA (09081600).
2 changes: 1 addition & 1 deletion settings/config_test_case.yaml
@@ -1,7 +1,7 @@
# Configuration file for data-driven forecasting
# Set required data paths - these are relative paths to where this script is stored

# Domain, note that the name needs to match the data paths that follow
# Domain, note that the name needs to match the data paths that follow - current options are: "05BB001" for the Bow River at Banff in Alberta, Canada, or "09081600" for the Crystal River Above Avalanche Creek, Near Redstone in Colorado, USA
domain: "05BB001"

# Observational data path
2 changes: 1 addition & 1 deletion test_case_data/README.md
@@ -1,6 +1,6 @@
# Test Case Data

Sample data for running the forecasting workflow for two single river basins: the Bow River at Banff in Alberta, Canada, and the Crystal River Abv Avalanche Crk, Near Redstone in Colorado, USA. The locations of both are shown in the image below.
Sample data for running the forecasting workflow for two single river basins: the Bow River at Banff in Alberta, Canada (05BB001), and the Crystal River Above Avalanche Creek, Near Redstone in Colorado, USA (09081600). The locations of both are shown in the image below.

<p align="center">
<img src="TestBasins.png" alt="Test basins" width="500"/>
