seastats is a simple package to compare and analyse two time series. We use the following convention in this repo:

- `sim`: modelled surge time series
- `obs`: observed surge time series
The main function is:
```python
def get_stats(
    sim: Series,
    obs: Series,
    metrics: Sequence[str] = SUGGESTED_METRICS,
    quantile: float = 0,
    cluster: int = 72,
    round: int = -1,
) -> dict[str, float]
```

It calculates various statistical metrics between the simulated and observed time series data.
Get all metrics in a 3-liner:

```python
import pandas as pd
from seastats import get_stats, GENERAL_METRICS, STORM_METRICS

general = get_stats(sim, obs, metrics=GENERAL_METRICS)
storm = get_stats(sim, obs, quantile=0.99, metrics=STORM_METRICS)  # ! we use a different quantile for the PoT selection
pd.DataFrame(dict(general, **storm), index=['abed'])
```

This returns:
| station | bias | rms | rmse | cr | nse | kge | R1 | R3 | error |
|---|---|---|---|---|---|---|---|---|---|
| abed | -0.007 | 0.086 | 0.086 | 0.817 | 0.677 | 0.81 | 0.237364 | 0.147163 | 0.0938142 |
Installation:

```bash
pip install seastats
# or, from conda-forge:
mamba install -c conda-forge seastats
```
`get_stats` takes the following arguments:

- `sim` (pd.Series): The simulated time series data.
- `obs` (pd.Series): The observed time series data.
- `metrics` (list[str], optional): The list of statistical metrics to calculate. If `metrics = ["all"]`, all items in `SUPPORTED_METRICS` are calculated. Default is all items in `SUGGESTED_METRICS`.
- `quantile` (float, optional): Quantile used to calculate the metrics. Default is `0` (no selection).
- `cluster` (int, optional): Cluster duration for grouping storm events. Default is `72` hours.
- `round` (int, optional): Number of decimals to round the results to. Default is `-1` (no rounding).
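For example (an illustrative call; the metric subset and values are arbitrary):

```python
from seastats import get_stats

# storm-focused run: PoT selection above the 99th percentile, 72 h clusters,
# results rounded to 3 decimals
stats = get_stats(sim, obs, metrics=["rmse", "cr"], quantile=0.99, cluster=72, round=3)
```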
Returns a dictionary containing the calculated metrics and their corresponding values, with two types of metrics:

- The "general" metrics: all the basic metrics needed for signal comparison (RMSE, RMS, correlation, etc.); see the details below and the sketch after this list:
  - `bias`: Bias
  - `rmse`: Root Mean Square Error
  - `rms`: Root Mean Square
  - `rms_95`: Root Mean Square for data points above the 95th percentile
  - `sim_mean`: Mean of simulated values
  - `obs_mean`: Mean of observed values
  - `sim_std`: Standard deviation of simulated values
  - `obs_std`: Standard deviation of observed values
  - `mae`: Mean Absolute Error
  - `mse`: Mean Square Error
  - `nse`: Nash-Sutcliffe Efficiency
  - `lambda`: Lambda index
  - `cr`: Pearson correlation coefficient
  - `cr_95`: Pearson correlation coefficient for data points above the 95th percentile
  - `slope`: Slope of Model/Obs correlation
  - `intercept`: Intercept of Model/Obs correlation
  - `slope_pp`: Slope of Model/Obs correlation of percentiles
  - `intercept_pp`: Intercept of Model/Obs correlation of percentiles
  - `mad`: Mean Absolute Deviation
  - `madp`: Mean Absolute Deviation of percentiles
  - `madc`: `mad + madp`
  - `kge`: Kling–Gupta Efficiency
- The "storm" metrics: a PoT selection is done on the observed signal (using the `match_extremes()` function), which returns the decreasing extreme event peak values for the observed and modelled signals (and the time lag between events):
  - `R1`: Difference between observed and modelled for the biggest storm
  - `R1_abs`: Absolute difference between observed and modelled for the biggest storm
  - `R1_abs_norm`: Normalised absolute difference between observed and modelled for the biggest storm
  - `R3`: Averaged difference between observed and modelled for the three biggest storms
  - `R3_abs`: Averaged absolute difference between observed and modelled for the three biggest storms
  - `R3_abs_norm`: Average of the normalised absolute difference between observed and modelled for the three biggest storms
  - `error`: Averaged difference between modelled values and observed detected storms
  - `abs_error`: Averaged absolute difference between modelled values and observed detected storms
  - `abs_error_norm`: Average of the normalised absolute difference between modelled values and observed detected storms
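Most of the general metrics are one-liners in pandas. A minimal sketch of a few of them, assuming `sim` and `obs` are already aligned pd.Series (illustrative, not the package's exact implementation):

```python
import numpy as np

bias = sim.mean() - obs.mean()             # bias
rmse = np.sqrt(((sim - obs) ** 2).mean())  # root mean square error
mae = (sim - obs).abs().mean()             # mean absolute error
cr = sim.corr(obs)                         # Pearson correlation coefficient
```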
Performance Score (PS) or Nash-Sutcliffe Efficiency (NSE): $$1 - \frac{\langle (x_c - x_m)^2 \rangle}{\langle (x_m - x_R)^2 \rangle}$$
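Reading $x_c$ as the modelled signal, $x_m$ as the observations, and taking the observed mean as the reference $x_R$ (an assumption here), this translates to:

```python
def nse(sim, obs):
    # 1 - mean squared model error / variance of the observations
    return 1 - ((sim - obs) ** 2).mean() / ((obs - obs.mean()) ** 2).mean()
```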
Kling–Gupta Efficiency (KGE): $$1 - \sqrt{(r - 1)^2 + b^2 + (g - 1)^2}$$ with:

- $r$ the correlation
- $b$ the modified bias term (see ref): $$\frac{\langle x_c \rangle - \langle x_m \rangle}{\sigma_m}$$
- $g$ the std dev term: $$\frac{\sigma_c}{\sigma_m}$$
Lambda index ($\lambda$): $$\lambda = 1 - \frac{\sum (x_c - x_m)^2}{\sum (x_m - \overline{x}_m)^2 + \sum (x_c - \overline{x}_c)^2 + n (\overline{x}_m - \overline{x}_c)^2 + \kappa}$$

- with $\kappa$: $$2 \cdot \left| \sum{((x_m - \overline{x}_m) \cdot (x_c - \overline{x}_c))} \right|$$
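A sketch of the lambda index following the formulas above (illustrative; it applies $\kappa$ unconditionally, exactly as written):

```python
def lambda_index(sim, obs):
    # kappa penalises anti-correlated signals
    kappa = 2 * abs(((obs - obs.mean()) * (sim - sim.mean())).sum())
    denom = (
        ((obs - obs.mean()) ** 2).sum()
        + ((sim - sim.mean()) ** 2).sum()
        + len(obs) * (obs.mean() - sim.mean()) ** 2
        + kappa
    )
    return 1 - ((sim - obs) ** 2).sum() / denom
```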
The `storm_metrics()` might return:

```python
{'R1': np.nan,
 'R1_abs': np.nan,
 'R1_abs_norm': np.nan,
 'R3': np.nan,
 'R3_abs': np.nan,
 'R3_abs_norm': np.nan,
 'error': np.nan,
 'abs_error': np.nan,
 'abs_error_norm': np.nan}
```

Example of implementation:
```python
from seastats.storms import match_extremes

extremes_df = match_extremes(sim, obs, 0.99, cluster=72)
extremes_df
```

The modelled peaks are matched with the observed peaks. The function returns a pd.DataFrame of the decreasing observed storm peaks as follows:
| time observed | observed | time observed | model | time model | error | abs_error | abs_error_norm | tdiff |
|---|---|---|---|---|---|---|---|---|
| 2022-01-29 19:30:00 | 0.803 | 2022-01-29 19:30:00 | 0.565 | 2022-01-29 17:00:00 | -0.237 | 0.237 | 0.296 | -2.5 |
| 2022-02-20 20:30:00 | 0.639 | 2022-02-20 20:30:00 | 0.577 | 2022-02-20 20:00:00 | -0.062 | 0.062 | 0.0963 | -0.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-11-27 15:30:00 | 0.386 | 2022-11-27 15:30:00 | 0.400 | 2022-11-27 17:00:00 | 0.014 | 0.014 | 0.036 | 1.5 |
with:

- `error`: the difference between modelled and observed peaks
- `abs_error`: the absolute difference between modelled and observed peaks
- `abs_error_norm`: the absolute difference normalised by the observed peak
- `tdiff`: the time difference between modelled and observed peaks, in hours
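Since rows are sorted by decreasing observed peak, the storm metrics introduced earlier can be read off this DataFrame. A hedged sketch (column names as in the table above; the package may use the opposite sign convention):

```python
R1 = extremes_df["error"].iloc[0]         # biggest storm
R3 = extremes_df["error"].head(3).mean()  # mean over the three biggest storms
error = extremes_df["error"].mean()       # mean over all detected storms
```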
NB: the function uses `pyextremes` in the background, with the PoT method, using the quantile value of the observed signal as the physical threshold, and passes the `cluster_duration` argument.
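For reference, the selection is roughly equivalent to the following pyextremes call (a sketch, not the package's exact code; `r` maps to the cluster duration):

```python
from pyextremes import get_extremes

threshold = obs.quantile(0.99)  # quantile of the observed signal as threshold
peaks = get_extremes(obs, "POT", threshold=threshold, r="72h")
```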
The all-NaN dictionary shown above is returned when the function in `storms/match_extremes.py` couldn't find concomitant storms for the observed and modelled time series.
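A simple guard for this case (illustrative):

```python
import numpy as np
from seastats import get_stats, STORM_METRICS

storm = get_stats(sim, obs, quantile=0.99, metrics=STORM_METRICS)
if np.isnan(storm["R1"]):
    print("no concomitant storms matched; storm metrics are undefined")
```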
See the notebook for details.