1 change: 1 addition & 0 deletions NEWS.md
@@ -9,6 +9,7 @@ A minor update to the package with some bug fixes and minor changes.
- Removed the on attach message which warned of breaking changes in `1.0.0`.
- Renamed the `metric` argument of `summarise_scores()` to `relative_skill_metric`. This argument is now deprecated and will be removed in a future version of the package. Please use the new argument instead.
- Updated the documentation for `score()` and related functions to make the soft requirement for a `model` column in the input data more explicit.
- Updated the documentation for `score()`, `pairwise_comparison()` and `summarise_scores()` to clarify what constitutes the unit of a single forecast required for computations.
- Simplified the function `plot_pairwise_comparison()` which now only supports plotting mean score ratios or p-values and removed the hybrid option to print both at the same time.

## Bug fixes
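
To make the renamed argument and the simplified plotting function concrete, here is a minimal usage sketch; the metric name and plot type shown are assumptions for illustration rather than values stated in this changelog:

library(scoringutils)

scores <- score(example_quantile)

# the old `metric` argument of summarise_scores() is deprecated;
# `relative_skill_metric` takes its place
summarise_scores(
  scores,
  by = "model",
  relative_skill = TRUE,
  relative_skill_metric = "interval_score" # assumed metric name
)

# plot_pairwise_comparison() now plots either mean score ratios or p-values
pc <- pairwise_comparison(scores)
plot_pairwise_comparison(pc, type = "mean_scores_ratio") # assumed option name
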
1 change: 1 addition & 0 deletions R/check_forecasts.R
@@ -278,6 +278,7 @@ print.scoringutils_check <- function(x, ...) {
#'
#' @param forecast_unit A character vector with the column names that define
#' the unit of a single forecast. If missing, the function tries to infer the
#' unit of a single forecast.
#'
#' @param ... Additional arguments passed to [get_forecast_unit()].
#' @return A data.frame with all rows for which a duplicate forecast was found
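
As a rough sketch of how the documented `forecast_unit` argument might be used; the data set and column names below come from the package's bundled examples and are assumptions for illustration:

library(scoringutils)

# let the function infer the unit of a single forecast from the data
find_duplicates(example_quantile)

# or spell the forecast unit out explicitly; rows that share the same values
# in these columns but appear more than once are reported as duplicates
find_duplicates(
  example_quantile,
  forecast_unit = c("location", "target_end_date", "target_type", "horizon", "model")
)
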
21 changes: 19 additions & 2 deletions R/pairwise-comparisons.R
@@ -2,9 +2,26 @@
#'
#' @description
#'
#' Make pairwise comparisons between models. The code for the pairwise
#' comparisons is inspired by an implementation by Johannes Bracher.
#' Compute relative scores between different models by making pairwise
#' comparisons. Pairwise comparisons are a sort of pairwise tournament in which
#' all combinations of two models are compared against each other based on the
#' set of available forecasts common to both models.
#' Internally, a ratio of the mean scores of both models is computed.
#' The relative score of a model is then the geometric mean of all mean score
#' ratios which involve that model. When a baseline is provided, that
#' baseline is excluded from the relative scores for individual models
#' (which therefore differ slightly from relative scores without a baseline),
#' and all relative scores are scaled by (i.e. divided by) the relative score
#' of the baseline model.
#' Usually, the function input should be unsummarised scores as
#' produced by [score()].
#' Note that the function internally infers the *unit of a single forecast* by
#' determining all columns in the input that do not correspond to metrics
#' computed by [score()]. Adding unrelated columns will change results in an
#' unpredictable way.
#'
#' The code for the pairwise comparisons is inspired by an implementation by
#' Johannes Bracher.
#' The implementation of the permutation test follows the function
#' `permutationTest` from the `surveillance` package by Michael Höhle,
#' Andrea Riebler and Michaela Paul.
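
A minimal sketch of the workflow described above; the baseline model name is an assumption based on the package's bundled example data:

library(scoringutils)

# unsummarised scores as produced by score(), one row per quantile
scores <- score(example_quantile)

# pairwise tournament between all models; relative skill is the geometric
# mean of all mean score ratios involving a model, scaled by the baseline
pc <- pairwise_comparison(
  scores,
  baseline = "EuroCOVIDhub-baseline" # assumed model name from the example data
)
head(pc)
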
15 changes: 14 additions & 1 deletion R/score.R
@@ -13,13 +13,26 @@
#' each format are also provided (see the documentation for `data` below or in
#' [check_forecasts()]).
#'
#' To obtain a quick overview of the currrently supported evaluation metrics,
#' Each format has a set of required columns (see below). Additional columns may
#' be present to indicate a grouping of forecasts. For example, we could have
#' forecasts made by different models in various locations at different time
#' points, each for several weeks into the future. It is important that only
#' those columns are present which are relevant for grouping forecasts.
#' The combination of these columns should uniquely define the
#' *unit of a single forecast*, meaning that a single forecast is identified by
#' the values in those columns. Adding additional unrelated columns may alter
#' results.
#'
#' To obtain a quick overview of the currently supported evaluation metrics,
#' have a look at the [metrics] data included in the package. The column
#' `metrics$Name` gives an overview of all available metric names that can be
#' computed. If interested in an unsupported metric please open a [feature
#' request](https://github.com/epiforecasts/scoringutils/issues) or consider
#' contributing a pull request.
#'
#' For additional help and examples, check out the [Getting Started
#' Vignette](https://epiforecasts.io/scoringutils/articles/getting-started.html).
#'
#' @param data A data.frame or data.table with the predictions and observations.
#' For scoring using [score()], the following columns need to be present:
#' \itemize{
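
A short sketch of what the forecast unit means in practice; the column names refer to the bundled example_quantile data and are assumptions for illustration:

library(scoringutils)

# besides the value columns (true_value, quantile, prediction), the remaining
# columns jointly define the unit of a single forecast, e.g. one forecast per
# model, location, target type, horizon and target end date
colnames(example_quantile)

# optional upfront validation of the input format
check_forecasts(example_quantile)

# scoring is then done per forecast unit; a stray unrelated column would be
# treated as part of the grouping and could change the results
scores <- score(example_quantile)
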
14 changes: 11 additions & 3 deletions R/summarise_scores.R
@@ -6,9 +6,14 @@
#' @inheritParams score
#' @param by character vector with column names to summarise scores by. Default
#' is `NULL`, meaning that the only summary that takes place is summarising
#' over quantiles (in case of quantile-based forecasts), such that there is one
#' score per forecast as defined by the unit of a single forecast (rather than
#' one score for every quantile).
#' over samples or quantiles (in case of quantile-based forecasts), such that
#' there is one score per forecast as defined by the *unit of a single forecast*
#' (rather than one score for every sample or quantile).
#' The *unit of a single forecast* is determined by the columns present in the
#' input data that do not correspond to a metric produced by [score()], which
#' indicate a grouping of forecasts (for example, there may be one
#' forecast per day, location and model). Adding additional, unrelated columns
#' may alter results in an unpredictable way.
#' @param fun a function used for summarising scores. Default is `mean`.
#' @param relative_skill logical, whether or not to compute relative
#' performance between models based on pairwise comparisons.
@@ -33,6 +38,9 @@
#' @examples
#' library(magrittr) # pipe operator
#'
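#' # compute scores for continuous (sample-based) forecasts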
#' scores <- score(example_continuous)
#' summarise_scores(scores)
#'
#' # summarise over samples or quantiles to get one score per forecast
#' scores <- score(example_quantile)
#' summarise_scores(scores)
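
To make the role of `by` and the forecast unit concrete, a small usage sketch; the grouping columns are assumptions based on the bundled example data:

library(scoringutils)

scores <- score(example_quantile)

# default: collapse over quantiles (or samples) only, leaving one score per
# forecast as defined by the unit of a single forecast
summarise_scores(scores)

# summarise further, e.g. to one score per model and target type
summarise_scores(scores, by = c("model", "target_type"))
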
11 changes: 8 additions & 3 deletions man/check_summary_params.Rd


3 changes: 2 additions & 1 deletion man/find_duplicates.Rd


21 changes: 19 additions & 2 deletions man/pairwise_comparison.Rd


14 changes: 13 additions & 1 deletion man/score.Rd


16 changes: 12 additions & 4 deletions man/summarise_scores.Rd
