@seabbs seabbs commented Jan 12, 2023

This PR adds support for scoring quantile forecasts that have a sample column. It closes #242. Whilst I've added some testing to protect against issues, it may be that the "protected" column assumptions are baked into places I have missed, so this change is still somewhat risky.

See the following example for the new functionality:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(scoringutils)

n_sim <- 1000
epsilon <- rnorm(n_sim)
Y <- exp(epsilon)

forecasts <- expand.grid(
  sigma = 1:20/10, 
  quantile = c(0.01, 0.025, 1:19/20, 0.975, 0.99)
)

forecasts <- forecasts |>
  as_tibble() |>
  mutate(model = 10 * sigma,
         prediction = exp(qnorm(quantile, sd = sigma)),
         true_value = list(Y),
         sample = list(1:length(Y))) |>
  unnest(c(true_value, sample))

check_forecasts(forecasts)
#> Your forecasts seem to be for a target of the following type:
#> $target_type
#> [1] "continuous"
#> 
#> and in the following format:
#> $prediction_type
#> [1] "quantile"
#> 
#> The unit of a single forecast is defined by:
#> $forecast_unit
#> [1] "sigma"  "model"  "sample"
#> 
#> Cleaned data, rows with NA values in prediction or true_value removed:
#> $cleaned_data
#>         sigma quantile model  prediction true_value sample
#>         <num>    <num> <num>       <num>      <num>  <int>
#>      1:   0.1     0.01     1   0.7924429  0.8022296      1
#>      2:   0.1     0.01     1   0.7924429  0.4194004      2
#>      3:   0.1     0.01     1   0.7924429  0.7896071      3
#>      4:   0.1     0.01     1   0.7924429  0.5963944      4
#>      5:   0.1     0.01     1   0.7924429  0.6159169      5
#>     ---                                                   
#> 459996:   2.0     0.99    20 104.8673007  0.6357381    996
#> 459997:   2.0     0.99    20 104.8673007  0.3471260    997
#> 459998:   2.0     0.99    20 104.8673007  0.1470999    998
#> 459999:   2.0     0.99    20 104.8673007  0.3589190    999
#> 460000:   2.0     0.99    20 104.8673007  1.2826372   1000
#> 
#> Number of unique values per column per model:
#> $unique_values
#>     model sigma quantile prediction true_value sample
#>     <num> <int>    <int>      <int>      <int>  <int>
#>  1:     1     1       23         23       1000   1000
#>  2:     2     1       23         23       1000   1000
#>  3:     3     1       23         23       1000   1000
#>  4:     4     1       23         23       1000   1000
#>  5:     5     1       23         23       1000   1000
#>  6:     6     1       23         23       1000   1000
#>  7:     7     1       23         23       1000   1000
#>  8:     8     1       23         23       1000   1000
#>  9:     9     1       23         23       1000   1000
#> 10:    10     1       23         23       1000   1000
#> 11:    11     1       23         23       1000   1000
#> 12:    12     1       23         23       1000   1000
#> 13:    13     1       23         23       1000   1000
#> 14:    14     1       23         23       1000   1000
#> 15:    15     1       23         23       1000   1000
#> 16:    16     1       23         23       1000   1000
#> 17:    17     1       23         23       1000   1000
#> 18:    18     1       23         23       1000   1000
#> 19:    19     1       23         23       1000   1000
#> 20:    20     1       23         23       1000   1000
#>     model sigma quantile prediction true_value sample
scores <- score(forecasts)
summarise_scores(scores, by = "sample")
#>       sample interval_score dispersion underprediction overprediction
#>        <int>          <num>      <num>           <num>          <num>
#>    1:      1      0.3549503  0.3302979      0.00000000     0.02465237
#>    2:      2      0.5136531  0.3302979      0.00000000     0.18335521
#>    3:      3      0.3580064  0.3302979      0.00000000     0.02770844
#>    4:      4      0.4237402  0.3302979      0.00000000     0.09344226
#>    5:      5      0.4155050  0.3302979      0.00000000     0.08520709
#>   ---                                                                
#>  996:    996      0.4075680  0.3302979      0.00000000     0.07727007
#>  997:    997      0.5581558  0.3302979      0.00000000     0.22785787
#>  998:    998      0.7055681  0.3302979      0.00000000     0.37527021
#>  999:    999      0.5505211  0.3302979      0.00000000     0.22022319
#> 1000:   1000      0.3718230  0.3302979      0.04152506     0.00000000
#>       coverage_deviation    bias ae_median
#>                    <num>   <num>     <num>
#>    1:        0.206086957  0.3140 0.1977704
#>    2:       -0.141739130  0.6815 0.5805996
#>    3:        0.188695652  0.3300 0.2103929
#>    4:        0.006086957  0.5275 0.4036056
#>    5:        0.027826087  0.5050 0.3840831
#>   ---                                     
#>  996:        0.053913043  0.4790 0.3642619
#>  997:       -0.211304348  0.7515 0.6528740
#>  998:       -0.385217391  0.9090 0.8529001
#>  999:       -0.198260870  0.7365 0.6410810
#> 1000:        0.184347826 -0.3350 0.2826372

Created on 2023-01-12 with reprex v2.0.2

codecov bot commented Jan 12, 2023

Codecov Report

Merging #261 (e1c2090) into master (43b3394) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #261      +/-   ##
==========================================
+ Coverage   91.36%   91.39%   +0.03%     
==========================================
  Files          21       21              
  Lines        1366     1371       +5     
==========================================
+ Hits         1248     1253       +5     
  Misses        118      118              
Impacted Files Coverage Δ
R/check_forecasts.R 87.50% <100.00%> (+0.11%) ⬆️
R/summarise_scores.R 89.74% <100.00%> (+0.13%) ⬆️
R/utils.R 88.88% <100.00%> (+0.65%) ⬆️


@seabbs seabbs requested a review from nikosbosse January 12, 2023 22:17
@seabbs seabbs added the enhancement New feature or request label Jan 12, 2023
@seabbs seabbs marked this pull request as ready for review January 12, 2023 22:18
@nikosbosse
Collaborator

Nice, thanks a lot! So essentially this internally checks whether the prediction type is quantile, and if it is, then it removes "sample" from the list of protected columns, right?

What do you think about the additional (alternative?) feature that it would give a message / warning when you run check_forecasts() and have a protected column there?
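For concreteness, the suggested warning might look something like this minimal sketch (the function name and wording are hypothetical, not actual scoringutils code):

```r
# Hypothetical sketch of the suggested behaviour: warn the user when a
# protected column name (e.g. "sample") appears alongside quantile forecasts,
# instead of silently treating it as part of the forecast unit.
warn_protected_columns <- function(data, protected = c("sample")) {
  clashes <- intersect(protected, colnames(data))
  if ("quantile" %in% colnames(data) && length(clashes) > 0) {
    warning(
      "Found protected column(s) alongside quantile forecasts: ",
      toString(clashes)
    )
  }
  invisible(data)
}
```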


seabbs commented Jan 13, 2023

So essentially this internally checks whether the prediction type is quantile, and if it is then it removes "sample" from the list of protected columns, right?

Yes, exactly.
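In other words, the change can be sketched roughly like this (the helper name and column list here are illustrative, not the actual scoringutils internals):

```r
# Illustrative sketch (helper name hypothetical): when the forecasts are in
# quantile format, "sample" is no longer treated as a protected column and
# instead just becomes part of the forecast unit.
get_protected_columns <- function(data) {
  protected <- c("prediction", "true_value", "quantile", "sample")
  if ("quantile" %in% colnames(data)) {
    # quantile forecasts: allow a plain "sample" id column
    protected <- setdiff(protected, "sample")
  }
  protected
}
```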

What do you think about the additional (alternative?) feature that it would give a message / warning when you run check_forecasts() and have a protected column there?

I am not sure why you would want to do that? Unless it offers safety elsewhere in your code, it seems overly restrictive.

I'm totally open to either, so we can either merge this in, or close it out and flag the desired implementation in the original issue.

@seabbs seabbs requested a review from nikosbosse January 13, 2023 19:28
@nikosbosse
Collaborator

Merci!

@nikosbosse nikosbosse merged commit 55d184c into master Jan 16, 2023
@nikosbosse nikosbosse deleted the seabbs/issue242 branch January 16, 2023 14:21
Successfully merging this pull request may close these issues.

There should be a warning when there is a column called "sample" with a quantile format