TheilSen trend statistics #384

Open
raffaele-morelli opened this issue May 20, 2024 · 1 comment
Labels
question Questions about function use or interpretation

raffaele-morelli commented May 20, 2024

Question

Hi,

I am working on data with missing months:

  • 2018-09
  • 2018-01
  • 2018-12
  • 2019-01

as shown in the TheilSen plot.

[Image: TheilSen plot]

Looking at MKresults$data[[2]], we see a table with two rows, one dated 2019-06-27 and the other 2018-11-08.

default p.stars date conc a b upper.a upper.b lower.a lower.b p slope intercept intercept.lower intercept.upper lower upper slope.percent lower.percent upper.percent
default ** 2019-06-27 16.11156 55.92575 -0.0023145 28.85148 -0.0007367 86.20022 -0.0040051 0.0033389 -0.8447789 55.92575 86.20022 28.85148 -1.461859 -0.2688846 -4.957488 -7.730342 -1.632108
default NA 2018-11-08 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN -4.957488 -7.730342 -1.632108

Why are there two rows, the second containing only NaN except for slope.percent, lower.percent and upper.percent?

The data around November 2018 follow:

obs date meteo v_norm
11 2018-08-14 12.686618 -1.686618
7 2018-08-26 10.537685 -3.537685
10 2018-10-29 12.546518 -2.546518
14 2018-10-30 12.760691 1.239309
55 2019-02-28 18.897843 36.102157
34 2019-03-01 13.481293 20.518707
42 2019-03-02 16.437182 25.562818
35 2019-03-03 16.784798 18.215202
14 2019-03-04 12.479097 1.520903
6 2019-03-05 10.300994 -4.300993
4 2019-03-06 8.724356 -4.724356
7 2019-03-07 9.382038 -2.382038
6 2019-03-08 10.920722 -4.920722
5 2019-03-09 12.816521 -7.816521

Regards

mooibroekd commented

Reprex to confirm:

library(openair) 
mary <- importAURN(site = "my1", year = c(seq(2000, 2009, 1), seq(2011, 2019, 1)))
result <- TheilSen(mary, pollutant = "no2")
#> Taking bootstrap samples. Please wait.

result$data[[2]]
#> # A tibble: 2 × 20
#>   default p.stars date        conc     a         b upper.a   upper.b lower.a
#>   <chr>   <chr>   <date>     <dbl> <dbl>     <dbl>   <dbl>     <dbl>   <dbl>
#> 1 default ***     2009-12-06  94.7  137.  -0.00305    118.  -0.00182    158.
#> 2 default <NA>    2010-06-16 NaN    NaN  NaN          NaN  NaN          NaN 
#> # ℹ 11 more variables: lower.b <dbl>, p <dbl>, slope <dbl>, intercept <dbl>,
#> #   intercept.lower <dbl>, intercept.upper <dbl>, lower <dbl>, upper <dbl>,
#> #   slope.percent <dbl>, lower.percent <dbl>, upper.percent <dbl>

Created on 2024-08-07 with reprex v2.1.1

However, slope.percent, lower.percent and upper.percent in the second row are identical to those in the first row. I suspect this is a bug in how the output data are combined, so that those percentages end up being added twice. When only a single trend line is calculated, I would explicitly select the first row of the results (i.e. result$data[[2]][1, ]).
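As a minimal sketch of that workaround, the snippet below builds a small stand-in for the results table and keeps only the row with an estimated slope. The data frame `res` is hypothetical: it reproduces just a subset of the 20 columns, with values copied from the table in the question, so that the example runs without openair or a data download. Filtering on a non-missing `slope` is an assumption about what marks the valid row, not documented openair behaviour.

```r
# Hypothetical stand-in for result$data[[2]] (or MKresults$data[[2]]).
# Values are copied from the question; only some columns are reproduced.
res <- data.frame(
  p.stars       = c("**", NA),
  date          = as.Date(c("2019-06-27", "2018-11-08")),
  conc          = c(16.11156, NaN),
  slope         = c(-0.8447789, NaN),
  lower         = c(-1.461859, NaN),
  upper         = c(-0.2688846, NaN),
  slope.percent = c(-4.957488, -4.957488),
  lower.percent = c(-7.730342, -7.730342),
  upper.percent = c(-1.632108, -1.632108)
)

# Option 1: positional selection, as suggested above.
first_row <- res[1, ]

# Option 2: keep only rows where a slope was actually estimated.
# is.na() returns TRUE for both NA and NaN, so the all-NaN row is dropped.
trend <- res[!is.na(res$slope), ]

print(trend)
```

With a real TheilSen() result, `res` would be replaced by `result$data[[2]]`. Filtering on `slope` avoids relying on row order, which may matter if the extra row does not always come second.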
