Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

atime_versions changes regular data.table #30

Closed
tdhock opened this issue Feb 14, 2024 · 4 comments
Closed

atime_versions changes regular data.table #30

tdhock opened this issue Feb 14, 2024 · 4 comments

Comments

@tdhock
Copy link
Owner

tdhock commented Feb 14, 2024

bmerge.c in version https://github.com/Rdatatable/data.table/blob/e40456fef569d1153fbe203f4fb97dfa4377f205/src/bmerge.c#L447 has

  case EQ:
    if (ilow>ilowIn && (xlow>xlowIn || isRollCol)) {
      Rprintf("bmerge_r1 %d %d %d %d\n", xlowIn, xlow+1, ilowIn, ilow+1);
      bmerge_r(xlowIn, xlow+1, ilowIn, ilow+1, col, 1, lowmax, uppmax && xlow+1==xuppIn);
    }
    if (iupp<iuppIn && (xupp<xuppIn || isRollCol)) {
      Rprintf("bmerge_r2 %d %d %d %d\n", xupp-1, xuppIn, iupp-1, iuppIn);
      bmerge_r(xupp-1, xuppIn, iupp-1, iuppIn, col, 1, lowmax && xupp-1==xlowIn, uppmax);
    }

which is debugging output.
When we run the code below we get that debugging output, but we should not, why?

library(atime)
library(ggplot2)
library(data.table)
tdir <- "~/R/data.table"
expr.list <- atime::atime_versions_exprs(
  pkg.path=tdir,
  pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
    pkg_find_replace <- function(glob, FIND, REPLACE){
      atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
    }
    Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
    Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
    new.Package_ <- paste0(Package_, "_", sha)
    pkg_find_replace(
      "DESCRIPTION", 
      paste0("Package:\\s+", old.Package),
      paste("Package:", new.Package))
    pkg_find_replace(
      file.path("src","Makevars.*in"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      sprintf('packageVersion\\("%s"\\)', old.Package),
      sprintf('packageVersion\\("%s"\\)', new.Package))
    pkg_find_replace(
      file.path("src", "init.c"),
      paste0("R_init_", Package_regex),
      paste0("R_init_", gsub("[.]", "_", new.Package_)))
    pkg_find_replace(
      "NAMESPACE",
      sprintf('useDynLib\\("?%s"?', Package_regex),
      paste0('useDynLib(', new.Package_))
  },
  expr=data.table:::`[.data.table`(x.dt, , price.adj := Value),
  ##"Before"="eb1a309e344d1a630cf03133cfa8148a6ab54eab",#https://github.com/Rdatatable/data.table/pull/5187/commits/eb1a309e344d1a630cf03133cfa8148a6ab54eab
  "Regression"="e40456fef569d1153fbe203f4fb97dfa4377f205") #https://github.com/Rdatatable/data.table/pull/5187/commits/e40456fef569d1153fbe203f4fb97dfa4377f205
atime::atime(
  expr.list=expr.list,
  N=10^seq(1,2),
  setup={ 
    setDTthreads(1)
    x.df <- data.frame(date = as.POSIXct("2000-1-1") + 1:N, price = rnorm(N))
    x.dt <- data.table(x.df, key = "date")
    y.df <- data.frame(date = sort(sample(x.df$date, N)), adjustment = runif(N))
    y.dt <- data.table(y.df, key = "date")
    Value = y.dt[x.dt, price+adjustment, roll=TRUE]
  })

I get the output below, which is unexpected because data.table CRAN release 1.15.0 (with no debugging output) is installed, yet the Value = y.dt[x.dt, price+adjustment, roll=TRUE] line gives the bmerge_1 etc debugging output:

> library(data.table)
data.table 1.15.0 using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
> tdir <- "~/R/data.table"
> . + . + > . + Registered S3 methods overwritten by 'data.table.e40456fef569d1153fbe203f4fb97dfa4377f205':
  method                   from      
  all.equal.data.table     data.table
  [.data.table             data.table
  [<-.data.table           data.table
  $<-.data.table           data.table
  print.data.table         data.table
  as.data.frame.data.table data.table
  as.list.data.table       data.table
  as.matrix.data.table     data.table
  cbind.data.table         data.table
  rbind.data.table         data.table
  dim.data.table           data.table
  dimnames.data.table      data.table
  dimnames<-.data.table    data.table
  names<-.data.table       data.table
  duplicated.data.table    data.table
  unique.data.table        data.table
  merge.data.table         data.table
  subset.data.table        data.table
  transform.data.table     data.table
  within.data.table        data.table
  is.na.data.table         data.table
  format.data.table        data.table
  Ops.data.table           data.table
  anyDuplicated.data.table data.table
  split.data.table         data.table
  tail.data.table          data.table
  head.data.table          data.table
  na.omit.data.table       data.table
  [.ITime                  data.table
  +.IDate                  data.table
  -.IDate                  data.table
  as.character.ITime       data.table
  as.data.frame.ITime      data.table
  as.Date.IDate            data.table
  as.list.IDate            data.table
  as.POSIXct.IDate         data.table
  as.POSIXct.ITime         data.table
  as.POSIXlt.ITime         data.table
  c.IDate                  data.table
  c.ITime                  data.table
  cut.IDate                data.table
  format.ITime             data.table
  mean.IDate               data.table
  mean.ITime               data.table
  print.ITime              data.table
  rep.IDate                data.table
  rep.ITime                data.table
  round.IDate              data.table
  round.ITime              data.table
  trunc.ITime              data.table
  seq.IDate                data.table
  seq.ITime                data.table
  unique.IDate             data.table
  unique.ITime             data.table
  [<-.IDate                data.table
  min.IDate                data.table
  max.IDate                data.table
  edit.data.table          data.table
  droplevels.data.table    data.table
bmerge_r0 48 50 48 50
bmerge_r1 -1 49 -1 49
bmerge_r0 23 25 23 25
bmerge_r1 -1 24 -1 24
bmerge_r0 10 12 10 12
bmerge_r1 -1 11 -1 11
bmerge_r0 4 6 4 6
bmerge_r1 -1 5 -1 5
bmerge_r0 1 3 1 3
bmerge_r1 -1 2 -1 2
bmerge_r0 -1 1 -1 1
bmerge_r2 0 2 0 2
bmerge_r0 0 2 0 2
bmerge_r2 2 5 2 5
bmerge_r0 2 4 2 4
bmerge_r2 3 5 3 5
bmerge_r0 3 5 3 5
bmerge_r2 5 11 5 11
bmerge_r0 7 9 7 9
bmerge_r1 5 8 5 8
bmerge_r0 5 7 5 7
bmerge_r2 6 8 6 8
bmerge_r0 6 8 6 8
bmerge_r2 8 11 8 11
bmerge_r0 8 10 8 10
bmerge_r2 9 11 9 11
bmerge_r0 9 11 9 11
bmerge_r2 11 24 11 24
bmerge_r0 16 18 16 18
bmerge_r1 11 17 11 17
bmerge_r0 13 15 13 15
bmerge_r1 11 14 11 14
bmerge_r0 11 13 11 13
bmerge_r2 12 14 12 14
bmerge_r0 12 14 12 14
bmerge_r2 14 17 14 17
bmerge_r0 14 16 14 16
bmerge_r2 15 17 15 17
bmerge_r0 15 17 15 17
bmerge_r2 17 24 17 24
bmerge_r0 19 21 19 21
bmerge_r1 17 20 17 20
bmerge_r0 17 19 17 19
bmerge_r2 18 20 18 20
bmerge_r0 18 20 18 20
bmerge_r2 20 24 20 24
bmerge_r0 21 23 21 23
bmerge_r1 20 22 20 22
bmerge_r0 20 22 20 22
bmerge_r2 22 24 22 24
bmerge_r0 22 24 22 24
bmerge_r2 24 49 24 49
bmerge_r0 35 37 35 37
bmerge_r1 24 36 24 36
bmerge_r0 29 31 29 31
bmerge_r1 24 30 24 30
bmerge_r0 26 28 26 28
bmerge_r1 24 27 24 27
bmerge_r0 24 26 24 26
bmerge_r2 25 27 25 27
bmerge_r0 25 27 25 27
bmerge_r2 27 30 27 30
bmerge_r0 27 29 27 29
bmerge_r2 28 30 28 30
bmerge_r0 28 30 28 30
bmerge_r2 30 36 30 36
bmerge_r0 32 34 32 34
bmerge_r1 30 33 30 33
bmerge_r0 30 32 30 32
bmerge_r2 31 33 31 33
bmerge_r0 31 33 31 33
bmerge_r2 33 36 33 36
bmerge_r0 33 35 33 35
bmerge_r2 34 36 34 36
bmerge_r0 34 36 34 36
bmerge_r2 36 49 36 49
bmerge_r0 41 43 41 43
bmerge_r1 36 42 36 42
bmerge_r0 38 40 38 40
bmerge_r1 36 39 36 39
bmerge_r0 36 38 36 38
bmerge_r2 37 39 37 39
bmerge_r0 37 39 37 39
bmerge_r2 39 42 39 42
bmerge_r0 39 41 39 41
bmerge_r2 40 42 40 42
bmerge_r0 40 42 40 42
bmerge_r2 42 49 42 49
bmerge_r0 44 46 44 46
bmerge_r1 42 45 42 45
bmerge_r0 42 44 42 44
bmerge_r2 43 45 43 45
bmerge_r0 43 45 43 45
bmerge_r2 45 49 45 49
bmerge_r0 46 48 46 48
bmerge_r1 45 47 45 47
bmerge_r0 45 47 45 47
bmerge_r2 47 49 47 49
bmerge_r0 47 49 47 49
bmerge_r2 49 100 49 100
bmerge_r0 73 75 73 75
bmerge_r1 49 74 49 74
bmerge_r0 60 62 60 62
bmerge_r1 49 61 49 61
bmerge_r0 54 56 54 56
bmerge_r1 49 55 49 55
bmerge_r0 51 53 51 53
bmerge_r1 49 52 49 52
bmerge_r0 49 51 49 51
bmerge_r2 50 52 50 52
bmerge_r0 50 52 50 52
bmerge_r2 52 55 52 55
bmerge_r0 52 54 52 54
bmerge_r2 53 55 53 55
bmerge_r0 53 55 53 55
bmerge_r2 55 61 55 61
bmerge_r0 57 59 57 59
bmerge_r1 55 58 55 58
bmerge_r0 55 57 55 57
bmerge_r2 56 58 56 58
bmerge_r0 56 58 56 58
bmerge_r2 58 61 58 61
bmerge_r0 58 60 58 60
bmerge_r2 59 61 59 61
bmerge_r0 59 61 59 61
bmerge_r2 61 74 61 74
bmerge_r0 66 68 66 68
bmerge_r1 61 67 61 67
bmerge_r0 63 65 63 65
bmerge_r1 61 64 61 64
bmerge_r0 61 63 61 63
bmerge_r2 62 64 62 64
bmerge_r0 62 64 62 64
bmerge_r2 64 67 64 67
bmerge_r0 64 66 64 66
bmerge_r2 65 67 65 67
bmerge_r0 65 67 65 67
bmerge_r2 67 74 67 74
bmerge_r0 69 71 69 71
bmerge_r1 67 70 67 70
bmerge_r0 67 69 67 69
bmerge_r2 68 70 68 70
bmerge_r0 68 70 68 70
bmerge_r2 70 74 70 74
bmerge_r0 71 73 71 73
bmerge_r1 70 72 70 72
bmerge_r0 70 72 70 72
bmerge_r2 72 74 72 74
bmerge_r0 72 74 72 74
bmerge_r2 74 100 74 100
bmerge_r0 86 88 86 88
bmerge_r1 74 87 74 87
bmerge_r0 79 81 79 81
bmerge_r1 74 80 74 80
bmerge_r0 76 78 76 78
bmerge_r1 74 77 74 77
bmerge_r0 74 76 74 76
bmerge_r2 75 77 75 77
bmerge_r0 75 77 75 77
bmerge_r2 77 80 77 80
bmerge_r0 77 79 77 79
bmerge_r2 78 80 78 80
bmerge_r0 78 80 78 80
bmerge_r2 80 87 80 87
bmerge_r0 82 84 82 84
bmerge_r1 80 83 80 83
bmerge_r0 80 82 80 82
bmerge_r2 81 83 81 83
bmerge_r0 81 83 81 83
bmerge_r2 83 87 83 87
bmerge_r0 84 86 84 86
bmerge_r1 83 85 83 85
bmerge_r0 83 85 83 85
bmerge_r2 85 87 85 87
bmerge_r0 85 87 85 87
bmerge_r2 87 100 87 100
bmerge_r0 92 94 92 94
bmerge_r1 87 93 87 93
bmerge_r0 89 91 89 91
bmerge_r1 87 90 87 90
bmerge_r0 87 89 87 89
bmerge_r2 88 90 88 90
bmerge_r0 88 90 88 90
bmerge_r2 90 93 90 93
bmerge_r0 90 92 90 92
bmerge_r2 91 93 91 93
bmerge_r0 91 93 91 93
bmerge_r2 93 100 93 100
bmerge_r0 95 97 95 97
bmerge_r1 93 96 93 96
bmerge_r0 93 95 93 95
bmerge_r2 94 96 94 96
bmerge_r0 94 96 94 96
bmerge_r2 96 100 96 100
bmerge_r0 97 99 97 99
bmerge_r1 96 98 96 98
bmerge_r0 96 98 96 98
bmerge_r2 98 100 98 100
bmerge_r0 98 100 98 100
atime list with 2 measurements for
Regression(N=10 to 100)
@MichaelChirico
Copy link
Contributor

Idle thought as seeing "Registered S3 methods overwritten by..." gave me pause:

WDYT about installing the different package versions into different .libPaths() instead of as packages with different names? I haven't thought it through too carefully, nor do I understand the internals of {atime} particularly well to make this an informed comment anyway, but it seems to me at a glance this could make the separation "cleaner".

@tdhock
Copy link
Owner Author

tdhock commented Sep 24, 2024

in the beginning I considered implementing atime_versions using packages with the same name, installed in different libraries.
To get that to work in the same R session we would have to unload one version and then load another one.
but if I recall correctly, there was some issue with reliably/completely unloading.
even easier to implement would be to just run the different R package versions in different R sessions, but that involves other issues (less controlled comparison between versions, more overhead restarting R).
the current solution is that pkg.edit.fun is supposed to handle all of the details of renaming the package/code so that tests work using different versions installed in the same library, under different package names.
For most test cases this seems to work fine.
the current pkg.edit.fun for data.table does not make any effort to rename S3 methods, so that is why we get the "Registered S3 methods overwritten by..." messages.
Probably this issue could be fixed by making a better pkg.edit.fun, but I have not had time to investigate.

@MichaelChirico
Copy link
Contributor

even easier to implement would be to just run the different R package versions in different R sessions

Yea, I think that would be the way to go. I suspect that might bind eventually anyway if there's some interference caused by the S3 issue, but understand it is kind of a pain.

@tdhock
Copy link
Owner Author

tdhock commented Sep 24, 2024

The "Registered S3 method overwritten..." in particular for [ means that y.dt[x.dt, price+adjustment, roll=TRUE] inside setup is actually calling data.table.e40456fef569d1153fbe203f4fb97dfa4377f205:::`[.data.table`(x.dt, , price.adj := Value) so it is actually not suprising that we get the bmerge messages from that version.
a fix is to specify the version of [ we want to use in setup, for example below I do not get any bmerge messages:

> ares <- atime::atime(
+ expr.list=expr.list,
+ N=10^seq(1,2),
+ setup={
+ setDTthreads(1)
+ x.df <- data.frame(date = as.POSIXct("2000-1-1") + 1:N, price = rnorm(N))
+ x.dt <- data.table(x.df, key = "date")
+ y.df <- data.frame(date = sort(sample(x.df$date, N)), adjustment = runif(N))
+ y.dt <- data.table(y.df, key = "date")
+ Value = data.table:::`[.data.table`(y.dt, x.dt, price+adjustment, roll=TRUE)
+ })

a more complex fix would be to make pkg.edit.fun rename S3 methods.

@tdhock tdhock closed this as completed Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants