Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#performance regression 4646 #3

Open
DorisAmoakohene opened this issue Oct 6, 2023 · 6 comments
Open

#performance regression 4646 #3

DorisAmoakohene opened this issue Oct 6, 2023 · 6 comments

Comments

@DorisAmoakohene
Copy link
Owner

I also have been trying this issue to

but I keep on running into errors.

This issue is checking the system time of some data.table version (4 seconds to process in V 1.12.8, and more than 24+ hours when using V 1.13.0. )
which is issues (Rdatatable/data.table#4646)

the error says "Error in value[3L] :
Error in revparse_single(object, branch): Error in 'git2r_revparse_single': Requested object could not be found
when trying to checkout 8e78a309f75e724ff6eda219b8f2ad172a01f8e4"
one of the ids could not be found

@tdhock

atime.list <- atime::atime_versions(
pkg.path="C:/Users/Doris Afriyie/data.table",
pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
      pkg_find_replace <- function(glob, FIND, REPLACE){
        atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
      }
      Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
      Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
      new.Package_ <- paste0(Package_, "_", sha)
      pkg_find_replace(
        "DESCRIPTION", 
        paste0("Package:\\s+", old.Package),
        paste("Package:", new.Package))
      pkg_find_replace(
        file.path("src","Makevars.*in"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        sprintf('packageVersion\\("%s"\\)', old.Package),
        sprintf('packageVersion\\("%s"\\)', new.Package))
      pkg_find_replace(
        file.path("src", "init.c"),
        paste0("R_init_", Package_regex),
        paste0("R_init_", gsub("[.]", "_", new.Package_)))
      pkg_find_replace(
        "NAMESPACE",
        sprintf('useDynLib\\("?%s"?', Package_regex),
        paste0('useDynLib(', new.Package_))
    },
  N=10^seq(3,8),
  setup={ 
    n <- 500L
    set.seed(123L)
    n_nested = 40L
    dt = data.table(id = seq_len(n),
                    value = replicate(n, data.table(val1 = sample(n_nested)), simplify = FALSE)))
  },#https://github.com/Rdatatable/data.table/issues/4646
  expr=data.table:::`[.data.table`(dt[seq_len(.N), value[[1L]], by = id]
      ,dt[, value[[1L]], by = id]),  "Before"="8e78a309f75e724ff6eda219b8f2ad172a01f8e4",#https://github.com/Rdatatable/data.table/pull/4655/commits/8e78a309f75e724ff6eda219b8f2ad172a01f8e4
  "Regression"="06c43b102f4b97ec91cb14a0c1631f771a9a0414", #https://github.com/Rdatatable/data.table/tree/06c43b102f4b97ec91cb14a0c1631f771a9a0414
  "Fixed"="ec041f78691f9f9495c0f25b78109cd3f5295f4f") #fixed:#https://github.com/Rdatatable/data.table/tree/ec041f78691f9f9495c0f25b78109cd3f5295f4f. for fixed..
@tdhock
Copy link

tdhock commented Oct 6, 2023

I believe it should work now because I restored the branch, Rdatatable/data.table#4655 can you please try again?

@tdhock
Copy link

tdhock commented Oct 6, 2023

can you please create a more informative title for this issue? (current title is #)

@DorisAmoakohene
Copy link
Owner Author

@tdhock just tried again but I'm still getting the same error message "Error in value[3L] :
Error in revparse_single(object, branch): Error in 'git2r_revparse_single': Requested object could not be found

when trying to checkout 8e78a309f75e724ff6eda219b8f2ad172a01f8e4"

@tdhock
Copy link

tdhock commented Oct 9, 2023

you need to run git pull in C:/Users/Doris Afriyie/data.table before it will work

@tdhock
Copy link

tdhock commented Oct 9, 2023

or git fetch

@DorisAmoakohene DorisAmoakohene changed the title # #performance regression 4646 Oct 9, 2023
@tdhock
Copy link

tdhock commented Oct 10, 2023

this was the original code in the issue

bench::mark(
    dt[seq_len(.N), value[[1L]], by = id]
    , dt[, value[[1L]], by = id]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression                                 min   median `itr/sec` mem_alloc
#>   <bch:expr>                            <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 dt[seq_len(.N), value[[1L]], by = id]   2.56ms   3.05ms    287.      2.24MB
#> 2 dt[, value[[1L]], by = id]            219.41ms 250.98ms      3.98   51.91MB
#> # ... with 1 more variable: `gc/sec` <dbl>

the issue in the code above is that dt[, value[[1L]], by = id] is much slower than dt[seq_len(.N), value[[1L]], by = id]

below is your translation to atime_versions code

  expr=data.table:::`[.data.table`(dt[seq_len(.N), value[[1L]], by = id]
      ,dt[, value[[1L]], by = id]), 

the issue in the code above is that both expressions, dt[, value[[1L]], by = id] and dt[seq_len(.N), value[[1L]], by = id], are being used inside expr.
but expr should contain just the code which was problematic for some version, and which could be fixed in a different version.
so the solution should be to do just dt[, value[[1L]], by = id]) and not dt[seq_len(.N), value[[1L]], by = id] in expr.
but the trick when using atime_versions, is that you need to translate operators like [ to function call syntax like [.data.table

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants