We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arrange()
arrange() can utilize setorder() to be faster.
setorder()
This would have even more benefits in a longer pipe chain with an implicit copy where we wouldn't need to call copy() outright.
copy()
Note: In the case of a transmute expression inside arrange() we would need to still use order(). Ex: arrange(df, x - 1)
order()
arrange(df, x - 1)
library(stringi) library(data.table) rand_strings <- stri_rand_strings(100, 4) bench::press( data_size = c(10000, 1000000, 10000000), { df <- data.table(a = sample(rand_strings, data_size, TRUE), b = round(runif(data_size), 4)) bench::mark( order = df[order(a, b)], setorder = setorder(copy(df), a, b), check = FALSE, iterations = 30 ) } ) #> Running with: #> data_size #> 1 10000 #> 2 1000000 #> 3 10000000 #> # A tibble: 6 × 7 #> expression data_size min median `itr/sec` mem_alloc `gc/sec` #> <bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> #> 1 order 10000 1.06ms 1.38ms 703. 2.01MB 0 #> 2 setorder 10000 1.01ms 1.23ms 800. 371.95KB 0 #> 3 order 1000000 56.75ms 66.49ms 15.0 19.09MB 9.97 #> 4 setorder 1000000 45.15ms 51.5ms 19.0 27.67MB 34.9 #> 5 order 10000000 676.79ms 690.23ms 1.43 190.75MB 4.91 #> 6 setorder 10000000 560.98ms 648.71ms 1.55 276.58MB 3.88 # With implicit copy (from filter/slice/select/summarize) bench::press( data_size = c(10000, 1000000, 10000000), { df <- data.table(a = sample(rand_strings, data_size, TRUE), b = round(runif(data_size), 4)) bench::mark( order = df[, .(a, b)][order(a, b)], setorder = setorder(df[, .(a, b)], a, b), check = FALSE, iterations = 30 ) } ) #> Running with: #> data_size #> 1 10000 #> 2 1000000 #> 3 10000000 #> Warning: Some expressions had a GC in every iteration; so filtering is disabled. #> # A tibble: 6 × 7 #> expression data_size min median `itr/sec` mem_alloc `gc/sec` #> <bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> #> 1 order 10000 1.65ms 1.97ms 488. 756.3KB 0 #> 2 setorder 10000 1.35ms 1.58ms 622. 488.4KB 21.4 #> 3 order 1000000 73.8ms 80.88ms 11.7 49.7MB 1.30 #> 4 setorder 1000000 59.8ms 67.33ms 14.7 43MB 2.26 #> 5 order 10000000 766.43ms 914.88ms 1.09 496MB 1.71 #> 6 setorder 10000000 628.64ms 733.18ms 1.35 429.2MB 1.67
Created on 2022-06-22 by the reprex package (v2.0.1)
The text was updated successfully, but these errors were encountered:
Might be a good idea to wait on fixing #210 though
Edit: Needs #366
Sorry, something went wrong.
Successfully merging a pull request may close this issue.
arrange()
can utilizesetorder()
to be faster.This would have even more benefits in a longer pipe chain with an implicit copy where we wouldn't need to call
copy()
outright.Note: In the case of a transmute expression inside
arrange()
we would need to still useorder()
.Ex:
arrange(df, x - 1)
Created on 2022-06-22 by the reprex package (v2.0.1)
The text was updated successfully, but these errors were encountered: