colRowTabulates

hb edited this page Mar 3, 2015 · 2 revisions

matrixStats: Benchmark report


colTabulates() and rowTabulates() benchmarks

This report benchmarks the performance of colTabulates() and rowTabulates() against alternative methods.

Alternative methods

  • ???
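
No alternative method is actually timed in the results of this report. Purely as a point of reference, a base R version of per-column tabulation might be sketched as follows. The name colTabulatesBaseR, its values argument, and the output orientation (one row per column of X, one column per counted value) are assumptions made for this illustration, not something taken from matrixStats.

```r
# Hypothetical helper (not part of matrixStats): for each column of X, count
# how often each value in `values` occurs.
colTabulatesBaseR <- function(X, values = sort(unique(as.vector(X)))) {
  counts <- vapply(
    seq_len(ncol(X)),
    function(j) as.integer(table(factor(X[, j], levels = values))),
    integer(length(values))
  )
  # vapply() yields values-by-columns; force a matrix, then transpose
  counts <- t(matrix(counts, nrow = length(values), ncol = ncol(X)))
  colnames(counts) <- values
  counts
}

# Small made-up example: column 1 holds 1, 2, 2 and column 2 holds 3, 3, 3
X <- matrix(c(1L, 2L, 2L,
              3L, 3L, 3L), nrow = 3, ncol = 2)
res <- colTabulatesBaseR(X)
print(res)
```

A row-wise counterpart could be obtained by applying the same helper to t(X), which mirrors how the report times rowTabulates() on the transposed data.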

Data

> rmatrix <- function(nrow, ncol, mode = c("logical", "double", "integer", "index"), range = c(-100, 
+     +100), naProb = 0) {
+     mode <- match.arg(mode)
+     n <- nrow * ncol
+     if (mode == "logical") {
+         X <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else if (mode == "index") {
+         X <- seq_len(n)
+         mode <- "integer"
+     } else {
+         X <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(X) <- mode
+     if (naProb > 0) 
+         X[sample(n, size = naProb * n)] <- NA
+     dim(X) <- c(nrow, ncol)
+     X
+ }
> rmatrices <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rmatrix(nrow = scale * 1, ncol = scale * 1, ...)
+     data[[2]] <- rmatrix(nrow = scale * 10, ncol = scale * 10, ...)
+     data[[3]] <- rmatrix(nrow = scale * 100, ncol = scale * 1, ...)
+     data[[4]] <- t(data[[3]])
+     data[[5]] <- rmatrix(nrow = scale * 10, ncol = scale * 100, ...)
+     data[[6]] <- t(data[[5]])
+     names(data) <- sapply(data, FUN = function(x) paste(dim(x), collapse = "x"))
+     data
+ }
> data <- rmatrices(mode = "integer", range = c(-10, 10))
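
With the default scale = 10, rmatrices() returns six matrices whose names encode their dimensions, and entries 4 and 6 are transposes of entries 3 and 5. The sketch below recomputes only the dimension labels and element counts, without generating any data; dims, labels and sizes are names introduced here for illustration.

```r
scale <- 10

# Dimensions in the order rmatrices() builds them
dims <- list(
  c(scale * 1,   scale * 1),
  c(scale * 10,  scale * 10),
  c(scale * 100, scale * 1),
  c(scale * 1,   scale * 100),  # transpose of the previous entry
  c(scale * 10,  scale * 100),
  c(scale * 100, scale * 10)    # transpose of the previous entry
)

# Same naming rule as in rmatrices(): paste(dim(x), collapse = "x")
labels <- vapply(dims, paste, character(1), collapse = "x")
sizes <- vapply(dims, prod, numeric(1))
names(sizes) <- labels
print(sizes)
```

The element counts (100, then three matrices of 10,000, then two of 100,000) are useful to keep in mind when comparing timings across the sections that follow.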

Results

10x10 matrix

> X <- data[["10x10"]]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   807441 43.2    1442291  77.1  1442291  77.1
Vcells 12282647 93.8   35610798 271.7 68120027 519.8
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   805947 43.1    1442291  77.1  1442291  77.1
Vcells 12277772 93.7   35610798 271.7 68120027 519.8
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")
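
Each table in this report summarizes the repeated timings of one expression by six statistics: min, lq, mean, median, uq and max. The following is a minimal base R sketch of such a summary. The helper name summarizeTimings and the sample timings are made up, and the quartiles use quantile() defaults, which may differ from the rule microbenchmark applies.

```r
# Hypothetical helper: reduce a vector of timings to the six table columns
summarizeTimings <- function(times) {
  q <- quantile(times, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(min = min(times), lq = q[1], mean = mean(times),
    median = q[2], uq = q[3], max = max(times))
}

# Made-up timings in milliseconds, with one slow outlier
times <- c(0.40, 0.39, 0.42, 0.38, 0.41, 0.99)
print(summarizeTimings(times))
```

A single slow run moves the mean and max noticeably while leaving the median almost unchanged, which is why the median is usually the more stable column to compare.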

Table: Benchmarking of colTabulates() on 10x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
colTabulates  0.2637  0.3932   0.406  0.4009  0.4211  0.7684

expr          min  lq  mean  median  uq  max
colTabulates    1   1     1       1   1    1

Table: Benchmarking of rowTabulates() on 10x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates  0.3811  0.3905  0.4116  0.3953  0.4073  0.9855

expr          min  lq  mean  median  uq  max
rowTabulates    1   1     1       1   1    1

Figure: Benchmarking of colTabulates() on 10x10 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 10x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   381.1   390.5   411.6   395.3   407.3   985.5
colTabulates   263.7   393.2   406.0   400.9   421.1   768.4

expr             min      lq    mean  median      uq     max
rowTabulates  1.0000   1.000  1.0000   1.000   1.000  1.0000
colTabulates  0.6919   1.007  0.9864   1.014   1.034  0.7797

Figure: Benchmarking of colTabulates() and rowTabulates() on 10x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.
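
The relative panel of the combined 10x10 table can be recovered from its absolute panel by dividing every row by the reference row. In the tables of this report the reference appears to be the expression with the smallest median, which is listed first. A short sketch using the numbers copied from that table (abs_times and rel_times are names chosen for this example):

```r
# Absolute times copied from the combined 10x10 table
abs_times <- rbind(
  rowTabulates = c(min = 381.1, lq = 390.5, mean = 411.6,
                   median = 395.3, uq = 407.3, max = 985.5),
  colTabulates = c(min = 263.7, lq = 393.2, mean = 406.0,
                   median = 400.9, uq = 421.1, max = 768.4)
)

# Divide each column by the value in the reference row (rowTabulates)
rel_times <- sweep(abs_times, MARGIN = 2,
                   STATS = abs_times["rowTabulates", ], FUN = "/")
print(signif(rel_times, 4))
```

Rounded to four significant digits, the colTabulates row reproduces the relative values in the table (0.6919, 1.007, 0.9864, 1.014, 1.034, 0.7797).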

100x100 matrix

> X <- data[["100x100"]]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806076 43.1    1442291  77.1  1442291  77.1
Vcells 12278752 93.7   35610798 271.7 68120027 519.8
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806070 43.1    1442291  77.1  1442291  77.1
Vcells 12283795 93.8   35610798 271.7 68120027 519.8
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 100x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
colTabulates  0.8781  0.8927   1.086    1.18   1.213   1.696

expr          min  lq  mean  median  uq  max
colTabulates    1   1     1       1   1    1

Table: Benchmarking of rowTabulates() on 100x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates  0.9139  0.9264   1.203   1.165   1.379   1.806

expr          min  lq  mean  median  uq  max
rowTabulates    1   1     1       1   1    1

Figure: Benchmarking of colTabulates() on 100x100 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 100x100 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   913.9   926.4    1203    1164    1379    1806
colTabulates   878.1   892.7    1086    1180    1213    1696

expr             min      lq    mean  median      uq     max
rowTabulates  1.0000  1.0000  1.0000   1.000  1.0000  1.0000
colTabulates  0.9608  0.9636  0.9022   1.013  0.8795  0.9393

Figure: Benchmarking of colTabulates() and rowTabulates() on 100x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x10 matrix

> X <- data[["1000x10"]]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806112 43.1    1442291  77.1  1442291  77.1
Vcells 12278776 93.7   35610798 271.7 68120027 519.8
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806106 43.1    1442291  77.1  1442291  77.1
Vcells 12283819 93.8   35610798 271.7 68120027 519.8
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 1000x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
colTabulates  0.8057  0.8286   1.042   1.076   1.096   1.431

expr          min  lq  mean  median  uq  max
colTabulates    1   1     1       1   1    1

Table: Benchmarking of rowTabulates() on 1000x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates  0.9847   1.198   1.258   1.224    1.29   1.625

expr          min  lq  mean  median  uq  max
rowTabulates    1   1     1       1   1    1

Figure: Benchmarking of colTabulates() on 1000x10 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 1000x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
colTabulates   805.7   828.6    1042    1076    1096    1431
rowTabulates   984.7  1198.0    1258    1224    1290    1625

expr             min      lq    mean  median      uq     max
colTabulates   1.000   1.000   1.000   1.000   1.000   1.000
rowTabulates   1.222   1.446   1.206   1.138   1.177   1.135

Figure: Benchmarking of colTabulates() and rowTabulates() on 1000x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

10x1000 matrix

> X <- data[["10x1000"]]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806148 43.1    1442291  77.1  1442291  77.1
Vcells 12279006 93.7   35610798 271.7 68120027 519.8
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806142 43.1    1442291  77.1  1442291  77.1
Vcells 12284049 93.8   35610798 271.7 68120027 519.8
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 10x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
colTabulates   1.524   2.138   2.162   2.276   2.322   2.837

expr          min  lq  mean  median  uq  max
colTabulates    1   1     1       1   1    1

Table: Benchmarking of rowTabulates() on 10x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   1.429   1.454   1.704    1.48   2.059   2.362

expr          min  lq  mean  median  uq  max
rowTabulates    1   1     1       1   1    1

Figure: Benchmarking of colTabulates() on 10x1000 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 10x1000 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   1.429   1.454   1.704   1.480   2.059   2.362
colTabulates   1.524   2.138   2.162   2.276   2.322   2.837

expr             min      lq    mean  median      uq     max
rowTabulates   1.000   1.000   1.000   1.000   1.000   1.000
colTabulates   1.066   1.471   1.269   1.538   1.128   1.201

Figure: Benchmarking of colTabulates() and rowTabulates() on 10x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

100x1000 matrix

> X <- data[["100x1000"]]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806184 43.1    1442291  77.1  1442291  77.1
Vcells 12279653 93.7   35610798 271.7 68120027 519.8
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806178 43.1    1442291  77.1  1442291  77.1
Vcells 12329696 94.1   35610798 271.7 68120027 519.8
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 100x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
colTabulates   6.743   7.226    8.87   8.771   10.19   20.71

expr          min  lq  mean  median  uq  max
colTabulates    1   1     1       1   1    1

Table: Benchmarking of rowTabulates() on 100x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   6.963   7.099   9.139   8.335   9.739   35.82

expr          min  lq  mean  median  uq  max
rowTabulates    1   1     1       1   1    1

Figure: Benchmarking of colTabulates() on 100x1000 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 100x1000 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   6.963   7.099   9.139   8.335   9.739   35.82
colTabulates   6.743   7.226   8.870   8.771  10.186   20.71

expr             min      lq    mean  median      uq     max
rowTabulates  1.0000   1.000  1.0000   1.000   1.000   1.000
colTabulates  0.9684   1.018  0.9706   1.052   1.046   0.578

Figure: Benchmarking of colTabulates() and rowTabulates() on 100x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x100 matrix

> X <- data[["1000x100"]]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806220 43.1    1442291  77.1  1442291  77.1
Vcells 12280054 93.7   35610798 271.7 68120027 519.8
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   806214 43.1    1442291  77.1  1442291  77.1
Vcells 12330097 94.1   35610798 271.7 68120027 519.8
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 1000x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
colTabulates   6.074   7.399    8.37   8.295   9.355   21.92

expr          min  lq  mean  median  uq  max
colTabulates    1   1     1       1   1    1

Table: Benchmarking of rowTabulates() on 1000x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   6.628   7.006    8.59   7.944   10.05   23.31

expr          min  lq  mean  median  uq  max
rowTabulates    1   1     1       1   1    1

Figure: Benchmarking of colTabulates() on 1000x100 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 1000x100 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr             min      lq    mean  median      uq     max
rowTabulates   6.628   7.006    8.59   7.944  10.049   23.31
colTabulates   6.074   7.399    8.37   8.295   9.355   21.92

expr             min      lq    mean  median      uq     max
rowTabulates  1.0000   1.000  1.0000   1.000  1.0000  1.0000
colTabulates  0.9164   1.056  0.9745   1.044  0.9309  0.9403

Figure: Benchmarking of colTabulates() and rowTabulates() on 1000x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2015-02-27 r67909)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markdown_0.7.7          microbenchmark_1.4-2    matrixStats_0.14.0-9000
[4] ggplot2_1.0.0           knitr_1.9.3             R.devices_2.13.0       
[7] R.utils_2.0.0           R.oo_1.19.0             R.methodsS3_1.7.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.4         BiocGenerics_0.13.6 splines_3.2.0      
 [4] MASS_7.3-39         munsell_0.4.2       lattice_0.20-30    
 [7] colorspace_1.2-4    R.cache_0.11.1-9000 multcomp_1.3-9     
[10] stringr_0.6.2       plyr_1.8.1          tools_3.2.0        
[13] parallel_3.2.0      grid_3.2.0          Biobase_2.27.2     
[16] gtable_0.1.2        TH.data_1.0-6       survival_2.38-1    
[19] digest_0.6.8        R.rsp_0.20.0        reshape2_1.4.1     
[22] formatR_1.0.3       base64enc_0.1-3     mime_0.2.1         
[25] evaluate_0.5.7      labeling_0.3        sandwich_2.3-2     
[28] scales_0.2.4        mvtnorm_1.0-2       zoo_1.7-12         
[31] Cairo_1.5-6         proto_0.3-10       

Total processing time was 20.02 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('colTabulates')

Copyright Henrik Bengtsson. Last updated on 2015-03-02 17:22:41 (-0800 UTC). Powered by RSP.
