hb edited this page Mar 3, 2015 · 2 revisions

matrixStats: Benchmark report


count() benchmarks

This report benchmarks the performance of count() against alternative methods.

Alternative methods

  • sum(x == value)
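As a rough illustration of what the alternative does, the base-R sketch below splits sum(x == value) into its two steps. The comparison x == value first materializes a logical vector as long as x, which is then summed. This temporary is one plausible contributor to the timing differences reported in the tables, though the report itself does not state the cause. The toy vector and value = 42L are assumptions made for illustration; the report does not show how value is defined.

```r
# Toy input; both `x` and `value` are illustrative assumptions.
x <- c(3L, 42L, 7L, 42L, 0L, 42L)
value <- 42L

# Step 1: the elementwise comparison allocates a logical vector of length(x).
is_match <- x == value

# Step 2: summing the logical vector counts its TRUE entries.
n_match <- sum(is_match)
print(n_match)

# For a large input the temporary is itself large: logical vectors
# use 4 bytes per element in R.
tmp_bytes <- as.numeric(object.size(logical(1e6)))
print(tmp_bytes)
```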

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (naProb > 0) 
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
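To make the generated data concrete, the following sketch (not part of the original report) works through two details of the functions above: the vector lengths implied by the default scale = 10, and the effect of storage.mode(x) <- "integer" on uniform draws. The small input vector is made up for illustration.

```r
# Lengths produced by rvectors() with the default scale = 10.
scale <- 10
n <- as.integer(scale * c(100, 1000, 10000, 1e+05, 1e+06))
labels <- sprintf("n=%d", n)
print(labels)

# Coercing doubles to integer storage truncates toward zero,
# so every draw in (-1, 1) becomes 0L.
x <- c(-99.7, -0.4, 0.4, 42.9)
storage.mode(x) <- "integer"
print(x)
```

The labels computed here are the names used to index data in the results, for example data[["n=1000"]].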

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753532  93.7    2637877 140.9  2637877 140.9
Vcells 18332974 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              0.0054   0.0062   0.0094   0.0065   0.0069   0.2806
sum(x == value)    0.0092   0.0100   0.0103   0.0100   0.0106   0.0189

expr               min      lq       mean     median   uq       max
count              1.000    1.000    1.000    1.000    1.000    1.0000
sum(x == value)    1.714    1.625    1.099    1.529    1.528    0.0672

Figure: Benchmarking of count() and sum(x == value) on integer+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753454  93.7    2637877 140.9  2637877 140.9
Vcells 18333268 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              0.0177   0.0246   0.0311   0.0293   0.0348   0.1201
sum(x == value)    0.0901   0.1434   0.1403   0.1488   0.1536   0.1909

expr               min      lq       mean     median   uq       max
count              1.000    1.00     1.000    1.000    1.000    1.00
sum(x == value)    5.087    5.82     4.504    5.085    4.409    1.59

Figure: Benchmarking of count() and sum(x == value) on integer+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753466  93.7    2637877 140.9  2637877 140.9
Vcells 18333276 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              0.1494   0.1684   0.2043   0.1917   0.2331   0.3353
sum(x == value)    0.9112   0.9333   1.2753   1.3679   1.5494   1.9317

expr               min      lq       mean     median   uq       max
count              1.000    1.000    1.000    1.000    1.000    1.000
sum(x == value)    6.101    5.542    6.242    7.136    6.647    5.761

Figure: Benchmarking of count() and sum(x == value) on integer+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753478  93.7    2637877 140.9  2637877 140.9
Vcells 18333796 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              1.595    1.663    2.07     1.845    2.579    2.843
sum(x == value)    10.362   14.014   18.27    18.179   19.452   54.041

expr               min      lq       mean     median   uq       max
count              1.000    1.000    1.000    1.000    1.000    1.00
sum(x == value)    6.499    8.427    8.828    9.855    7.543    19.01

Figure: Benchmarking of count() and sum(x == value) on integer+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753490  93.7    2637877 140.9  2637877 140.9
Vcells 18333804 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              15.36    15.59    18.76    17.5     19.58    37.07
sum(x == value)    143.90   171.43   190.82   186.8    204.55   443.83

expr               min      lq       mean     median   uq       max
count              1.000    1        1.00     1.00     1.00     1.00
sum(x == value)    9.366    11       10.17    10.67    10.45    11.97

Figure: Benchmarking of count() and sum(x == value) on integer+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
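The relative-time panel can be recomputed from the panel of absolute times by dividing each column by the corresponding entry for count(). The sketch below does this for the integer n=10000000 table. The numbers are copied from that table, and small discrepancies are expected because the printed times are rounded.

```r
# Absolute times in milliseconds, copied from the integer n=10000000 table.
times <- rbind(
  count             = c(15.36, 15.59, 18.76, 17.5, 19.58, 37.07),
  `sum(x == value)` = c(143.90, 171.43, 190.82, 186.8, 204.55, 443.83)
)
colnames(times) <- c("min", "lq", "mean", "median", "uq", "max")

# Divide every row by the count() row to obtain relative times.
relative <- sweep(times, MARGIN = 2, STATS = times["count", ], FUN = "/")
print(round(relative, 3))

# Relative times as printed in the report, for comparison.
reported <- c(9.366, 11, 10.17, 10.67, 10.45, 11.97)
max_diff <- max(abs(relative["sum(x == value)", ] - reported))
print(max_diff)
```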

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (naProb > 0) 
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753502  93.7    2637877 140.9  2637877 140.9
Vcells 23889312 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
sum(x == value)    0.0127   0.0135   0.0147   0.0142   0.0150   0.0262
count              0.0135   0.0142   0.0168   0.0150   0.0167   0.0920

expr               min      lq       mean     median   uq       max
sum(x == value)    1.000    1.000    1.000    1.000    1.000    1.000
count              1.061    1.057    1.138    1.054    1.115    3.515

Figure: Benchmarking of count() and sum(x == value) on double+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753514  93.7    2637877 140.9  2637877 140.9
Vcells 23889565 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              0.0393   0.0664   0.0725   0.0781   0.0812   0.1186
sum(x == value)    0.0770   0.1293   0.1284   0.1355   0.1363   0.2560

expr               min      lq       mean     median   uq       max
count              1.000    1.000    1.000    1.000    1.000    1.000
sum(x == value)    1.961    1.948    1.772    1.734    1.678    2.159

Figure: Benchmarking of count() and sum(x == value) on double+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753526  93.7    2637877 140.9  2637877 140.9
Vcells 23889833 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              0.3565   0.572    0.6148   0.6086   0.6504   1.509
sum(x == value)    0.7676   1.199    1.4942   1.2921   1.3624   14.545

expr               min      lq       mean     median   uq       max
count              1.000    1.000    1.00     1.000    1.000    1.000
sum(x == value)    2.153    2.096    2.43     2.123    2.095    9.641

Figure: Benchmarking of count() and sum(x == value) on double+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753538  93.7    2637877 140.9  2637877 140.9
Vcells 23889841 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              3.751    4.363    5.70     6.036    6.716    10.89
sum(x == value)    8.060    12.240   13.91    13.683   14.530   72.72

expr               min      lq       mean     median   uq       max
count              1.000    1.000    1.000    1.000    1.000    1.000
sum(x == value)    2.149    2.805    2.441    2.267    2.164    6.676

Figure: Benchmarking of count() and sum(x == value) on double+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753550  93.7    2637877 140.9  2637877 140.9
Vcells 23890161 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr               min      lq       mean     median   uq       max
count              37.61    41.24    46.89    46.26    51.05    90.78
sum(x == value)    90.16    106.57   116.83   114.62   128.46   152.56

expr               min      lq       mean     median   uq       max
count              1.000    1.000    1.000    1.000    1.000    1.000
sum(x == value)    2.397    2.584    2.491    2.478    2.516    1.681

Figure: Benchmarking of count() and sum(x == value) on double+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
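As a compact summary of the tables above, this sketch collects the median relative times, expressed as the time of sum(x == value) divided by the time of count(). All numbers are copied from the report. The only derived value is the double n=1000 entry, where the table is normalized to sum(x == value) and the ratio is therefore inverted.

```r
# Median time of sum(x == value) relative to count(), per vector length.
medians <- data.frame(
  n       = c(1e3, 1e4, 1e5, 1e6, 1e7),
  integer = c(1.529, 5.085, 7.136, 9.855, 10.67),
  # For double n=1000 the report lists count() at 1.054 times sum(x == value).
  double  = c(1 / 1.054, 1.734, 2.123, 2.267, 2.478)
)
print(medians, digits = 4)

# In these runs the median ratio is larger for integers than for
# doubles at every vector length.
integer_gain_larger <- all(medians$integer > medians$double)
print(integer_gain_larger)
```

These ratios describe one run on one machine (see the session information) and should not be read as general guarantees.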

Appendix

Session information

R Under development (unstable) (2015-02-27 r67909)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markdown_0.7.7          microbenchmark_1.4-2    matrixStats_0.14.0-9000
[4] ggplot2_1.0.0           knitr_1.9.3             R.devices_2.13.0       
[7] R.utils_2.0.0           R.oo_1.19.0             R.methodsS3_1.7.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.4           GenomeInfoDb_1.3.13   formatR_1.0.3        
 [4] plyr_1.8.1            base64enc_0.1-3       tools_3.2.0          
 [7] digest_0.6.8          RSQLite_1.0.0         annotate_1.45.2      
[10] evaluate_0.5.7        gtable_0.1.2          R.cache_0.11.1-9000  
[13] lattice_0.20-30       DBI_0.3.1             parallel_3.2.0       
[16] mvtnorm_1.0-2         proto_0.3-10          R.rsp_0.20.0         
[19] genefilter_1.49.2     stringr_0.6.2         IRanges_2.1.41       
[22] S4Vectors_0.5.21      stats4_3.2.0          grid_3.2.0           
[25] Biobase_2.27.2        AnnotationDbi_1.29.17 XML_3.98-1.1         
[28] survival_2.38-1       multcomp_1.3-9        TH.data_1.0-6        
[31] reshape2_1.4.1        scales_0.2.4          MASS_7.3-39          
[34] splines_3.2.0         BiocGenerics_0.13.6   xtable_1.8-0         
[37] mime_0.2.1            colorspace_1.2-4      labeling_0.3         
[40] sandwich_2.3-2        munsell_0.4.2         Cairo_1.5-6          
[43] zoo_1.7-12           

Total processing time was 59.73 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('count')
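For a quick check of a single case without regenerating the full report, something along these lines could be used. It is a sketch, not the report's own code: value = 42L is an assumption (the report does not show its definition), and the timing part only runs if the matrixStats and microbenchmark packages are installed.

```r
set.seed(1)
# Integer data on the same range as in the report, with n = 1e5.
x <- as.integer(runif(1e5, min = -100, max = 100))
value <- 42L  # assumed; not shown in the report

# Base-R reference count.
expected <- sum(x == value)

if (requireNamespace("matrixStats", quietly = TRUE) &&
    requireNamespace("microbenchmark", quietly = TRUE)) {
  # Both approaches should agree before they are timed.
  stopifnot(matrixStats::count(x, value = value) == expected)
  stats <- microbenchmark::microbenchmark(
    count = matrixStats::count(x, value = value),
    `sum(x == value)` = sum(x == value),
    unit = "ms"
  )
  print(stats)
}
```

Absolute timings will differ from those in the report depending on hardware, R version, and package versions.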

Copyright Henrik Bengtsson. Last updated on 2015-03-02 17:27:04 (-0800 UTC). Powered by RSP.


