count

matrixStats: Benchmark report

count() benchmarks

This report benchmarks the performance of count() against alternative methods.

Alternative methods

sum(x == value)
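
To make the comparison concrete, here is a minimal base-R sketch of what the alternative does: it first materializes a logical vector as long as x and then sums it. The vector `x` and the target `value` below are made up for illustration (the report does not show how `value` is set), and the cross-check against matrixStats::count() only runs if that package happens to be installed.

```r
# Illustrative data; `x` and `value` are assumptions, not taken from the report.
set.seed(1)
x <- sample(-100:100, size = 1000, replace = TRUE)
value <- 42L

# Alternative method: build a temporary logical vector, then sum its TRUE entries.
is_match <- (x == value)   # temporary logical vector of length(x)
n_alt <- sum(is_match)

# The same tally via an explicit loop, which needs no temporary vector.
n_loop <- 0L
for (xi in x) {
  if (xi == value) n_loop <- n_loop + 1L
}
stopifnot(n_alt == n_loop)

# Optional cross-check against matrixStats::count(), if the package is available.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  stopifnot(matrixStats::count(x, value = value) == n_alt)
}
```

The loop is only there to show that the intermediate vector is not needed to get the answer; it is not how count() is implemented and would be slow in interpreted R.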

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (naProb > 0) 
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
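
As a small aid for reading the results, the sketch below reproduces two details of the setup in isolation: the list names that rvectors() derives from the vector lengths at its default scale = 10, and the effect of `storage.mode(x) <- "integer"` on uniform doubles. The variable names `sizes`, `labels`, `u` and `ui` are illustrative only, and the sample is kept tiny to avoid allocating the full benchmark data.

```r
# Vector lengths used by rvectors() with its default scale = 10.
scale <- 10
sizes <- scale * c(100, 1000, 10000, 1e5, 1e6)
labels <- sprintf("n=%d", as.integer(sizes))
print(labels)
# "n=1000" "n=10000" "n=100000" "n=1000000" "n=10000000"

# Assigning storage.mode "integer" coerces doubles like as.integer(),
# that is, by truncating toward zero.
set.seed(1)
u <- runif(5, min = -100, max = 100)
ui <- u
storage.mode(ui) <- "integer"
stopifnot(identical(ui, as.integer(u)), all(abs(ui) <= abs(u)))
```

These labels are the keys used in the results sections, for example data[["n=1000"]].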

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753532  93.7    2637877 140.9  2637877 140.9
Vcells 18332974 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	0.0054	0.0062	0.0094	0.0065	0.0069	0.2806
sum(x == value)	0.0092	0.0100	0.0103	0.0100	0.0106	0.0189

expr	min	lq	mean	median	uq	max
count	1.000	1.000	1.000	1.000	1.000	1.0000
sum(x == value)	1.714	1.625	1.099	1.529	1.528	0.0672
Figure: Benchmarking of count() and sum(x == value) on integer+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.
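
The relative times in the bottom panel appear to be the absolute times divided, column by column, by those of the reference row (here count). The sketch below recomputes them from the rounded millisecond values of the n=1000 integer table above; the names `abs_ms` and `rel` are illustrative. Because the inputs are already rounded, the recomputed ratios differ slightly from the reported ones (the min ratio comes out near 1.70 rather than 1.714).

```r
# Absolute times (ms) copied from the n=1000 integer table above.
abs_ms <- rbind(
  "count"           = c(0.0054, 0.0062, 0.0094, 0.0065, 0.0069, 0.2806),
  "sum(x == value)" = c(0.0092, 0.0100, 0.0103, 0.0100, 0.0106, 0.0189)
)
colnames(abs_ms) <- c("min", "lq", "mean", "median", "uq", "max")

# Divide every row by the reference row (count), one column at a time.
rel <- sweep(abs_ms, MARGIN = 2, STATS = abs_ms["count", ], FUN = "/")
print(round(rel, 3))
```

Note that the max column is below 1 for sum(x == value): a single slow outlier in the count() timings dominates that column, which is why the median is usually the more informative statistic.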

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753454  93.7    2637877 140.9  2637877 140.9
Vcells 18333268 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	0.0177	0.0246	0.0311	0.0293	0.0348	0.1201
sum(x == value)	0.0901	0.1434	0.1403	0.1488	0.1536	0.1909

expr	min	lq	mean	median	uq	max
count	1.000	1.00	1.000	1.000	1.000	1.00
sum(x == value)	5.087	5.82	4.504	5.085	4.409	1.59
Figure: Benchmarking of count() and sum(x == value) on integer+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753466  93.7    2637877 140.9  2637877 140.9
Vcells 18333276 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	0.1494	0.1684	0.2043	0.1917	0.2331	0.3353
sum(x == value)	0.9112	0.9333	1.2753	1.3679	1.5494	1.9317

expr	min	lq	mean	median	uq	max
count	1.000	1.000	1.000	1.000	1.000	1.000
sum(x == value)	6.101	5.542	6.242	7.136	6.647	5.761
Figure: Benchmarking of count() and sum(x == value) on integer+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753478  93.7    2637877 140.9  2637877 140.9
Vcells 18333796 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	1.595	1.663	2.07	1.845	2.579	2.843
sum(x == value)	10.362	14.014	18.27	18.179	19.452	54.041

expr	min	lq	mean	median	uq	max
count	1.000	1.000	1.000	1.000	1.000	1.00
sum(x == value)	6.499	8.427	8.828	9.855	7.543	19.01
Figure: Benchmarking of count() and sum(x == value) on integer+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753490  93.7    2637877 140.9  2637877 140.9
Vcells 18333804 139.9   35610798 271.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on integer+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	15.36	15.59	18.76	17.5	19.58	37.07
sum(x == value)	143.90	171.43	190.82	186.8	204.55	443.83

expr	min	lq	mean	median	uq	max
count	1.000	1	1.00	1.00	1.00	1.00
sum(x == value)	9.366	11	10.17	10.67	10.45	11.97
Figure: Benchmarking of count() and sum(x == value) on integer+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (naProb > 0) 
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753502  93.7    2637877 140.9  2637877 140.9
Vcells 23889312 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
sum(x == value)	0.0127	0.0135	0.0147	0.0142	0.0150	0.0262
count	0.0135	0.0142	0.0168	0.0150	0.0167	0.0920

expr	min	lq	mean	median	uq	max
sum(x == value)	1.000	1.000	1.000	1.000	1.000	1.000
count	1.061	1.057	1.138	1.054	1.115	3.515
Figure: Benchmarking of count() and sum(x == value) on double+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753514  93.7    2637877 140.9  2637877 140.9
Vcells 23889565 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	0.0393	0.0664	0.0725	0.0781	0.0812	0.1186
sum(x == value)	0.0770	0.1293	0.1284	0.1355	0.1363	0.2560

expr	min	lq	mean	median	uq	max
count	1.000	1.000	1.000	1.000	1.000	1.000
sum(x == value)	1.961	1.948	1.772	1.734	1.678	2.159
Figure: Benchmarking of count() and sum(x == value) on double+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753526  93.7    2637877 140.9  2637877 140.9
Vcells 23889833 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	0.3565	0.572	0.6148	0.6086	0.6504	1.509
sum(x == value)	0.7676	1.199	1.4942	1.2921	1.3624	14.545

expr	min	lq	mean	median	uq	max
count	1.000	1.000	1.00	1.000	1.000	1.000
sum(x == value)	2.153	2.096	2.43	2.123	2.095	9.641
Figure: Benchmarking of count() and sum(x == value) on double+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753538  93.7    2637877 140.9  2637877 140.9
Vcells 23889841 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	3.751	4.363	5.70	6.036	6.716	10.89
sum(x == value)	8.060	12.240	13.91	13.683	14.530	72.72

expr	min	lq	mean	median	uq	max
count	1.000	1.000	1.000	1.000	1.000	1.000
sum(x == value)	2.149	2.805	2.441	2.267	2.164	6.676
Figure: Benchmarking of count() and sum(x == value) on double+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1753550  93.7    2637877 140.9  2637877 140.9
Vcells 23890161 182.3   42812957 326.7 68120027 519.8
> stats <- microbenchmark(count = count(x, value), `sum(x == value)` = sum(x == value), unit = "ms")

Table: Benchmarking of count() and sum(x == value) on double+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
count	37.61	41.24	46.89	46.26	51.05	90.78
sum(x == value)	90.16	106.57	116.83	114.62	128.46	152.56

expr	min	lq	mean	median	uq	max
count	1.000	1.000	1.000	1.000	1.000	1.000
sum(x == value)	2.397	2.584	2.491	2.478	2.516	1.681
Figure: Benchmarking of count() and sum(x == value) on double+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2015-02-27 r67909)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markdown_0.7.7          microbenchmark_1.4-2    matrixStats_0.14.0-9000
[4] ggplot2_1.0.0           knitr_1.9.3             R.devices_2.13.0       
[7] R.utils_2.0.0           R.oo_1.19.0             R.methodsS3_1.7.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.4           GenomeInfoDb_1.3.13   formatR_1.0.3        
 [4] plyr_1.8.1            base64enc_0.1-3       tools_3.2.0          
 [7] digest_0.6.8          RSQLite_1.0.0         annotate_1.45.2      
[10] evaluate_0.5.7        gtable_0.1.2          R.cache_0.11.1-9000  
[13] lattice_0.20-30       DBI_0.3.1             parallel_3.2.0       
[16] mvtnorm_1.0-2         proto_0.3-10          R.rsp_0.20.0         
[19] genefilter_1.49.2     stringr_0.6.2         IRanges_2.1.41       
[22] S4Vectors_0.5.21      stats4_3.2.0          grid_3.2.0           
[25] Biobase_2.27.2        AnnotationDbi_1.29.17 XML_3.98-1.1         
[28] survival_2.38-1       multcomp_1.3-9        TH.data_1.0-6        
[31] reshape2_1.4.1        scales_0.2.4          MASS_7.3-39          
[34] splines_3.2.0         BiocGenerics_0.13.6   xtable_1.8-0         
[37] mime_0.2.1            colorspace_1.2-4      labeling_0.3         
[40] sandwich_2.3-2        munsell_0.4.2         Cairo_1.5-6          
[43] zoo_1.7-12

Total processing time was 59.73 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('count')

Copyright Henrik Bengtsson. Last updated on 2015-03-02 17:27:04 (-0800 UTC). Powered by RSP.
