anyMissing

matrixStats: Benchmark report

anyMissing() benchmarks

This report benchmarks the performance of anyMissing() against alternative methods.

Alternative methods

anyNA()
any() + is.na()

where any() + is.na() is implemented as follows:

> any_is.na <- function(x) {
+     any(is.na(x))
+ }
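
Before comparing speed, it can help to confirm that the candidates agree on the answer. The following is a minimal sketch, not part of the original report: the vectors x_clean and x_na are made up for illustration, and matrixStats::anyMissing() is only called when the matrixStats package happens to be installed.

```r
# The two base-R approaches compared in this report.
any_is.na <- function(x) {
  any(is.na(x))  # builds a full logical vector, then reduces it
}

# Hypothetical test vectors (not the benchmark data).
x_clean <- c(1L, 2L, 3L)
x_na    <- c(1L, NA, 3L)

res_clean <- c(anyNA = anyNA(x_clean), any_is.na = any_is.na(x_clean))
res_na    <- c(anyNA = anyNA(x_na),    any_is.na = any_is.na(x_na))

# Add matrixStats::anyMissing() only if the package is available.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  res_clean["anyMissing"] <- matrixStats::anyMissing(x_clean)
  res_na["anyMissing"]    <- matrixStats::anyMissing(x_na)
}

print(res_clean)  # every entry should be FALSE
print(res_na)     # every entry should be TRUE
```

All methods are expected to return a single logical value, so any timing difference reported below comes from how each one reaches that value rather than from what it returns.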

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     # Optionally set a random fraction 'naProb' of the elements to NA
+     if (naProb > 0) 
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = "integer")
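
Note that rvectors() is called without naProb, so it keeps its default of 0 and the benchmark vectors contain no missing values. Every method therefore has to inspect the whole vector before it can answer FALSE. The sketch below, which is not part of the original report, contrasts that case with a vector whose first element is NA. The names x_none, x_first and time_it are illustrative, and the assumption that anyNA() can return as soon as it meets a missing value is hedged: it reflects typical behavior for atomic vectors, and the timings will vary by machine and R version.

```r
set.seed(1)
n <- 1e5L

# No missing values: a full scan is needed, as in the report's data.
x_none <- sample.int(200L, size = n, replace = TRUE)

# Same data, but with an NA in the very first position.
x_first <- x_none
x_first[1L] <- NA_integer_

any_is.na <- function(x) any(is.na(x))

# Both approaches must agree on the answer for both vectors.
found <- c(none = anyNA(x_none), first = anyNA(x_first))
stopifnot(identical(found, c(none = any_is.na(x_none),
                             first = any_is.na(x_first))))

# Rough timings in seconds over repeated calls; values are machine-dependent.
time_it <- function(f, x, reps = 200L) {
  unname(system.time(for (i in seq_len(reps)) f(x))["elapsed"])
}
timings <- c(
  anyNA_none      = time_it(anyNA, x_none),
  anyNA_first     = time_it(anyNA, x_first),
  any_is.na_none  = time_it(any_is.na, x_none),
  any_is.na_first = time_it(any_is.na, x_first)
)
print(timings)
```

Because any(is.na(x)) first materializes a logical vector of length n, its cost should be roughly the same in both cases, whereas an early-exit implementation may be much faster on x_first. The results in this report only cover the no-NA case.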

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643241  34.4    1168576  62.5  1168576  62.5
Vcells 17552012 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	0.0004	0.0004	0.0007	0.0008	0.0008	0.0042
1	anyMissing	0.0015	0.0015	0.0024	0.0019	0.0023	0.0431
3	any_is.na	0.0035	0.0039	0.0045	0.0042	0.0050	0.0192

	expr	min	lq	mean	median	uq	max
2	anyNA	1	1.000	1.000	1.000	1.000	1.000
1	anyMissing	4	3.992	3.663	2.501	2.996	10.181
3	any_is.na	9	9.975	6.869	5.500	6.492	4.545
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.
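
The relative panel appears to be obtained by dividing every column by the value in the first row of that table. The sketch below recomputes it from the printed n=1000 integer times; it is an illustration rather than the report's own code, and the data frame names times and rel are assumptions. Because the printed absolute times are rounded to four decimals, the recomputed ratios differ somewhat from the printed relative panel, which was presumably computed from unrounded timings.

```r
# min and median times (milliseconds) copied from the table above.
times <- data.frame(
  expr   = c("anyNA", "anyMissing", "any_is.na"),
  min    = c(0.0004, 0.0015, 0.0035),
  median = c(0.0008, 0.0019, 0.0042)
)

# Use the first row as the reference and divide each numeric column by it.
ref <- 1L
rel <- times
rel[-1] <- lapply(times[-1], function(col) col / col[ref])

print(rel)
```

By construction the reference row is 1 in every column, which matches the layout of the bottom panel of each table in this report.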

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643258  34.4    1168576  62.5  1168576  62.5
Vcells 17552377 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	0.0058	0.0062	0.0070	0.0062	0.0065	0.0277
1	anyMissing	0.0069	0.0073	0.0096	0.0077	0.0085	0.0493
3	any_is.na	0.0250	0.0262	0.0309	0.0277	0.0304	0.0647

	expr	min	lq	mean	median	uq	max
2	anyNA	1.000	1.000	1.000	1.000	1.000	1.000
1	anyMissing	1.200	1.188	1.374	1.250	1.294	1.778
3	any_is.na	4.333	4.250	4.447	4.499	4.646	2.333
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643270  34.4    1168576  62.5  1168576  62.5
Vcells 17552385 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	0.0654	0.0670	0.0882	0.0755	0.0907	0.7499
1	anyMissing	0.0678	0.0693	0.0860	0.0785	0.0962	0.1748
3	any_is.na	0.2914	0.3509	0.3996	0.3699	0.4294	1.0725

	expr	min	lq	mean	median	uq	max
2	anyNA	1.000	1.000	1.0000	1.000	1.000	1.0000
1	anyMissing	1.035	1.034	0.9753	1.041	1.062	0.2331
3	any_is.na	4.453	5.238	4.5333	4.903	4.737	1.4302
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643282  34.4    1168576  62.5  1168576  62.5
Vcells 17552905 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	0.6317	0.7842	0.8685	0.8354	0.8602	1.482
1	anyMissing	0.6552	0.8215	0.9133	0.8631	0.9505	1.557
3	any_is.na	3.4834	3.9034	4.4829	4.0039	4.2865	32.934

	expr	min	lq	mean	median	uq	max
2	anyNA	1.000	1.000	1.000	1.000	1.000	1.00
1	anyMissing	1.037	1.048	1.052	1.033	1.105	1.05
3	any_is.na	5.514	4.978	5.162	4.793	4.983	22.22
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643295  34.4    1168576  62.5  1168576  62.5
Vcells 17552921 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
anyMissing	7.758	7.853	8.370	8.178	8.500	11.84
anyNA	7.674	7.765	8.396	8.261	8.468	11.46
any_is.na	37.277	39.845	46.654	41.569	45.011	91.56

expr	min	lq	mean	median	uq	max
anyMissing	1.0000	1.0000	1.000	1.000	1.0000	1.0000
anyNA	0.9892	0.9888	1.003	1.010	0.9963	0.9681
any_is.na	4.8049	5.0738	5.574	5.083	5.2956	7.7330
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     # Optionally set a random fraction 'naProb' of the elements to NA
+     if (naProb > 0) 
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = "double")

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643307  34.4    1168576  62.5  1168576  62.5
Vcells 23108750 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	0.0019	0.0019	0.0021	0.0019	0.0023	0.0042
1	anyMissing	0.0027	0.0031	0.0034	0.0031	0.0035	0.0212
3	any_is.na	0.0046	0.0050	0.0054	0.0050	0.0054	0.0131

	expr	min	lq	mean	median	uq	max
2	anyNA	1.0	1.000	1.000	1.000	1.000	1.000
1	anyMissing	1.4	1.599	1.633	1.600	1.500	4.998
3	any_is.na	2.4	2.599	2.607	2.599	2.334	3.090
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643319  34.4    1168576  62.5  1168576  62.5
Vcells 23108758 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	0.0208	0.0208	0.0213	0.0212	0.0212	0.0358
1	anyMissing	0.0216	0.0223	0.0229	0.0227	0.0227	0.0373
3	any_is.na	0.0397	0.0404	0.0419	0.0408	0.0412	0.1274

	expr	min	lq	mean	median	uq	max
2	anyNA	1.000	1.000	1.000	1.000	1.000	1.000
1	anyMissing	1.037	1.074	1.073	1.073	1.073	1.043
3	any_is.na	1.907	1.944	1.965	1.927	1.945	3.559
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643331  34.4    1168576  62.5  1168576  62.5
Vcells 23109046 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	0.2306	0.2331	0.2663	0.2396	0.2702	0.4677
1	anyMissing	0.2325	0.2383	0.2641	0.2458	0.2633	0.4350
3	any_is.na	0.4350	0.5039	0.5349	0.5166	0.5380	0.8600

	expr	min	lq	mean	median	uq	max
2	anyNA	1.000	1.000	1.0000	1.000	1.0000	1.000
1	anyMissing	1.008	1.022	0.9918	1.026	0.9744	0.930
3	any_is.na	1.887	2.162	2.0087	2.156	1.9907	1.839
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643343  34.4    1168576  62.5  1168576  62.5
Vcells 23109385 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr	min	lq	mean	median	uq	max
anyMissing	2.411	2.608	3.037	2.679	3.503	5.077
anyNA	2.394	2.584	2.977	2.689	3.456	4.998
any_is.na	4.376	5.631	6.625	5.838	7.443	32.233

expr	min	lq	mean	median	uq	max
anyMissing	1.0000	1.0000	1.0000	1.000	1.0000	1.0000
anyNA	0.9927	0.9909	0.9802	1.004	0.9864	0.9844
any_is.na	1.8147	2.1595	2.1815	2.179	2.1244	6.3485
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643355  34.4    1168576  62.5  1168576  62.5
Vcells 23109393 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	anyNA	24.07	26.08	30.44	29.54	33.78	42.76
1	anyMissing	23.88	26.46	30.34	30.00	32.43	41.13
3	any_is.na	55.41	60.57	70.94	64.54	72.30	114.00

	expr	min	lq	mean	median	uq	max
2	anyNA	1.000	1.000	1.0000	1.000	1.0000	1.0000
1	anyMissing	0.992	1.014	0.9969	1.016	0.9599	0.9621
3	any_is.na	2.302	2.322	2.3307	2.185	2.1404	2.6662
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R Under development (unstable) (2015-02-27 r67909)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markdown_0.7.7          microbenchmark_1.4-2    matrixStats_0.14.0-9000
[4] ggplot2_1.0.0           knitr_1.9.3             R.devices_2.13.0       
[7] R.utils_2.0.0           R.oo_1.19.0             R.methodsS3_1.7.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.4         splines_3.2.0       MASS_7.3-39        
 [4] munsell_0.4.2       lattice_0.20-30     colorspace_1.2-4   
 [7] R.cache_0.11.1-9000 multcomp_1.3-9      stringr_0.6.2      
[10] plyr_1.8.1          tools_3.2.0         grid_3.2.0         
[13] gtable_0.1.2        TH.data_1.0-6       survival_2.38-1    
[16] digest_0.6.8        R.rsp_0.20.0        reshape2_1.4.1     
[19] formatR_1.0.3       base64enc_0.1-3     mime_0.2.1         
[22] evaluate_0.5.7      labeling_0.3        sandwich_2.3-2     
[25] scales_0.2.4        mvtnorm_1.0-2       zoo_1.7-12         
[28] Cairo_1.5-6         proto_0.3-10

Total processing time was 32.14 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('anyMissing')
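
The call above relies on an internal (triple-colon) function. For a quick manual check of a single case, something along the following lines may be enough. This is a sketch under stated assumptions, not the report's generator: x is a stand-in for the n=1000 integer vector, and the benchmark is only run when both microbenchmark and matrixStats are installed.

```r
set.seed(1)

# Stand-in for data[["n=1000"]] in the integer section.
x <- runif(1000, min = -100, max = 100)
storage.mode(x) <- "integer"

any_is.na <- function(x) any(is.na(x))

# Run the comparison only if the required packages are available.
if (requireNamespace("microbenchmark", quietly = TRUE) &&
    requireNamespace("matrixStats", quietly = TRUE)) {
  stats <- microbenchmark::microbenchmark(
    anyMissing = matrixStats::anyMissing(x),
    anyNA      = anyNA(x),
    any_is.na  = any_is.na(x),
    unit = "ms"
  )
  print(summary(stats))  # columns include min, lq, mean, median, uq, max
}
```

Absolute times will differ from those in this report, which were collected on the R version and platform listed under Session information.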

Copyright Henrik Bengtsson. Last updated on 2015-03-02 16:54:45 (-0800 UTC). Powered by RSP.
