anyMissing

hb edited this page Mar 3, 2015 · 2 revisions

matrixStats: Benchmark report


anyMissing() benchmarks

This report benchmarks the performance of anyMissing() against alternative methods.

Alternative methods

  • anyNA()
  • any() + is.na()

where any_is.na() is defined as follows:

> any_is.na <- function(x) {
+     any(is.na(x))
+ }
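
As a quick sanity check, the three approaches should return the same answer on any input. The sketch below uses only base R for anyNA() and any_is.na(); the anyMissing() call is guarded because it assumes the matrixStats package is installed. The helper name check_methods and the two test vectors are illustrative and not part of the report.

```r
any_is.na <- function(x) {
  any(is.na(x))
}

# Hypothetical helper: run every available method on the same input
check_methods <- function(x) {
  res <- c(anyNA = anyNA(x), any_is.na = any_is.na(x))
  # anyMissing() lives in matrixStats; only call it if the package is available
  if (requireNamespace("matrixStats", quietly = TRUE)) {
    res <- c(res, anyMissing = matrixStats::anyMissing(x))
  }
  res
}

x_clean <- c(1L, 2L, 3L)   # no missing values
x_na    <- c(1L, NA, 3L)   # one missing value

print(check_methods(x_clean))
print(check_methods(x_na))
```

Note that any(is.na(x)) first builds a full logical vector of the same length as x before any() inspects it, which is a plausible reason for it being the slowest of the three in the results that follow.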

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         # Draw uniform doubles; storage.mode() below coerces them to 'mode'
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     # Optionally set a random fraction 'naProb' of the elements to NA
+     if (naProb > 0)
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = "integer")
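
Because rvectors() is called without naProb, the default naProb = 0 applies: the benchmark vectors contain no missing values, so every method has to scan the whole vector. The sketch below illustrates the difference between that case and a vector with NAs injected. It uses a simplified stand-in generator, make_vec (a hypothetical name, not the report's rvector()), and a small n chosen only for illustration.

```r
# Simplified stand-in for rvector(); make_vec and na_prob are hypothetical names
make_vec <- function(n, na_prob = 0) {
  x <- as.integer(runif(n, min = -100, max = 100))
  if (na_prob > 0) {
    # Overwrite a random subset of positions with NA
    x[sample(n, size = na_prob * n)] <- NA
  }
  x
}

set.seed(1)
x_full <- make_vec(1000)                 # no NAs, like the benchmark data
x_miss <- make_vec(1000, na_prob = 0.1)  # roughly 10% NAs

print(anyNA(x_full))
print(anyNA(x_miss))
print(sum(is.na(x_miss)))
```

On data like x_miss, methods that can stop at the first NA may finish much sooner than on the NA-free vectors used here, so the timings in this report are best read as a full-scan case.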

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643241  34.4    1168576  62.5  1168576  62.5
Vcells 17552012 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr    min     lq   mean median     uq    max
2      anyNA 0.0004 0.0004 0.0007 0.0008 0.0008 0.0042
1 anyMissing 0.0015 0.0015 0.0024 0.0019 0.0023 0.0431
3  any_is.na 0.0035 0.0039 0.0045 0.0042 0.0050 0.0192

        expr min    lq  mean median    uq    max
2      anyNA   1 1.000 1.000  1.000 1.000  1.000
1 anyMissing   4 3.992 3.663  2.501 2.996 10.181
3  any_is.na   9 9.975 6.869  5.500 6.492  4.545

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643258  34.4    1168576  62.5  1168576  62.5
Vcells 17552377 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr    min     lq   mean median     uq    max
2      anyNA 0.0058 0.0062 0.0070 0.0062 0.0065 0.0277
1 anyMissing 0.0069 0.0073 0.0096 0.0077 0.0085 0.0493
3  any_is.na 0.0250 0.0262 0.0309 0.0277 0.0304 0.0647

        expr   min    lq  mean median    uq   max
2      anyNA 1.000 1.000 1.000  1.000 1.000 1.000
1 anyMissing 1.200 1.188 1.374  1.250 1.294 1.778
3  any_is.na 4.333 4.250 4.447  4.499 4.646 2.333

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643270  34.4    1168576  62.5  1168576  62.5
Vcells 17552385 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr    min     lq   mean median     uq    max
2      anyNA 0.0654 0.0670 0.0882 0.0755 0.0907 0.7499
1 anyMissing 0.0678 0.0693 0.0860 0.0785 0.0962 0.1748
3  any_is.na 0.2914 0.3509 0.3996 0.3699 0.4294 1.0725

        expr   min    lq   mean median    uq    max
2      anyNA 1.000 1.000 1.0000  1.000 1.000 1.0000
1 anyMissing 1.035 1.034 0.9753  1.041 1.062 0.2331
3  any_is.na 4.453 5.238 4.5333  4.903 4.737 1.4302

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643282  34.4    1168576  62.5  1168576  62.5
Vcells 17552905 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr    min     lq   mean median     uq    max
2      anyNA 0.6317 0.7842 0.8685 0.8354 0.8602  1.482
1 anyMissing 0.6552 0.8215 0.9133 0.8631 0.9505  1.557
3  any_is.na 3.4834 3.9034 4.4829 4.0039 4.2865 32.934

        expr   min    lq  mean median    uq   max
2      anyNA 1.000 1.000 1.000  1.000 1.000  1.00
1 anyMissing 1.037 1.048 1.052  1.033 1.105  1.05
3  any_is.na 5.514 4.978 5.162  4.793 4.983 22.22

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643295  34.4    1168576  62.5  1168576  62.5
Vcells 17552921 134.0   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

      expr    min     lq   mean median     uq   max
anyMissing  7.758  7.853  8.370  8.178  8.500 11.84
     anyNA  7.674  7.765  8.396  8.261  8.468 11.46
 any_is.na 37.277 39.845 46.654 41.569 45.011 91.56

      expr    min     lq  mean median     uq    max
anyMissing 1.0000 1.0000 1.000  1.000 1.0000 1.0000
     anyNA 0.9892 0.9888 1.003  1.010 0.9963 0.9681
 any_is.na 4.8049 5.0738 5.574  5.083 5.2956 7.7330

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
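
The relative panel appears to be each timing column divided by the corresponding value of the first row, with rows ordered by median time. This is an inference from the numbers, not something the report states. A minimal sketch, using the min and median columns copied from the integer n=10000000 table above:

```r
# Min and median times in milliseconds, copied from the integer n=10000000 table
times <- data.frame(
  expr   = c("anyMissing", "anyNA", "any_is.na"),
  min    = c(7.758, 7.674, 37.277),
  median = c(8.178, 8.261, 41.569)
)

# Order rows by median, then divide each column by the first row's value
times <- times[order(times$median), ]
relative <- times
for (col in c("min", "median")) {
  relative[[col]] <- times[[col]] / times[[col]][1]
}

print(relative, digits = 4)
```

Because the inputs here are the rounded values shown in the table, the last digit can differ slightly from the report's relative panel, which was presumably computed from unrounded timings.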

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         # Draw uniform doubles; storage.mode() below coerces them to 'mode'
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     # Optionally set a random fraction 'naProb' of the elements to NA
+     if (naProb > 0)
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = "double")

Results

n=1000 vector

> x <- data[["n=1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643307  34.4    1168576  62.5  1168576  62.5
Vcells 23108750 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr    min     lq   mean median     uq    max
2      anyNA 0.0019 0.0019 0.0021 0.0019 0.0023 0.0042
1 anyMissing 0.0027 0.0031 0.0034 0.0031 0.0035 0.0212
3  any_is.na 0.0046 0.0050 0.0054 0.0050 0.0054 0.0131

        expr min    lq  mean median    uq   max
2      anyNA 1.0 1.000 1.000  1.000 1.000 1.000
1 anyMissing 1.4 1.599 1.633  1.600 1.500 4.998
3  any_is.na 2.4 2.599 2.607  2.599 2.334 3.090

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000 vector

> x <- data[["n=10000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643319  34.4    1168576  62.5  1168576  62.5
Vcells 23108758 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr    min     lq   mean median     uq    max
2      anyNA 0.0208 0.0208 0.0213 0.0212 0.0212 0.0358
1 anyMissing 0.0216 0.0223 0.0229 0.0227 0.0227 0.0373
3  any_is.na 0.0397 0.0404 0.0419 0.0408 0.0412 0.1274

        expr   min    lq  mean median    uq   max
2      anyNA 1.000 1.000 1.000  1.000 1.000 1.000
1 anyMissing 1.037 1.074 1.073  1.073 1.073 1.043
3  any_is.na 1.907 1.944 1.965  1.927 1.945 3.559

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=100000 vector

> x <- data[["n=100000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643331  34.4    1168576  62.5  1168576  62.5
Vcells 23109046 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr    min     lq   mean median     uq    max
2      anyNA 0.2306 0.2331 0.2663 0.2396 0.2702 0.4677
1 anyMissing 0.2325 0.2383 0.2641 0.2458 0.2633 0.4350
3  any_is.na 0.4350 0.5039 0.5349 0.5166 0.5380 0.8600

        expr   min    lq   mean median     uq   max
2      anyNA 1.000 1.000 1.0000  1.000 1.0000 1.000
1 anyMissing 1.008 1.022 0.9918  1.026 0.9744 0.930
3  any_is.na 1.887 2.162 2.0087  2.156 1.9907 1.839

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=1000000 vector

> x <- data[["n=1000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643343  34.4    1168576  62.5  1168576  62.5
Vcells 23109385 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

      expr   min    lq  mean median    uq    max
anyMissing 2.411 2.608 3.037  2.679 3.503  5.077
     anyNA 2.394 2.584 2.977  2.689 3.456  4.998
 any_is.na 4.376 5.631 6.625  5.838 7.443 32.233

      expr    min     lq   mean median     uq    max
anyMissing 1.0000 1.0000 1.0000  1.000 1.0000 1.0000
     anyNA 0.9927 0.9909 0.9802  1.004 0.9864 0.9844
 any_is.na 1.8147 2.1595 2.1815  2.179 2.1244 6.3485

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

n=10000000 vector

> x <- data[["n=10000000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   643355  34.4    1168576  62.5  1168576  62.5
Vcells 23109393 176.4   55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

        expr   min    lq  mean median    uq    max
2      anyNA 24.07 26.08 30.44  29.54 33.78  42.76
1 anyMissing 23.88 26.46 30.34  30.00 32.43  41.13
3  any_is.na 55.41 60.57 70.94  64.54 72.30 114.00

        expr   min    lq   mean median     uq    max
2      anyNA 1.000 1.000 1.0000  1.000 1.0000 1.0000
1 anyMissing 0.992 1.014 0.9969  1.016 0.9599 0.9621
3  any_is.na 2.302 2.322 2.3307  2.185 2.1404 2.6662

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
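
To compare the two data types at a glance, the relative median times of any_is.na() can be collected from the ten tables above. The sketch below only tabulates numbers already shown in this report; the variable names are illustrative. The reference method is whichever of anyNA() or anyMissing() came first in each table, and those two differ by only a few percent at the larger sizes.

```r
# Relative median times of any_is.na(), copied from the tables in this report
rel_median <- data.frame(
  n       = c(1e3, 1e4, 1e5, 1e6, 1e7),
  integer = c(5.500, 4.499, 4.903, 4.793, 5.083),
  double  = c(2.599, 1.927, 2.156, 2.179, 2.185)
)

print(rel_median)

# Smallest and largest slowdown factor per data type
slowdown_range <- sapply(rel_median[c("integer", "double")], range)
print(slowdown_range)
```

In these runs any_is.na() is roughly 4.5 to 5.5 times slower than the fastest method on integer vectors and roughly 1.9 to 2.6 times slower on double vectors.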

Appendix

Session information

R Under development (unstable) (2015-02-27 r67909)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markdown_0.7.7          microbenchmark_1.4-2    matrixStats_0.14.0-9000
[4] ggplot2_1.0.0           knitr_1.9.3             R.devices_2.13.0       
[7] R.utils_2.0.0           R.oo_1.19.0             R.methodsS3_1.7.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.4         splines_3.2.0       MASS_7.3-39        
 [4] munsell_0.4.2       lattice_0.20-30     colorspace_1.2-4   
 [7] R.cache_0.11.1-9000 multcomp_1.3-9      stringr_0.6.2      
[10] plyr_1.8.1          tools_3.2.0         grid_3.2.0         
[13] gtable_0.1.2        TH.data_1.0-6       survival_2.38-1    
[16] digest_0.6.8        R.rsp_0.20.0        reshape2_1.4.1     
[19] formatR_1.0.3       base64enc_0.1-3     mime_0.2.1         
[22] evaluate_0.5.7      labeling_0.3        sandwich_2.3-2     
[25] scales_0.2.4        mvtnorm_1.0-2       zoo_1.7-12         
[28] Cairo_1.5-6         proto_0.3-10       

Total processing time was 32.14 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('anyMissing')

Copyright Henrik Bengtsson. Last updated on 2015-03-02 16:54:45 (-0800 UTC). Powered by RSP.
