anyMissing
matrixStats: Benchmark report
This report benchmarks the performance of anyMissing() against alternative methods:
- anyNA()
- any() + is.na()
The any() + is.na() alternative is defined as below:
> any_is.na <- function(x) {
+     any(is.na(x))
+ }
> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), naProb = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (naProb > 0)
+         x[sample(n, size = naProb * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n=%d", sapply(data, FUN = length))
+     data
+ }
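Before timing the three approaches, it can help to confirm that they agree on small inputs. The sketch below is not part of the original report: it uses base R for the two alternatives and calls matrixStats::anyMissing() only if the package happens to be installed. The helper name `check_agree` is hypothetical.

```r
# Alternative defined in the report: materializes is.na(x), then scans it
any_is.na <- function(x) {
  any(is.na(x))
}

# Hypothetical helper: TRUE if all available methods return the same answer
check_agree <- function(x) {
  res <- c(anyNA = anyNA(x), any_is.na = any_is.na(x))
  # Assumption: matrixStats may or may not be installed on this machine
  if (requireNamespace("matrixStats", quietly = TRUE)) {
    res <- c(res, anyMissing = matrixStats::anyMissing(x))
  }
  length(unique(res)) == 1L
}

x_clean <- c(1L, 2L, 3L)   # no missing values
x_na    <- c(1L, NA, 3L)   # one missing value

print(check_agree(x_clean))
print(check_agree(x_na))
```

Since any(is.na(x)) first builds a full logical vector of the same length as x, it does more work than a single-pass check, which is consistent with it being the slowest method in the tables of this report.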
> data <- rvectors(mode = mode)
> x <- data[["n=1000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643241 34.4 1168576 62.5 1168576 62.5
Vcells 17552012 134.0 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.0004 | 0.0004 | 0.0007 | 0.0008 | 0.0008 | 0.0042 |
| 1 | anyMissing | 0.0015 | 0.0015 | 0.0024 | 0.0019 | 0.0023 | 0.0431 |
| 3 | any_is.na | 0.0035 | 0.0039 | 0.0045 | 0.0042 | 0.0050 | 0.0192 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 1 | anyMissing | 4 | 3.992 | 3.663 | 2.501 | 2.996 | 10.181 |
| 3 | any_is.na | 9 | 9.975 | 6.869 | 5.500 | 6.492 | 4.545 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=10000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643258 34.4 1168576 62.5 1168576 62.5
Vcells 17552377 134.0 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.0058 | 0.0062 | 0.0070 | 0.0062 | 0.0065 | 0.0277 |
| 1 | anyMissing | 0.0069 | 0.0073 | 0.0096 | 0.0077 | 0.0085 | 0.0493 |
| 3 | any_is.na | 0.0250 | 0.0262 | 0.0309 | 0.0277 | 0.0304 | 0.0647 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 1 | anyMissing | 1.200 | 1.188 | 1.374 | 1.250 | 1.294 | 1.778 |
| 3 | any_is.na | 4.333 | 4.250 | 4.447 | 4.499 | 4.646 | 2.333 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=100000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643270 34.4 1168576 62.5 1168576 62.5
Vcells 17552385 134.0 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.0654 | 0.0670 | 0.0882 | 0.0755 | 0.0907 | 0.7499 |
| 1 | anyMissing | 0.0678 | 0.0693 | 0.0860 | 0.0785 | 0.0962 | 0.1748 |
| 3 | any_is.na | 0.2914 | 0.3509 | 0.3996 | 0.3699 | 0.4294 | 1.0725 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000 | 1.000 | 1.0000 | 1.000 | 1.000 | 1.0000 |
| 1 | anyMissing | 1.035 | 1.034 | 0.9753 | 1.041 | 1.062 | 0.2331 |
| 3 | any_is.na | 4.453 | 5.238 | 4.5333 | 4.903 | 4.737 | 1.4302 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=1000000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643282 34.4 1168576 62.5 1168576 62.5
Vcells 17552905 134.0 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.6317 | 0.7842 | 0.8685 | 0.8354 | 0.8602 | 1.482 |
| 1 | anyMissing | 0.6552 | 0.8215 | 0.9133 | 0.8631 | 0.9505 | 1.557 |
| 3 | any_is.na | 3.4834 | 3.9034 | 4.4829 | 4.0039 | 4.2865 | 32.934 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.00 |
| 1 | anyMissing | 1.037 | 1.048 | 1.052 | 1.033 | 1.105 | 1.05 |
| 3 | any_is.na | 5.514 | 4.978 | 5.162 | 4.793 | 4.983 | 22.22 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=10000000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643295 34.4 1168576 62.5 1168576 62.5
Vcells 17552921 134.0 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|
| anyMissing | 7.758 | 7.853 | 8.370 | 8.178 | 8.500 | 11.84 |
| anyNA | 7.674 | 7.765 | 8.396 | 8.261 | 8.468 | 11.46 |
| any_is.na | 37.277 | 39.845 | 46.654 | 41.569 | 45.011 | 91.56 |

| expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|
| anyMissing | 1.0000 | 1.0000 | 1.000 | 1.000 | 1.0000 | 1.0000 |
| anyNA | 0.9892 | 0.9888 | 1.003 | 1.010 | 0.9963 | 0.9681 |
| any_is.na | 4.8049 | 5.0738 | 5.574 | 5.083 | 5.2956 | 7.7330 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
> data <- rvectors(mode = mode)
> x <- data[["n=1000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643307 34.4 1168576 62.5 1168576 62.5
Vcells 23108750 176.4 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.0019 | 0.0019 | 0.0021 | 0.0019 | 0.0023 | 0.0042 |
| 1 | anyMissing | 0.0027 | 0.0031 | 0.0034 | 0.0031 | 0.0035 | 0.0212 |
| 3 | any_is.na | 0.0046 | 0.0050 | 0.0054 | 0.0050 | 0.0054 | 0.0131 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 1 | anyMissing | 1.4 | 1.599 | 1.633 | 1.600 | 1.500 | 4.998 |
| 3 | any_is.na | 2.4 | 2.599 | 2.607 | 2.599 | 2.334 | 3.090 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=10000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643319 34.4 1168576 62.5 1168576 62.5
Vcells 23108758 176.4 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.0208 | 0.0208 | 0.0213 | 0.0212 | 0.0212 | 0.0358 |
| 1 | anyMissing | 0.0216 | 0.0223 | 0.0229 | 0.0227 | 0.0227 | 0.0373 |
| 3 | any_is.na | 0.0397 | 0.0404 | 0.0419 | 0.0408 | 0.0412 | 0.1274 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 1 | anyMissing | 1.037 | 1.074 | 1.073 | 1.073 | 1.073 | 1.043 |
| 3 | any_is.na | 1.907 | 1.944 | 1.965 | 1.927 | 1.945 | 3.559 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=100000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643331 34.4 1168576 62.5 1168576 62.5
Vcells 23109046 176.4 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.2306 | 0.2331 | 0.2663 | 0.2396 | 0.2702 | 0.4677 |
| 1 | anyMissing | 0.2325 | 0.2383 | 0.2641 | 0.2458 | 0.2633 | 0.4350 |
| 3 | any_is.na | 0.4350 | 0.5039 | 0.5349 | 0.5166 | 0.5380 | 0.8600 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000 | 1.000 | 1.0000 | 1.000 | 1.0000 | 1.000 |
| 1 | anyMissing | 1.008 | 1.022 | 0.9918 | 1.026 | 0.9744 | 0.930 |
| 3 | any_is.na | 1.887 | 2.162 | 2.0087 | 2.156 | 1.9907 | 1.839 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=100000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=1000000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643343 34.4 1168576 62.5 1168576 62.5
Vcells 23109385 176.4 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|
| anyMissing | 2.411 | 2.608 | 3.037 | 2.679 | 3.503 | 5.077 |
| anyNA | 2.394 | 2.584 | 2.977 | 2.689 | 3.456 | 4.998 |
| any_is.na | 4.376 | 5.631 | 6.625 | 5.838 | 7.443 | 32.233 |

| expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|
| anyMissing | 1.0000 | 1.0000 | 1.0000 | 1.000 | 1.0000 | 1.0000 |
| anyNA | 0.9927 | 0.9909 | 0.9802 | 1.004 | 0.9864 | 0.9844 |
| any_is.na | 1.8147 | 2.1595 | 2.1815 | 2.179 | 2.1244 | 6.3485 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=1000000 data. Outliers are displayed as crosses. Times are in milliseconds.
> x <- data[["n=10000000"]]
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 643355 34.4 1168576 62.5 1168576 62.5
Vcells 23109393 176.4 55641873 424.6 68120027 519.8
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 24.07 | 26.08 | 30.44 | 29.54 | 33.78 | 42.76 |
| 1 | anyMissing | 23.88 | 26.46 | 30.34 | 30.00 | 32.43 | 41.13 |
| 3 | any_is.na | 55.41 | 60.57 | 70.94 | 64.54 | 72.30 | 114.00 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000 | 1.000 | 1.0000 | 1.000 | 1.0000 | 1.0000 |
| 1 | anyMissing | 0.992 | 1.014 | 0.9969 | 1.016 | 0.9599 | 0.9621 |
| 3 | any_is.na | 2.302 | 2.322 | 2.3307 | 2.185 | 2.1404 | 2.6662 |

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n=10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
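A single cell of the benchmarks above can be approximated outside the report with a sketch along the following lines. It is an illustration only, not the report's own harness: it builds one integer vector the way rvector() does with its default naProb = 0 (so no missing values are present and every method has to scan the whole vector), and it uses microbenchmark only if that package is installed, falling back to coarse base R timing otherwise. The vector length and the number of repetitions are arbitrary choices.

```r
set.seed(1)
n <- 1e5

# Integer data without missing values, mirroring rvector(mode = "integer")
x <- runif(n, min = -100, max = +100)
storage.mode(x) <- "integer"

any_is.na <- function(x) any(is.na(x))

# No NAs were injected, so both methods must report FALSE
stopifnot(!anyNA(x), !any_is.na(x))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  stats <- microbenchmark::microbenchmark(
    anyNA     = anyNA(x),
    any_is.na = any_is.na(x),
    times = 100L, unit = "ms"
  )
  print(stats)
} else {
  # Fallback: coarse timing of 100 repetitions using base R only
  print(system.time(for (i in 1:100) anyNA(x)))
  print(system.time(for (i in 1:100) any_is.na(x)))
}
```

Absolute timings will differ from the tables above, since they depend on the machine and R version; only the relative ordering of the methods is expected to be comparable.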
R Under development (unstable) (2015-02-27 r67909)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] markdown_0.7.7 microbenchmark_1.4-2 matrixStats_0.14.0-9000
[4] ggplot2_1.0.0 knitr_1.9.3 R.devices_2.13.0
[7] R.utils_2.0.0 R.oo_1.19.0 R.methodsS3_1.7.0
loaded via a namespace (and not attached):
[1] Rcpp_0.11.4 splines_3.2.0 MASS_7.3-39
[4] munsell_0.4.2 lattice_0.20-30 colorspace_1.2-4
[7] R.cache_0.11.1-9000 multcomp_1.3-9 stringr_0.6.2
[10] plyr_1.8.1 tools_3.2.0 grid_3.2.0
[13] gtable_0.1.2 TH.data_1.0-6 survival_2.38-1
[16] digest_0.6.8 R.rsp_0.20.0 reshape2_1.4.1
[19] formatR_1.0.3 base64enc_0.1-3 mime_0.2.1
[22] evaluate_0.5.7 labeling_0.3 sandwich_2.3-2
[25] scales_0.2.4 mvtnorm_1.0-2 zoo_1.7-12
[28] Cairo_1.5-6 proto_0.3-10

Total processing time was 32.14 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('anyMissing')

Copyright Henrik Bengtsson. Last updated on 2015-03-02 16:54:45 (-0800 UTC). Powered by RSP.