-
-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify is_na() #799
Simplify is_na() #799
Conversation
What speedups are you seeing? Also, line 47 and 48 probably want the same change for |
For this version (38d024f): sum_is_na <- Rcpp::cppFunction("
int sum_is_na(NumericVector x) {
int n = 0;
for (R_xlen_t i = 0; i < x.size(); ++i) {
double xi = x[i];
if (Rcpp::traits::is_na<REALSXP>(xi)) ++n;
}
return n;
}
"
)
N <- 5e6
x <- runif(N)
x[x < 0.5] <- NA_real_
microbenchmark::microbenchmark(
sum(is.na(x)),
sum_is_na(x)
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sum(is.na(x)) 10.24388 10.62800 12.26077 11.25501 13.76406 18.73605 100
#> sum_is_na(x) 20.23770 20.65917 21.54281 21.25047 22.05458 25.35780 100
#> cld
#> a
#> b
For a version using #> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sum(is.na(x)) 10.32959 10.55353 12.03510 11.08237 13.15562 16.21278 100
#> sum_is_na(x) 13.08563 13.35475 13.72524 13.57882 14.02745 16.02345 100
#> cld
#> a
#> b For master (f1ec6cf): #> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sum(is.na(x)) 10.31513 10.62298 12.18608 11.00471 13.85499 22.86626 100
#> sum_is_na(x) 49.11284 49.96843 51.05028 50.77832 52.05040 55.72882 100
#> cld
#> a
#> b Bottom line: Faster by factor ~2.5, if we believe that Will update complex implementation once we reach consensus. |
2x speedup is good, even on a minuscule task. Just want to make really sure the results are the same but I suppose our unit tests are comprehensive. |
Can you please extend this to the complex case on lines 47 and 48 ? |
With |
Sorry, you misunderstood what I meant. There was a commit missing. See #800 which I'll open in just a second as an extension of this. I made the change a few hours ago locally, and ran a rev.dep. Do you have a Windows machine handy? I have been bisecting the last few hours trying to find what broke Windows. I get tests timeout on R Hub. I think it is #789 by @lionel but I am not quite sure. |
Sure, I can run a few tests. Do you want me to run R CMD check with pre- and post-#789? On R-devel? |
Yes, I just sent you a link to your "Kirill and R" handle at the mail box ... We can continue there. I have some note, but I am mostly dizzy and need a break. A second set of eyes for checking would be good. |
This speeds up the check, because compilers aren't likely to be able to optimize two external functions that basically return
f(x) && g(x)
andf(x) && !g(x)
. The first mention of theR_isnancpp()
function dates back to 2005, wch/r-source@17443788ef.Related: #790