Description
I believe that the unwind_protect()
optimization originally added in #207 isn't quite right. I think it was broken after that PR was merged in 9a62c3a with the move to a local static object. I can prove that nested calls to unwind_protect()
aren't being skipped as they should be.
First, here is a benchmark to show that the nested unwind_protect()
optimization isn't working, and the performance loss is pretty brutal. The unwind_protect()
calls happen on every iteration of the loop, because assigning an r_string
to a writable::strings
proxy element is wrapped in unwind_protect()
, even though I wrapped the whole loop in unwind_protect()
too.
cpp11/inst/include/cpp11/strings.hpp
Line 58 in 2aba46e
set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)
cpp11::cpp_function("
cpp11::writable::strings test(cpp11::strings x) {
R_xlen_t size = x.size();
cpp11::writable::strings out(size);
cpp11::unwind_protect([&] {
for (R_xlen_t i = 0; i < size; ++i) {
out[i] = x[i];
}
});
return out;
}
")
cpp11::cpp_function("
SEXP test2(SEXP x) {
R_xlen_t size = Rf_xlength(x);
SEXP out = PROTECT(Rf_allocVector(STRSXP, size));
for (R_xlen_t i = 0; i < size; ++i) {
SET_STRING_ELT(out, i, STRING_ELT(x, i));
}
UNPROTECT(1);
return out;
}
")
bench::mark(test(x), test2(x), iterations = 20)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test(x) 715.35ms 751.55ms 1.31 7.63MB 1.96
#> 2 test2(x) 7.65ms 9.11ms 100. 7.63MB 35.1
Next, here is a video that shows how inner calls to unwind_protect()
aren't exiting early. It uses VS Code to step through the example from above. The sequence of steps are:
- We step through the outer
unwind_protect()
call to see how it sets up the staticshould_unwind_protect
local object - Then we step through the first iteration of the
out[i] = x[i]
copy. That puts us back inunwind_protect()
, but that static local object is not yet initialized (weird right?), so we end up going back through the process of setting it up. That meansunwind_protect()
doesn't early exit here when it should. - Then on the second iteration of
out[i] = x[i]
, the static localshould_unwind_protect
is set up, but is set toTRUE
so we again can't early exit. This continues on for the rest of the loop.
I'm fairly certain the main problem here is that SEXP unwind_protect(Fun&& code)
is a template function, meaning that each new instantiation of it gets its own copy of should_unwind_protect
, and it has to be set up every time.
I've fixed this in a branch I'll send in, and I'll explain the fix there.