Description
Right now, r_string
has a private data member called sexp data_
, which holds the CHARSXP it wraps
cpp11/inst/include/cpp11/r_string.hpp
Line 50 in 2ddc12f
I propose that we consider changing that from sexp
to SEXP
. The benefit of sexp
is that it provides automatic protection of that object, but it doesn't need it! CHARSXPs stored in R's global string pool are automatically protected for the life of the R session, so protecting them again is extraneous, and more importantly, pretty expensive.
There are two obvious and extremely common cases that would have significant performance improvements if we did this:
- Element extraction from
cpp11::strings
throughx[i]
, which creates and returns anr_string
(in this particular case, thedata_
is also protected byx
) - Element assignment into
cpp11::writable::strings
throughx[i] =
, for example, when assigning astd::string
, which has to be converted tor_string
first.
The change would involve moving from sexp
to SEXP
here:
cpp11/inst/include/cpp11/r_string.hpp
Line 50 in 2ddc12f
And then making the ==
comparison operators compare data_
rather than data_.data()
here:
cpp11/inst/include/cpp11/r_string.hpp
Lines 35 to 37 in 2ddc12f
These changes in combination with #299 would have made cpp11 fast enough that I wouldn't have needed to drop down to the R API here https://github.com/EmilHvitfeldt/cpp11ngram/pull/1/files#diff-aa381cabaf80f28b6f14233769693f3faf2d817395969f33e64e014284a51130R20-R33
Performance examples follow below
For element extraction, let's look at using the raw R API to see how fast it could be:
set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)
cpp11::cpp_function("
cpp11::sexp test_extract_r_api(cpp11::strings x) {
const R_xlen_t size = x.size();
for (R_xlen_t i = 0; i < size; ++i) {
SEXP elt = STRING_ELT(x, i);
}
return R_NilValue;
}
")
bench::mark(test_extract_r_api(x))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test_extract_r_api(x) 1.51ms 1.93ms 524. 0B 0
Then with cpp11:
set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)
cpp11::cpp_function("
cpp11::sexp test_extract_cpp11(cpp11::strings x) {
const R_xlen_t size = x.size();
for (R_xlen_t i = 0; i < size; ++i) {
cpp11::r_string elt = x[i];
}
return R_NilValue;
}
")
bench::mark(test_extract_cpp11(x))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test_extract_cpp11(x) 60.2ms 63.5ms 15.3 0B 23.0
This is slow because x[i]
does STRING_ELT()
to get the SEXP
, but then casts that to r_string
, which wraps the SEXP
into a sexp
, forcing it to be protected.
Now that example again, but with sexp
swapped for SEXP
, avoiding the extra protection:
bench::mark(test_extract_cpp11(x))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test_extract_cpp11(x) 1.43ms 1.55ms 595. 0B 0
Just as fast as the raw R API!
Now let's look at element assignment. We are going to assign the string "foo"
1 million times to an output vector.
With the raw R API:
set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)
cpp11::cpp_function("
cpp11::writable::strings
test_std_string_assign_r_api() {
cpp11::writable::strings out(1000000);
const R_xlen_t size = out.size();
const std::string x = \"foo\";
for (R_xlen_t i = 0; i < size; ++i) {
SET_STRING_ELT(out, i, Rf_mkCharLenCE(x.data(), x.size(), CE_UTF8));
}
return out;
}
")
bench::mark(test_std_string_assign_r_api())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test_std_string_assign_r_api() 24ms 26.8ms 35.6 7.63MB 7.63
Now with a naive cpp11 approach:
set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)
cpp11::cpp_function("
cpp11::writable::strings
test_std_string_assign_naive() {
cpp11::writable::strings out(1000000);
const R_xlen_t size = out.size();
const std::string x = \"foo\";
for (R_xlen_t i = 0; i < size; ++i) {
out[i] = x;
}
return out;
}
")
bench::mark(test_std_string_assign_naive())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test_std_string_assign_naive() 1.38s 1.38s 0.723 7.63MB 0
Wow, painfully slow. Now, the biggest issue here is actually that unwind_protect()
is running twice at each loop iteration:
- It
unwind_protect()
s throughsafe[Rf_mkCharLenCE]
when turningx
into aCHARSXP
then to ar_string
cpp11/inst/include/cpp11/r_string.hpp
Lines 19 to 20 in 2ddc12f
- It
unwind_protect()
s through the proxy=
assignment when callingSET_STRING_ELT()
cpp11/inst/include/cpp11/strings.hpp
Lines 55 to 60 in 2ddc12f
So let's assume we know we can wrap the loop in unwind_protect()
to limit the number of unwind-protects, AND let's assume we have #299 installed so that technique actually works.
set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)
cpp11::cpp_function("
cpp11::writable::strings
test_std_string_assign_cpp11() {
cpp11::writable::strings out(1000000);
const R_xlen_t size = out.size();
const std::string x = \"foo\";
cpp11::unwind_protect([&] {
for (R_xlen_t i = 0; i < size; ++i) {
out[i] = x;
}
});
return out;
}
")
bench::mark(test_std_string_assign_cpp11())
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test_std_string_assign_cpp11() 92ms 102ms 9.66 7.63MB 15.5
Better, but still 4x slower than base R.
Again, the problem here is that x
goes from std::string
to CHARSXP to r_string
, and when it gets put in the r_string
it gets wrapped in sexp
and is "extra" protected.
Now that unwind_protect()
example again, but with sexp
swapped for SEXP
, avoiding the extra protection:
bench::mark(test_std_string_assign_cpp11())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 test_std_string_assign_cpp11() 29.1ms 31.1ms 32.0 7.63MB 7.99
Again, essentially as fast as the R API!