-
Notifications
You must be signed in to change notification settings - Fork 3.7k
ARROW-9405: [R] Switch to cpp11 #7819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to see this come together. The merge conflict with master is probably that I deleted a cpp function and you adapted it to cpp11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some style/lint comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more style/lint comments
@romainfrancois please rebase, it has some conflicts with the master. |
Since this has a bunch of merge commits it will have to be |
@romainfrancois I squashed, be sure to |
ba72863
to
d4117ae
Compare
r/R/schema.R
Outdated
names = function() { | ||
out <- Schema__field_names(self) | ||
# Hack: Rcpp should set the encoding | ||
Encoding(out) <- "UTF-8" | ||
out | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If cpp11
is doing what it claims wrt unicode, you can remove this and just
names = function() Schema__field_names(self),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a test ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes. I think you can also grep for Rf_translateCharUTF8
in r/src
and remove those coercions since cpp11
should handle that automatically
@github-actions autotune everything |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a thorough review, just scanned for a few things to make sure to resolve before we merge
r/DESCRIPTION
Outdated
@@ -50,6 +49,8 @@ Suggests: | |||
rmarkdown, | |||
testthat, | |||
tibble | |||
Remotes: | |||
r-lib/cpp11#97 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we're ready to merge this, we should vendor cpp11. I made https://issues.apache.org/jira/browse/ARROW-9786 to go back to the released cpp11 before we do the 2.0.0 release (October-ish).
I'm trying to debug what's wrong with the python tests that I previously silenced in this PR: That looks fine: library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
str(py)
#> [
#> 1,
#> 2,
#> 3
#> ]
class(py)
#> [1] "pyarrow.lib.DoubleArray" "pyarrow.lib.FloatingPointArray"
#> [3] "pyarrow.lib.NumericArray" "pyarrow.lib.Array"
#> [5] "pyarrow.lib._PandasConvertible" "python.builtin.object" But then if I also load library(reticulate)
library(arrow)
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3)) |
The error seems to happen on this call: x$`_export_to_c`(array_ptr, schema_ptr) although there seems to be something there: Browse[2]> x$`_export_to_c`
<built-in method _export_to_c of pyarrow.lib.DoubleArray> |
And I can confirm that this works on master locally: library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
py
#> Array
#> <double>
#> [
#> 1,
#> 2,
#> 3
#> ] Created on 2020-08-19 by the reprex package (v0.3.0.9001) More investigation needed. |
r/src/arrow_cpp11.h
Outdated
namespace unsafe { | ||
|
||
inline SEXP utf8_r_string(SEXP s) { | ||
if (!IS_UTF8(s) && !IS_ASCII(s)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these utilities are necessary? translateCharUTF8
also checks IS_UTF8 and IS_ASCII and exits early: https://github.com/wch/r-source/blob/122dcf452ec5eacdd66e165457985bded1af4fee/src/main/sysutils.c#L1085
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They might not all be necessary, I'm trying to avoid the Rf_mkChar(Rf_translateCharUTF8())
combo, which would unnecessarily make a SEXP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is thin for now, but this might evolve into perhaps something like cpp11::utf8_strings
class that would eagerly convert at construction time.
…pp11::warning() that fails locally for some reason
…tring to std::string)
- TraverseDotsNoName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pulled this and ran locally, and compile time (single-threaded) for the r/src
bindings is cut in half: master takes 93 seconds on my machine, and this takes 48 seconds.
Thanks for doing this!
Closes apache#7819 from romainfrancois/cpp11 Lead-authored-by: Romain Francois <romain@rstudio.com> Co-authored-by: Romain François <romain@rstudio.com> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Closes apache#7819 from romainfrancois/cpp11 Lead-authored-by: Romain Francois <romain@rstudio.com> Co-authored-by: Romain François <romain@rstudio.com> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
No description provided.