Skip to content

ARROW-9405: [R] Switch to cpp11 #7819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 62 commits into from

Conversation

romainfrancois
Copy link
Contributor

@romainfrancois romainfrancois commented Jul 22, 2020

No description provided.

@nealrichardson nealrichardson changed the title [R]: Switch from Rcpp to cpp11 ARROW-9405: [R] Switch to cpp11 Jul 22, 2020
@github-actions
Copy link

@bkietz bkietz requested review from nealrichardson and bkietz and removed request for nealrichardson July 23, 2020 13:07
Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see this come together. The merge conflict with master is probably that I deleted a cpp function and you adapted it to cpp11.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some style/lint comments

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more style/lint comments

@kszucs
Copy link
Member

kszucs commented Jul 27, 2020

@romainfrancois please rebase, it has some conflicts with the master.

@wesm
Copy link
Member

wesm commented Jul 28, 2020

Since this has a bunch of merge commits it will have to be git merge --squash'd to clean things up, let me know if I can help

@wesm
Copy link
Member

wesm commented Jul 28, 2020

@romainfrancois I squashed, be sure to git fetch origin && git reset --hard origin/cpp11

@romainfrancois romainfrancois force-pushed the cpp11 branch 2 times, most recently from ba72863 to d4117ae Compare August 4, 2020 13:53
r/R/schema.R Outdated
Comment on lines 86 to 88
names = function() {
out <- Schema__field_names(self)
# Hack: Rcpp should set the encoding
Encoding(out) <- "UTF-8"
out
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cpp11 is doing what it claims wrt unicode, you can remove this and just

names = function() Schema__field_names(self),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test ?

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes. I think you can also grep for Rf_translateCharUTF8 in r/src and remove those coercions since cpp11 should handle that automatically

@nealrichardson
Copy link
Member

@github-actions autotune everything

romainfrancois added a commit to romainfrancois/arrow that referenced this pull request Aug 18, 2020
romainfrancois added a commit to romainfrancois/arrow that referenced this pull request Aug 18, 2020
Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a thorough review, just scanned for a few things to make sure to resolve before we merge

r/DESCRIPTION Outdated
@@ -50,6 +49,8 @@ Suggests:
rmarkdown,
testthat,
tibble
Remotes:
r-lib/cpp11#97
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we're ready to merge this, we should vendor cpp11. I made https://issues.apache.org/jira/browse/ARROW-9786 to go back to the released cpp11 before we do the 2.0.0 release (October-ish).

@romainfrancois
Copy link
Contributor Author

I'm trying to debug what's wrong with the python tests that I previously silenced in this PR:

That looks fine:

library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
str(py)
#> [
#>   1,
#>   2,
#>   3
#> ]
class(py)
#> [1] "pyarrow.lib.DoubleArray"        "pyarrow.lib.FloatingPointArray"
#> [3] "pyarrow.lib.NumericArray"       "pyarrow.lib.Array"             
#> [5] "pyarrow.lib._PandasConvertible" "python.builtin.object"

But then if I also load arrow I'm getting a segfault on pa$array() call:

library(reticulate)
library(arrow)
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))

@romainfrancois
Copy link
Contributor Author

romainfrancois commented Aug 19, 2020

The error seems to happen on this call:

x$`_export_to_c`(array_ptr, schema_ptr)

although there seems to be something there:

Browse[2]> x$`_export_to_c`
<built-in method _export_to_c of pyarrow.lib.DoubleArray>

@romainfrancois
Copy link
Contributor Author

And I can confirm that this works on master locally:

library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
py
#> Array
#> <double>
#> [
#>   1,
#>   2,
#>   3
#> ]

Created on 2020-08-19 by the reprex package (v0.3.0.9001)

More investigation needed.

namespace unsafe {

inline SEXP utf8_r_string(SEXP s) {
if (!IS_UTF8(s) && !IS_ASCII(s)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure these utilities are necessary? translateCharUTF8 also checks IS_UTF8 and IS_ASCII and exits early: https://github.com/wch/r-source/blob/122dcf452ec5eacdd66e165457985bded1af4fee/src/main/sysutils.c#L1085

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They might not all be necessary, I'm trying to avoid the Rf_mkChar(Rf_translateCharUTF8()) combo, which would unnecessarily make a SEXP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is thin for now, but this might evolve into perhaps something like cpp11::utf8_strings class that would eagerly convert at construction time.

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pulled this and ran locally, and compile time (single-threaded) for the r/src bindings is cut in half: master takes 93 seconds on my machine, and this takes 48 seconds.

Thanks for doing this!

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 16, 2020
Closes apache#7819 from romainfrancois/cpp11

Lead-authored-by: Romain Francois <romain@rstudio.com>
Co-authored-by: Romain François <romain@rstudio.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Closes apache#7819 from romainfrancois/cpp11

Lead-authored-by: Romain Francois <romain@rstudio.com>
Co-authored-by: Romain François <romain@rstudio.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants