
ARROW-9405: [R] Switch to cpp11 #7819


Closed
Wants to merge 62 commits into … from …
a7ffab5
Squashed commit of the following:
romainfrancois Jul 28, 2020
b4a6192
dataset.cpp -Rcpp
romainfrancois Aug 3, 2020
5da7df7
use RF_type2char instead of Rcpp::type2name
romainfrancois Aug 3, 2020
655285a
-Rcpp
romainfrancois Aug 3, 2020
14c55bc
table.cpp -Rcpp
romainfrancois Aug 3, 2020
0c6acba
parquet.cpp -Rcpp
romainfrancois Aug 3, 2020
811be4c
array.cpp buffer.cpp -Rcpp
romainfrancois Aug 3, 2020
1aacf0f
move test helper function to helper file
romainfrancois Aug 3, 2020
c840073
skip for now
romainfrancois Aug 3, 2020
da9f05c
Rework Buffer without Rcpp
romainfrancois Aug 3, 2020
3aec730
use cpp11::stop()
romainfrancois Aug 3, 2020
e583308
remove some Rcpp uses
romainfrancois Aug 3, 2020
97f524f
rather mention c++ than Rcpp
romainfrancois Aug 4, 2020
e228053
Rework array_to_vector so that it does not use Rcpp
romainfrancois Aug 4, 2020
f9f126e
No longer need Rcpp
romainfrancois Aug 4, 2020
80ed2a4
Autoformat/render all the things [automated commit]
romainfrancois Aug 4, 2020
0ad4771
s/Rcpp/cpp11/
romainfrancois Aug 5, 2020
b5c5dbc
arrow_rcpp -> arrow_cpp11
romainfrancois Aug 5, 2020
f36ff3d
cpp11 handles enums, but this needs https://github.com/r-lib/cpp11/pu…
romainfrancois Aug 5, 2020
84d542d
no longer need class arrow::r::Index
romainfrancois Aug 5, 2020
cd8c040
lint
romainfrancois Aug 5, 2020
f1659ae
You don't need a ; after a }
romainfrancois Aug 5, 2020
b511786
use DATAPTR() instead of vector_begin()
romainfrancois Aug 5, 2020
5bbdbd7
+ TraverseDotsNoName for when traversing ... without dealing with the…
romainfrancois Aug 5, 2020
2d5fc61
not using verify_output() as the error differs on github actions
romainfrancois Aug 6, 2020
65289c7
No longer need to set the encoding to utf-8 as cpp11 will properly mak…
romainfrancois Aug 6, 2020
1802182
rebase and adjust to cpp11
romainfrancois Aug 6, 2020
3e9eabe
skipping python related tests for now.
romainfrancois Aug 6, 2020
0ab7091
PR https://github.com/r-lib/cpp11/pull/74 was merged
romainfrancois Aug 6, 2020
9ddb222
Rework StringVectorConverter<>::Ingest() to use cpp11
romainfrancois Aug 6, 2020
e536ee4
lint
romainfrancois Aug 6, 2020
7077fd2
using https://github.com/r-lib/cpp11/pull/85
romainfrancois Aug 17, 2020
234ba84
cpp11::stop() is marked as noreturn
romainfrancois Aug 17, 2020
19e580d
Update r/src/array_from_vector.cpp
romainfrancois Aug 17, 2020
758f7e9
Update r/src/array_from_vector.cpp
romainfrancois Aug 17, 2020
5dbb648
Update r/src/arrow_cpp11.h
romainfrancois Aug 17, 2020
0843196
update decor after rebase
romainfrancois Aug 17, 2020
8957299
Actually this still needs as_sexp<enum> but cpp11 should have it
romainfrancois Aug 17, 2020
5627aaf
camel case
romainfrancois Aug 17, 2020
cca34a3
Converter_Dictionary cannot ingest in parallel, and avoid a form of c…
romainfrancois Aug 17, 2020
f244547
added comment for https://github.com/apache/arrow/pull/7819#discussio…
romainfrancois Aug 18, 2020
021c2b9
using master cpp11
romainfrancois Aug 18, 2020
a7d8a91
-Wsign-compare finding
romainfrancois Aug 18, 2020
dab5115
using https://github.com/r-lib/cpp11/pull/97
romainfrancois Aug 18, 2020
d8eb365
g++ needs space
romainfrancois Aug 18, 2020
3f45697
one more use of arrow::r::TraverseDots()
romainfrancois Aug 18, 2020
4d4d220
traverse dots automatically using utf8 (thanks to conversion from r_s…
romainfrancois Aug 18, 2020
61b0aa5
TraverseDots(cpp11::list dots, ...)
romainfrancois Aug 18, 2020
2b6ea2b
.size() rather than XLENGTH()
romainfrancois Aug 18, 2020
6cdb4f3
avoid conversion to utf8 when not needed
romainfrancois Aug 19, 2020
c5bf268
Doing conversions upfront, so using UnsafeAppend
romainfrancois Aug 19, 2020
d6146c0
Buffer only parameterized by vector type
romainfrancois Aug 19, 2020
06c52f0
re-add ability to create a Buffer from an R complex vector
romainfrancois Aug 19, 2020
a78f3c3
marking ctor explicit
romainfrancois Aug 19, 2020
f7d1baa
explicitly make a complex vector
romainfrancois Aug 19, 2020
7eabb63
work around cpp11 converting uintptr_t to int where Rcpp did convert …
romainfrancois Aug 20, 2020
e2669cc
no longer skip python tests
romainfrancois Aug 20, 2020
9ed0846
https://github.com/r-lib/cpp11/pull/97 was merged
romainfrancois Aug 20, 2020
1ea6b99
vendor cpp11
romainfrancois Aug 20, 2020
701e35e
ignore vendored cpp11 files
romainfrancois Aug 20, 2020
9a993cc
LinkingTo: self
romainfrancois Aug 20, 2020
bb0371b
:newspaper: and python test tweak
nealrichardson Aug 24, 2020
2 changes: 2 additions & 0 deletions dev/release/rat_exclude_files.txt
@@ -248,6 +248,8 @@ r/man/*.Rd
r/cran-comments.md
r/vignettes/*.Rmd
r/tests/testthat/test-*.txt
+r/inst/include/cpp11.hpp
+r/inst/include/cpp11/*.hpp
.gitattributes
ruby/red-arrow/.yardopts
rust/arrow/test/data/*.csv
3 changes: 0 additions & 3 deletions r/DESCRIPTION
@@ -23,15 +23,12 @@ Language: en-US
LazyData: true
SystemRequirements: C++11
Biarch: true
-LinkingTo:
-Rcpp (>= 1.0.1)
Imports:
assertthat,
bit64,
methods,
purrr,
R6,
-Rcpp (>= 1.0.1),
rlang,
tidyselect,
utils,
1 change: 0 additions & 1 deletion r/NAMESPACE
@@ -253,7 +253,6 @@ export(write_ipc_stream)
export(write_parquet)
export(write_to_raw)
importFrom(R6,R6Class)
-importFrom(Rcpp,sourceCpp)
importFrom(assertthat,assert_that)
importFrom(assertthat,is.string)
importFrom(bit64,print.integer64)
4 changes: 4 additions & 0 deletions r/NEWS.md
@@ -35,6 +35,10 @@

* S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R >= 4.0) packages

+## Other improvements
+
+* `arrow` now depends on [`cpp11`](https://cpp11.r-lib.org/), which brings more robust UTF-8 handling and faster compilation
+

# arrow 1.0.1

## Bug fixes
1 change: 0 additions & 1 deletion r/R/arrow-package.R
@@ -19,7 +19,6 @@
#' @importFrom purrr as_mapper map map2 map_chr map_dfr map_int map_lgl
#' @importFrom assertthat assert_that is.string
#' @importFrom rlang list2 %||% is_false abort dots_n warn enquo quo_is_null enquos is_integerish quos eval_tidy new_data_mask syms env env_bind as_label set_names
-#' @importFrom Rcpp sourceCpp
#' @importFrom tidyselect vars_select
#' @useDynLib arrow, .registration = TRUE
#' @keywords internal
36 changes: 18 additions & 18 deletions r/R/arrowExports.R

Generated file; its diff is not rendered by default.

13 changes: 2 additions & 11 deletions r/R/json.R
@@ -74,12 +74,7 @@ JsonTableReader$create <- function(file,
#' @export
JsonReadOptions <- R6Class("JsonReadOptions", inherit = ArrowObject)
JsonReadOptions$create <- function(use_threads = option_use_threads(), block_size = 1048576L) {
-shared_ptr(JsonReadOptions, json___ReadOptions__initialize(
-list(
-use_threads = use_threads,
-block_size = block_size
-)
-))
+shared_ptr(JsonReadOptions, json___ReadOptions__initialize(use_threads, block_size))
}

#' @rdname CsvReadOptions
@@ -89,9 +84,5 @@ JsonReadOptions$create <- function(use_threads = option_use_threads(), block_siz
#' @export
JsonParseOptions <- R6Class("JsonParseOptions", inherit = ArrowObject)
JsonParseOptions$create <- function(newlines_in_values = FALSE) {
-shared_ptr(JsonParseOptions, json___ParseOptions__initialize(
-list(
-newlines_in_values = newlines_in_values
-)
-))
+shared_ptr(JsonParseOptions, json___ParseOptions__initialize(newlines_in_values))
}
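Passing `use_threads` and `block_size` as separate typed arguments, instead of one named list, lets the generated wrapper convert each argument before the C++ function runs. The following is a minimal sketch of the C++ side under that signature. It assumes a stand-in struct: `JsonReadOptionsSketch` and `json_read_options_initialize` are hypothetical names, and the real function builds Arrow's own JSON read options instead.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in holding the two fields the R wrapper sets;
// this is not Arrow's real JSON options class.
struct JsonReadOptionsSketch {
  bool use_threads = true;
  std::int32_t block_size = 1 << 20;  // 1048576, the R-side default above
};

// Typed scalar parameters: no lookup of elements in a named list is needed.
std::shared_ptr<JsonReadOptionsSketch> json_read_options_initialize(bool use_threads,
                                                                    int block_size) {
  assert(block_size > 0);
  auto options = std::make_shared<JsonReadOptionsSketch>();
  options->use_threads = use_threads;
  options->block_size = block_size;
  return options;
}
```

For example, `json_read_options_initialize(false, 4096)` yields options with threading disabled and a block size of 4096.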
3 changes: 2 additions & 1 deletion r/R/parquet.R
@@ -268,7 +268,8 @@ ParquetWriterPropertiesBuilder <- R6Class("ParquetWriterPropertiesBuilder", inhe
)
},
set_compression_level = function(table, compression_level){
-assert_that(is_integerish(compression_level))
+# cast to integer but keep names
+compression_level <- set_names(as.integer(compression_level), names(compression_level))
private$.set(table, compression_level,
parquet___ArrowWriterProperties___Builder__set_compression_levels
)
5 changes: 1 addition & 4 deletions r/R/schema.R
@@ -84,10 +84,7 @@ Schema <- R6Class("Schema",
),
active = list(
names = function() {
-out <- Schema__field_names(self)
-# Hack: Rcpp should set the encoding
-Encoding(out) <- "UTF-8"
-out
+Schema__field_names(self)
},
num_fields = function() Schema__num_fields(self),
fields = function() map(Schema__fields(self), shared_ptr, class = Field),
2 changes: 1 addition & 1 deletion r/R/struct.R
@@ -24,7 +24,7 @@ StructType <- R6Class("StructType",
GetFieldIndex = function(name) StructType__GetFieldIndex(self, name)
)
)
-StructType$create <- function(...) shared_ptr(StructType, struct_(.fields(list(...))))
+StructType$create <- function(...) shared_ptr(StructType, struct__(.fields(list(...))))

#' @rdname data-type
#' @export
6 changes: 5 additions & 1 deletion r/R/table.R
@@ -192,7 +192,11 @@ Table$create <- function(..., schema = NULL) {
names(dots) <- rep_len("", length(dots))
}
stopifnot(length(dots) > 0)
-shared_ptr(Table, Table__from_dots(dots, schema))
+if (all_record_batches(dots)) {
+shared_ptr(Table, Table__from_record_batches(dots, schema))
+} else {
+shared_ptr(Table, Table__from_dots(dots, schema))
+}
}

#' @export
8 changes: 4 additions & 4 deletions r/README.md
@@ -107,7 +107,7 @@ Note that after any change to the C++ library, you must reinstall it and
run `make clean` or `git clean -fdx .` to remove any cached object code
in the `r/src/` directory before reinstalling the R package. This is
only necessary if you make changes to the C++ library source; you do not
-need to manually purge object files if you are only editing R or Rcpp
+need to manually purge object files if you are only editing R or C++
code inside `r/`.

Once you’ve built the C++ library, you can install the R package and its
@@ -120,7 +120,7 @@ R -e 'install.packages(c("devtools", "roxygen2", "pkgdown", "covr")); devtools::
R CMD INSTALL .
```

-If you need to set any compilation flags while building the Rcpp
+If you need to set any compilation flags while building the C++
extensions, you can use the `ARROW_R_CXXFLAGS` environment variable. For
example, if you are using `perf` to profile the R extensions, you may
need to set
@@ -149,9 +149,9 @@ For any other build/configuration challenges, see the [C++ developer
guide](https://arrow.apache.org/docs/developers/cpp/building.html) and
`vignette("install", package = "arrow")`.

-### Editing Rcpp code
+### Editing C++ code

-The `arrow` package uses some customized tools on top of `Rcpp` to
+The `arrow` package uses some customized tools on top of `cpp11` to
prepare its C++ code in `src/`. If you change C++ code in the R package,
you will need to set the `ARROW_R_DEV` environment variable to `TRUE`
(optionally, add it to your `~/.Renviron` file to persist across
27 changes: 12 additions & 15 deletions r/data-raw/codegen.R
@@ -18,10 +18,10 @@
# This file is used to generate code in the files
# src/arrowExports.cpp and R/arrowExports.R
#
-# This is similar to what Rcpp::compileAttributes() would do,
+# This is similar to what compileAttributes() would do,
# with some arrow specific changes.
#
-# Functions are decorated with [[arrow::export]] instead of [[Rcpp::export]]
+# Functions are decorated with [[arrow::export]]
# and the generated code adds a layer of protection so that
# the arrow package can be installed even when libarrow is not
#
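Per the comments above, `codegen.R` scans the C++ sources for functions decorated with `[[arrow::export]]` and emits a wrapper for each one. The following is a minimal sketch of a decorated function. It assumes the decoration is written as a `// [[arrow::export]]` comment directly above the function, and it uses a hypothetical `SchemaSketch` type in place of `arrow::Schema`. The package's own exports follow a `Class__method` naming pattern, such as `Schema__num_fields`, which `schema.R` calls. The sketch uses a single underscore because C++ reserves identifiers that contain `__`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::Schema, reduced to its field names.
struct SchemaSketch {
  std::vector<std::string> field_names;
};

// The code generator looks for the decoration below; to the compiler
// it is only a comment, so the function builds like any other.
// [[arrow::export]]
int schema_sketch_num_fields(const std::shared_ptr<SchemaSketch>& schema) {
  assert(schema != nullptr);
  return static_cast<int>(schema->field_names.size());
}
```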
@@ -45,9 +45,6 @@ suppressPackageStartupMessages({
get_exported_functions <- function(decorations, export_tag) {
out <- decorations %>%
filter(decoration %in% paste0(export_tag, "::export")) %>%
-# the three lines below can be expressed with rap()
-# more concisely
-# rap( ~ decor:::parse_cpp_function(context))
mutate(functions = map(context, decor:::parse_cpp_function)) %>%
{ vec_cbind(., vec_rbind(!!!pull(., functions))) } %>%
select(-functions) %>%
@@ -67,7 +64,7 @@ wrap_call <- function(name, return_type, args) {
if(return_type == "void") {
glue::glue("\t{call};\n\treturn R_NilValue;", .trim = FALSE)
} else {
-glue::glue("\treturn Rcpp::wrap({call});")
+glue::glue("\treturn cpp11::as_sexp({call});")
}
}

@@ -81,13 +78,13 @@ cpp_functions_definitions <- arrow_exports %>%
// {basename(file)}
#if defined(ARROW_R_WITH_{toupper(decoration)})
{return_type} {name}({real_params});
-RcppExport SEXP _arrow_{name}({sexp_params}){{
-BEGIN_RCPP
+extern "C" SEXP _arrow_{name}({sexp_params}){{
+BEGIN_CPP11
{input_params}{return_line}{wrap_call(name, return_type, args)}
-END_RCPP
+END_CPP11
}}
#else
-RcppExport SEXP _arrow_{name}({sexp_params}){{
+extern "C" SEXP _arrow_{name}({sexp_params}){{
\tRf_error("Cannot call {name}(). Please use arrow::install_arrow() to install required runtime libraries. ");
}}
#endif
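In the generated template, `BEGIN_CPP11` and `END_CPP11` wrap the body of each `extern "C"` entry point so that C++ exceptions do not propagate across the C boundary into R. The following sketch shows the same idea without R. The hypothetical `SKETCH_BEGIN` and `SKETCH_END` macros catch any exception and record its message, and a sentinel return value of `-1` stands in for raising an R error. This is not cpp11's implementation, which reports the error to R itself.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical error channel standing in for R's error mechanism.
static std::string sketch_last_error;

// Hypothetical counterparts of BEGIN_CPP11 / END_CPP11: open a try block,
// then catch whatever was thrown and report it instead of letting it escape.
#define SKETCH_BEGIN try {
#define SKETCH_END                               \
  }                                              \
  catch (const std::exception& e) {              \
    sketch_last_error = e.what();                \
  }                                              \
  catch (...) {                                  \
    sketch_last_error = "unknown C++ exception"; \
  }                                              \
  return -1;

// Ordinary C++ function that may throw.
int checked_divide(int numerator, int denominator) {
  if (denominator == 0) throw std::invalid_argument("division by zero");
  return numerator / denominator;
}

// C-linkage entry point: no exception crosses this boundary.
extern "C" int sketch_checked_divide_entry(int numerator, int denominator) {
  SKETCH_BEGIN
  return checked_divide(numerator, denominator);
  SKETCH_END
}
```

`sketch_checked_divide_entry(1, 0)` returns `-1` and leaves `"division by zero"` in `sketch_last_error`. A normal call returns the quotient and never reaches the trailing `return -1;`.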
@@ -96,7 +93,7 @@ cpp_functions_definitions <- arrow_exports %>%
sep = "\n",
real_params = glue_collapse_data(args, "{type} {name}"),
sexp_params = glue_collapse_data(args, "SEXP {name}_sexp"),
-input_params = glue_collapse_data(args, "\tRcpp::traits::input_parameter<{type}>::type {name}({name}_sexp);", sep = "\n"),
+input_params = glue_collapse_data(args, "\tarrow::r::Input<{type}>::type {name}({name}_sexp);", sep = "\n"),
return_line = if(nrow(args)) "\n" else ""
)
}) %>%
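Each generated wrapper declares its arguments as `arrow::r::Input<{type}>::type name(name_sexp);`. The trait therefore has to map a declared parameter type, such as `const std::shared_ptr<T>&`, to something that can be constructed from the incoming `SEXP`. The real definition is not part of this diff and may use a proxy type to avoid copies. The following is a hypothetical, simplified version that decays the declared type to a value type, with `FakeSexp` standing in for `SEXP`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical minimal trait: hold each argument by value, stripped of
// const and reference, so the wrapper can construct it from the input.
template <typename T>
struct Input {
  using type = typename std::decay<T>::type;
};

// Hypothetical stand-in for SEXP, carrying a string.
struct FakeSexp {
  std::string value;
};

// A parameter type that can be built from the stand-in input.
struct Name {
  explicit Name(const FakeSexp& x) : text(x.value) {}
  std::string text;
};

// The function as an author would write it, taking a const reference.
std::size_t name_length(const Name& name) { return name.text.size(); }

// Roughly what a generated wrapper body does: convert, then call.
std::size_t name_length_wrapper(const FakeSexp& name_sexp) {
  Input<const Name&>::type name(name_sexp);  // a Name held by value
  return name_length(name);
}

static_assert(std::is_same<Input<const Name&>::type, Name>::value,
              "a const reference parameter is held as a value");
static_assert(std::is_same<Input<const std::shared_ptr<int>&>::type,
                           std::shared_ptr<int>>::value,
              "smart pointers are held by value too");
```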
@@ -111,10 +108,10 @@ cpp_functions_registration <- arrow_exports %>%

writeLines(con = "src/arrowExports.cpp", glue::glue('
// Generated by using data-raw/codegen.R -> do not edit by hand
-#include "./arrow_exports.h"
-#include <Rcpp.h>
+#include <cpp11.hpp>
+#include <cpp11/declarations.hpp>

-using namespace Rcpp;
+#include "./arrow_exports.h"

{cpp_functions_definitions}

@@ -145,7 +142,7 @@ static const R_CallMethodDef CallEntries[] = {{
\t\t{{NULL, NULL, 0}}
}};

-RcppExport void R_init_arrow(DllInfo* dll){{
+extern "C" void R_init_arrow(DllInfo* dll){{
R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
R_useDynamicSymbols(dll, FALSE);
}}
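The generated file ends with a `CallEntries` table of `R_CallMethodDef` entries, terminated by `{NULL, NULL, 0}`, which `R_registerRoutines()` registers. `R_useDynamicSymbols(dll, FALSE)` then tells R to find routines through that table rather than by dynamic symbol search. The following sketch mimics the table-and-lookup pattern in plain C++. `CallDef`, `GenericFn`, `call_entries`, and `find_routine` are hypothetical names, not R's API.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical generic function pointer type (R uses DL_FUNC in this role).
typedef void* (*GenericFn)();

// Hypothetical mirror of an R_CallMethodDef entry: name, function, arity.
struct CallDef {
  const char* name;
  GenericFn fun;
  int num_args;
};

void* hello_impl() { return nullptr; }
void* answer_impl() {
  static int answer = 42;
  return &answer;
}

// Null-terminated registration table, in the style of CallEntries.
static const CallDef call_entries[] = {
    {"sketch_hello", &hello_impl, 0},
    {"sketch_answer", &answer_impl, 0},
    {nullptr, nullptr, 0}};

// Look a routine up by name; names absent from the table are not found,
// mirroring registration-only lookup.
const CallDef* find_routine(const char* name) {
  for (const CallDef* entry = call_entries; entry->name != nullptr; ++entry) {
    if (std::strcmp(entry->name, name) == 0) return entry;
  }
  return nullptr;
}
```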
25 changes: 25 additions & 0 deletions r/inst/include/cpp11.hpp
@@ -0,0 +1,25 @@
// cpp11 version: 0.2.1.9000
// vendored on: 2020-08-20
#pragma once

#include "cpp11/R.hpp"
#include "cpp11/altrep.hpp"
#include "cpp11/as.hpp"
#include "cpp11/attribute_proxy.hpp"
#include "cpp11/data_frame.hpp"
#include "cpp11/doubles.hpp"
#include "cpp11/environment.hpp"
#include "cpp11/external_pointer.hpp"
#include "cpp11/function.hpp"
#include "cpp11/integers.hpp"
#include "cpp11/list.hpp"
#include "cpp11/list_of.hpp"
#include "cpp11/logicals.hpp"
#include "cpp11/matrix.hpp"
#include "cpp11/named_arg.hpp"
#include "cpp11/protect.hpp"
#include "cpp11/r_string.hpp"
#include "cpp11/r_vector.hpp"
#include "cpp11/raws.hpp"
#include "cpp11/sexp.hpp"
#include "cpp11/strings.hpp"
49 changes: 49 additions & 0 deletions r/inst/include/cpp11/R.hpp
@@ -0,0 +1,49 @@
// cpp11 version: 0.2.1.9000
// vendored on: 2020-08-20
#pragma once

#include <limits>

#include "R_ext/Arith.h"

#undef FALSE
#undef TRUE
#undef NA_LOGICAL

extern "C" {
typedef enum {
FALSE = 0,
TRUE = 1,
NA_LOGICAL = std::numeric_limits<int>::min()
} Rboolean;
}

#define R_EXT_BOOLEAN_H_

#define R_NO_REMAP
#define STRICT_R_HEADERS
#include "Rinternals.h"
#undef STRICT_R_HEADERS
#undef R_NO_REMAP

// clang-format off
#ifdef __clang__
# pragma clang diagnostic push
# pragma clang diagnostic ignored "-Wattributes"
#endif

#ifdef __GNUC__
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wattributes"
#endif
// clang-format on

#include "cpp11/altrep.hpp"

namespace cpp11 {
namespace literals {

constexpr R_xlen_t operator"" _xl(unsigned long long int value) { return value; }

} // namespace literals
} // namespace cpp11
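The vendored `R.hpp` redefines `Rboolean` with a third enumerator, `NA_LOGICAL = std::numeric_limits<int>::min()`, which matches how R stores a missing logical. It also adds an `_xl` literal in `cpp11::literals` that yields `R_xlen_t` values. The following standalone sketch shows both ideas without R headers. The names `sketch`, `TriBool`, `from_logical_element`, and `xlen_t` are hypothetical. `xlen_t` is assumed to be `std::ptrdiff_t`, which is what `R_xlen_t` is on builds with long-vector support.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

namespace sketch {

// Three-valued logical in the style of the redefined Rboolean:
// R stores a missing logical as the smallest int.
enum TriBool { kFalse = 0, kTrue = 1, kNA = std::numeric_limits<int>::min() };

// Interpret a raw element of an R logical vector.
inline TriBool from_logical_element(int raw) {
  if (raw == std::numeric_limits<int>::min()) return kNA;
  return raw != 0 ? kTrue : kFalse;
}

// Assumed: R_xlen_t is ptrdiff_t when long vectors are supported.
typedef std::ptrdiff_t xlen_t;

namespace literals {
// User-defined literal so that `3_xl` is an xlen_t rather than an int.
constexpr xlen_t operator"" _xl(unsigned long long int value) {
  return static_cast<xlen_t>(value);
}
}  // namespace literals

}  // namespace sketch
```

With `using namespace sketch::literals;` in scope, `auto n = 3_xl;` declares `n` as `sketch::xlen_t`. This avoids narrowing or sign-mismatch warnings when comparing against vector lengths.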