  ________
 /\ sa    \
/  \  ku   \
\  /    ra /
 \/_______/
Extends the functionality of R serialization by augmenting the built-in reference hook system. This enhanced implementation allows an integrated single-pass operation that combines R serialization with third-party serialization methods.
Facilitates the serialization of even complex R objects that contain non-system reference objects, such as those accessed via external pointers, enabling their use in parallel and distributed computing.
This package originated as a request from a meeting of the R Consortium Marshalling and Serialization Working Group held at useR!2024 in Salzburg, Austria. It is designed to eventually provide a common framework for marshalling in R.
It extracts the functionality embedded within the mirai async framework for use in other contexts.
Install the current release from CRAN:
install.packages("sakura")
Or the development version using:
pak::pak("shikokuchuo/sakura")
Some R objects by their nature cannot be serialized, such as those accessed via an external pointer.
Using the arrow package as an example:
library(arrow, warn.conflicts = FALSE)
obj <- list(as_arrow_table(iris), as_arrow_table(mtcars))
unserialize(serialize(obj, NULL))
#> [[1]]
#> Table
#> Error: Invalid <Table>, external pointer to null
In such cases, sakura::serial_config() can be used to create custom serialization configurations, specifying functions that hook into R’s native serialization mechanism for reference objects (‘refhooks’).
cfg <- sakura::serial_config(
  "ArrowTabular",
  arrow::write_to_raw,
  function(x) arrow::read_ipc_stream(x, as_data_frame = FALSE)
)
This configuration can then be supplied as the ‘hook’ argument for sakura::serialize() and sakura::unserialize().
sakura::unserialize(sakura::serialize(obj, cfg), cfg)
#> [[1]]
#> Table
#> 150 rows x 5 columns
#> $Sepal.Length <double>
#> $Sepal.Width <double>
#> $Petal.Length <double>
#> $Petal.Width <double>
#> $Species <dictionary<values=string, indices=int8>>
#>
#> See $metadata for additional Schema metadata
#>
#> [[2]]
#> Table
#> 32 rows x 11 columns
#> $mpg <double>
#> $cyl <double>
#> $disp <double>
#> $hp <double>
#> $drat <double>
#> $wt <double>
#> $qsec <double>
#> $vs <double>
#> $am <double>
#> $gear <double>
#> $carb <double>
#>
#> See $metadata for additional Schema metadata
This time, the arrow tables are handled seamlessly.
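For context, the built-in refhook mechanism that sakura augments can be tried with base R alone. The sketch below is illustrative rather than a description of sakura's internals: it uses an ordinary environment as the reference object, and the names `registry`, `counter`, `out_hook`, `in_hook` and the key "counter-1" are hypothetical. It relies only on the documented `refhook` argument of base `serialize()` and `unserialize()`.

```r
# Base R only: a refhook pair that swaps an environment for a lookup key.
# All object names and the key "counter-1" are hypothetical.
registry <- new.env()
counter <- new.env()
assign("n", 3L, envir = counter)
assign("counter-1", counter, envir = registry)

# Outbound hook: return a character vector to take over a reference object,
# or NULL to let R serialize it in the usual way.
out_hook <- function(x) {
  if (identical(x, counter)) "counter-1" else NULL
}

# Inbound hook: receives that character vector and returns the object.
in_hook <- function(key) get(key, envir = registry)

payload <- list(counter, 1:3)
bytes <- serialize(payload, NULL, refhook = out_hook)
restored <- unserialize(bytes, refhook = in_hook)

# The restored element is the registered environment itself, not a copy
identical(restored[[1]], counter)
```

The base hook has to return a character vector, which is awkward for binary formats such as Arrow IPC streams. The configurations above instead take functions that produce and consume raw vectors.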
Using torch as another example:
library(torch)
x <- list(torch_rand(5L), runif(5L))
unserialize(serialize(x, NULL))
#> [[1]]
#> torch_tensor
#> Error in (function (self) : external pointer is not valid
Base R serialization above fails, but sakura serialization succeeds:
cfg <- sakura::serial_config("torch_tensor", torch::torch_serialize, torch::torch_load)
sakura::unserialize(sakura::serialize(x, cfg), cfg)
#> [[1]]
#> torch_tensor
#> 0.3755
#> 0.0540
#> 0.3365
#> 0.3944
#> 0.5949
#> [ CPUFloatType{5} ]
#>
#> [[2]]
#> [1] 0.4271107 0.5690996 0.8724742 0.8202838 0.3796990
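Both examples pass a pair of functions in which the first turns an object into a raw vector and the second turns that raw vector back into an object. When writing such a pair for another class, it may be worth checking the round trip in isolation before building a configuration. The helper below is a hypothetical sketch, not part of sakura, and uses base R serialization of a data frame purely as a stand-in for a third-party format.

```r
# Hypothetical helper (not part of sakura): checks that a candidate pair of
# functions round-trips an object through a raw vector.
check_roundtrip <- function(x, sfunc, ufunc, same = identical) {
  bytes <- sfunc(x)
  stopifnot(is.raw(bytes))   # the serialization function must yield raw bytes
  same(ufunc(bytes), x)      # the inverse function must rebuild the object
}

# Stand-in pair: base R serialization of a plain data frame
to_raw <- function(x) serialize(x, NULL)
from_raw <- function(bytes) unserialize(bytes)

ok <- check_roundtrip(mtcars, to_raw, from_raw)
```

For reference objects, `identical()` is usually too strict, so a class-specific comparison can be supplied through the `same` argument.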
A low-level interface is provided for use by other packages. The following C callables are registered:
- sakura_serialize_init
- sakura_unserialize_init
- sakura_serialize
- sakura_unserialize
Their function signatures may be inspected in src/sakura.h.
We would like to thank in particular:
- R Core for providing the interface to the R serialization mechanism.
- Luke Tierney and Mike Cheng for their meticulous efforts in documenting the serialization interface.
- Daniel Falbel for discussion around an efficient solution to serialization and transmission of torch tensors.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.