Extension to R Serialization

RConsortium/sakura

sakura

Lifecycle: experimental

  ________  
 /\ sa    \
/  \  ku   \
\  /    ra /
 \/_______/

Extension to R Serialization

Extends the functionality of R serialization by augmenting the built-in reference hook system. This enhanced implementation allows an integrated single-pass operation that combines R serialization with third-party serialization methods.

Facilitates the serialization of complex R objects that contain non-system reference objects, such as those accessed via external pointers, enabling their use in parallel and distributed computing.

This package was requested at a meeting of the R Consortium Marshalling and Serialization Working Group held at useR! 2024 in Salzburg, Austria. It is designed to eventually provide a common framework for marshalling in R.

It extracts the functionality embedded within the mirai async framework for use in other contexts.

Installation

Install the current release from CRAN:

install.packages("sakura")

Or the development version using:

pak::pak("shikokuchuo/sakura")

Overview

Some R objects, such as those accessed via an external pointer, cannot by their nature be serialized using base R alone.

Using the arrow package as an example:

library(arrow, warn.conflicts = FALSE)
obj <- list(as_arrow_table(iris), as_arrow_table(mtcars))

unserialize(serialize(obj, NULL))
#> [[1]]
#> Table
#> Error: Invalid <Table>, external pointer to null

In such cases, sakura::serial_config() can be used to create custom serialization configurations, specifying functions that hook into R’s native serialization mechanism for reference objects (‘refhooks’).

cfg <- sakura::serial_config(
  "ArrowTabular",
  arrow::write_to_raw,
  function(x) arrow::read_ipc_stream(x, as_data_frame = FALSE)
)

This configuration can then be supplied as the ‘hook’ argument for sakura::serialize() and sakura::unserialize().

sakura::unserialize(sakura::serialize(obj, cfg), cfg)
#> [[1]]
#> Table
#> 150 rows x 5 columns
#> $Sepal.Length <double>
#> $Sepal.Width <double>
#> $Petal.Length <double>
#> $Petal.Width <double>
#> $Species <dictionary<values=string, indices=int8>>
#> 
#> See $metadata for additional Schema metadata
#> 
#> [[2]]
#> Table
#> 32 rows x 11 columns
#> $mpg <double>
#> $cyl <double>
#> $disp <double>
#> $hp <double>
#> $drat <double>
#> $wt <double>
#> $qsec <double>
#> $vs <double>
#> $am <double>
#> $gear <double>
#> $carb <double>
#> 
#> See $metadata for additional Schema metadata

This time, the arrow tables are handled seamlessly.
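For readers unfamiliar with the 'refhooks' mentioned above, the following is a minimal sketch of the underlying base R mechanism, using only the `refhook` argument of base `serialize()` and `unserialize()` and no sakura functions. The `make_counter()` helper and the "counter" class are hypothetical, invented purely for illustration. An environment stands in for the reference object, since base R passes non-system environments, external pointers and weak references to the hook.

```r
# Hypothetical reference object: an environment tagged with class "counter".
make_counter <- function(n) {
  env <- new.env()
  env$n <- n
  structure(env, class = "counter")
}

# Outbound hook: return a character vector for objects we want to handle,
# or NULL to let base R serialize the object in the usual way.
out_hook <- function(x) {
  if (inherits(x, "counter")) as.character(x$n) else NULL
}

# Inbound hook: receives the character vector produced by out_hook()
# and rebuilds the reference object from it.
in_hook <- function(chr) make_counter(as.numeric(chr))

obj <- list(make_counter(3), letters[1:2])

bytes <- serialize(obj, NULL, refhook = out_hook)
restored <- unserialize(bytes, refhook = in_hook)

class(restored[[1]])
restored[[1]]$n
```

Base R hooks must return a character vector. The sakura configuration shown earlier instead takes functions such as `arrow::write_to_raw`, which returns a raw vector.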

Using torch as another example:

library(torch)
x <- list(torch_rand(5L), runif(5L))

unserialize(serialize(x, NULL))
#> [[1]]
#> torch_tensor
#> Error in (function (self) : external pointer is not valid

Base R serialization above fails, but sakura serialization succeeds:

cfg <- sakura::serial_config("torch_tensor", torch::torch_serialize, torch::torch_load)

sakura::unserialize(sakura::serialize(x, cfg), cfg)
#> [[1]]
#> torch_tensor
#>  0.3755
#>  0.0540
#>  0.3365
#>  0.3944
#>  0.5949
#> [ CPUFloatType{5} ]
#> 
#> [[2]]
#> [1] 0.4271107 0.5690996 0.8724742 0.8202838 0.3796990

C Interface

A low-level interface is provided for use by other packages. The following C callables are registered:

sakura_serialize_init;
sakura_unserialize_init;

sakura_serialize;
sakura_unserialize;

Their function signatures may be inspected in src/sakura.h.

Acknowledgements

We would like to thank in particular:

  • R Core for providing the interface to the R serialization mechanism.
  • Luke Tierney and Mike Cheng for their meticulous efforts in documenting the serialization interface.
  • Daniel Falbel for discussion around an efficient solution to serialization and transmission of torch tensors.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.