% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bind.R
\name{filearray_bind}
\alias{filearray_bind}
\title{Merge and bind homogeneous file arrays}
\usage{
filearray_bind(
...,
.list = list(),
filebase = tempfile(),
symlink = FALSE,
overwrite = FALSE,
cache_ok = FALSE
)
}
\arguments{
\item{..., .list}{file array instances}
\item{filebase}{where to create merged array}
\item{symlink}{whether to use \code{\link[base]{file.symlink}}; if true,
then partition files will be symbolically linked to the original arrays,
otherwise the partition files will be copied over. If you want your data
to be portable, do not use symbolic links. The default value is \code{FALSE}.}
\item{overwrite}{whether to overwrite when \code{filebase} already exists;
default is \code{FALSE}, in which case an error is raised}
\item{cache_ok}{see 'Details'; only used if \code{overwrite} is true.}
}
\value{
A bound array in \code{'FileArray'} class.
}
\description{
The file arrays to be merged must be homogeneous:
same data type, partition size, and partition length.
}
\details{
The input arrays must share the same data type and partition size.
The dimension of each partition must also be the same. For example,
if an array \code{x1} has dimension \eqn{100x20x30} with partition size
\code{1}, then each partition has dimension \eqn{100x20x1}, and there are
\code{30} partitions. \code{x1} can bind with another array that has the
same partition size and partition dimension. This means that if \code{x2}
has dimension \eqn{100x20x40} and partition size \code{1}, then \code{x1}
and \code{x2} can be merged.
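
The partition arithmetic above can be sketched in base R. This is only an
illustration: the variable names are hypothetical, and the merged dimension
shown at the end assumes that binding concatenates partitions along the last
margin.

\preformatted{
# dimensions and partition size taken from the example above
dim_x1 <- c(100, 20, 30)
dim_x2 <- c(100, 20, 40)
partition_size <- 1

# each partition spans the leading margins and `partition_size`
# slices of the last margin
partition_dim <- c(dim_x1[-3], partition_size)

# number of partitions stored by each array
n_part_x1 <- dim_x1[3] / partition_size
n_part_x2 <- dim_x2[3] / partition_size

# the leading margins must agree for the partitions to be compatible
stopifnot(identical(dim_x1[-3], dim_x2[-3]))

# assumption: the merged array stacks all partitions along the last margin
merged_dim <- c(dim_x1[-3], dim_x1[3] + dim_x2[3])
}
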

If \code{filebase} exists and \code{overwrite} is \code{FALSE}, an error is
always raised. If \code{overwrite=TRUE} and \code{cache_ok=FALSE}, then
the existing \code{filebase} will be erased and any data stored within will
be lost.

If both \code{overwrite} and \code{cache_ok} are \code{TRUE}, then,
before erasing \code{filebase}, the function validates the existing
array header and compares the header signatures. If the existing header
signature is the same as that of the array to be created, then the existing
array will be returned. \code{cache_ok} can be extremely useful when
binding large arrays with \code{symlink=FALSE}, as the cache might avoid
moving files around. However, \code{cache_ok} should be enabled with caution,
because only the header information is compared; the partition data is not.
If the existing array was generated from an old version of the source arrays
and the data in the source arrays has since been altered, then
\code{cache_ok=TRUE} is rarely appropriate, as the cache is outdated.

The \code{symlink} option should be used with extra caution. Creating
symbolic links is definitely faster than copying partition files. However,
since the partition files are simply linked to the original partition files,
changes to the input arrays will also affect the merged arrays, and
vice versa; see 'Examples'. Also, for arrays created from symbolic links, if
the original arrays are deleted, the merged arrays will not be invalidated,
but the corresponding partitions will no longer be accessible. Attempts to
set deleted partitions will likely result in failure. Therefore,
\code{symlink} should be set to true when the merged arrays are
temporary and used for read-only purposes, and when speed and disk space
are a concern. For further reading, see \code{\link[base]{files}}.
}
\examples{
partition_size <- 1
type <- "double"
x1 <- filearray_create(
tempfile(), c(2,2), type = type,
partition_size = partition_size)
x1[] <- 1:4
x2 <- filearray_create(
tempfile(), c(2,1), type = type,
partition_size = partition_size)
x2[] <- 5:6
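# y1 copies the partition files (symlink = FALSE is also the default)
y1 <- filearray_bind(x1, x2, symlink = FALSE)

# y2 links to the original partition files instead of copying them;
# this requires symbolic-link support on the current platform
y2 <- filearray_bind(x1, x2, symlink = TRUE)

# both arrays hold the same data at this point
y1[] - y2[]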
# change x1
x1[1,1] <- NA
# y1 is not affected
y1[]
# y2 changes
y2[]
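
# A hedged sketch of the overwrite / cache_ok behavior described in
# 'Details'. `fb`, `z1` and `z2` are hypothetical names chosen for
# illustration; only documented arguments of filearray_bind are used.
fb <- tempfile()
z1 <- filearray_bind(x1, x2, filebase = fb)

# binding to an existing filebase with the default overwrite = FALSE
# is documented to raise an error, so it is caught here
tryCatch(
  filearray_bind(x1, x2, filebase = fb),
  error = function(e) message("Expected error: ", conditionMessage(e))
)

# with overwrite = TRUE and cache_ok = TRUE, the existing array may be
# returned when its header signature matches the array to be created;
# note that the partition data itself is not compared
z2 <- filearray_bind(x1, x2, filebase = fb,
                     overwrite = TRUE, cache_ok = TRUE)
z2[]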
}