Add peak variable support to MsBackendMemory and MsBackendDataFrame #297
Changes from 17 commits
6556a26
f0d13c6
dc02d60
533a225
8ee8335
1482b9d
9b3fc3a
7b2d2e8
db17225
a6fabdf
b5122c9
2f21f2d
ef17af4
1daa730
010638c
a5b8a7f
3db4369
1645a4c
7d169e4
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -224,8 +224,8 @@ | |
#' used to submit the full spectra data as a `DataFrame` to the | ||
#' backend. This would allow the backend to be also usable for the | ||
#' [setBackend()] function from `Spectra`. Note that eventually (for | ||
#' *read-only* backends) also the `supportsSetBackend` method would | ||
#' need to be implemented to return `TRUE`. | ||
#' *read-only* backends) also the `supportsSetBackend` method would need | ||
#' to be implemented to return `TRUE`. | ||
#' The `backendInitialize` method has also to ensure to correctly set | ||
#' spectra variable `dataStorage`. | ||
#' | ||
|
@@ -422,28 +422,30 @@ | |
#' the number of spectra in `object`. `NA` are reported for MS1 | ||
#' spectra of if no precursor information is available. | ||
#' | ||
#' - `peaksData` returns a `list` with the spectras' peak data, i.e. numeric | ||
#' `matrix` with peak values. The length of the list is equal to the number | ||
#' of spectra in `object`. Each element of the list is a `numeric` `matrix` | ||
#' - `peaksData` returns a `list` with the spectras' peak data, i.e. m/z and | ||
#' intensity values or other *peak variables*. The length of the list is | ||
#' equal to the number of spectra in `object`. Each element of the list has | ||
#' to be a two-dimensional array (`matrix` or `data.frame`) | ||
#' with columns depending on the provided `columns` parameter (by default | ||
#' `"mz"` and `"intensity"`, but depends on the backend's available | ||
#' `peaksVariables`). For an empty spectrum, a `matrix` with 0 rows and | ||
#' columns according to `columns` is returned. The optional parameter | ||
#' `columns`, if supported by the backend, allows to define which peak | ||
#' variables should be returned in the `numeric` peak `matrix`. As a default | ||
#' `c("mz", "intensity")` should be used. | ||
#' `peaksVariables`). For an empty spectrum, a `matrix` (`data.frame`) with | ||
#' 0 rows and columns according to `columns` is returned. The optional | ||
#' parameter `columns`, if supported by the backend, allows to define which | ||
#' peak variables should be returned in the `numeric` peak `matrix`. As a | ||
#' default `c("mz", "intensity")` should be used. | ||
#' | ||
#' - `peaksData<-` replaces the peak data (m/z and intensity values) of the | ||
#' backend. This method expects a `list` of `matrix` objects with columns | ||
#' `"mz"` and `"intensity"` that has the same length as the number of | ||
#' spectra in the backend. Note that just writeable backends support this | ||
#' method. | ||
#' backend. This method expects a `list` of two dimensional arrays (`matrix` | ||
#' or `data.frame`) with columns representing the peak variables. All | ||
#' existing peaks data is expected to be replaced with these new values. The | ||
#' length of the `list` has to match the number of spectra of `object`. | ||
#' Note that only writeable backends need to support this method. | ||
#' | ||
#' - `peaksVariables`: lists the available variables for mass peaks. Default | ||
#' peak variables are `"mz"` and `"intensity"` (which all backends need to | ||
#' support and provide), but some backends might provide additional variables. | ||
#' These variables correspond to the column names of the `numeric` `matrix` | ||
#' representing the peak data (returned by `peaksData`). | ||
#' All these variables are expected to be returned (if requested) by the | ||
#' `peaksData` function. | ||
#' | ||
#' - `reset` a backend (if supported). This method will be called on the backend | ||
#' by the `reset,Spectra` method that is supposed to restore the data to its | ||
|
@@ -544,10 +546,7 @@ | |
#' way the data is organized internally, provides much faster access to the | ||
#' full peak data (i.e. the numerical matrices of m/z and intensity values). | ||
#' Also subsetting and access to any spectra variable (except `"mz"` and | ||
#' `"intensity"` is fastest for the `MsBackendMemory`. Finally, the | ||
#' `MsBackendMemory` supports also arbitrary peak annotations while the | ||
#' `MsBackendDataFrame` does not have support for such additional peak | ||
#' variables. | ||
#' `"intensity"` is fastest for the `MsBackendMemory`. | ||
#' | ||
#' Thus, for most use cases, the `MsBackendMemory` provides a higher | ||
#' performance and flexibility than the `MsBackendDataFrame` and should thus be | ||
|
@@ -556,20 +555,13 @@ | |
#' performance comparison. | ||
#' | ||
#' New objects can be created with the `MsBackendMemory()` and | ||
#' `MsBackendDataFrame()` function, respectively. The backend can be | ||
#' `MsBackendDataFrame()` function, respectively. Both backends can be | ||
#' subsequently initialized with the `backendInitialize` method, taking a | ||
#' `DataFrame` (or `data.frame`) with the MS data as first parameter `data`. | ||
#' `backendInitialize` for `MsBackendMemory` has a second parameter | ||
#' `peaksVariables` (default `peaksVariables = c("mz", "intensity")` that | ||
#' allows to specify which of the columns in the provided data frame should | ||
#' be considered as a *peaks variable* (i.e. information of an individual | ||
#' mass peak) rather than a *spectra variable* (i.e. information of an | ||
#' individual spectrum). Note that it is important to also include `"mz"` and | ||
#' `"intensity"` in `peaksVariables` as these would otherwise be considered | ||
#' to be spectra variables! Also, while it is possible to change the values of | ||
#' existing peaks variables using the `$<-` method, this method does **not** | ||
#' allow to add new peaks variables to an existing `MsBackendMemory`. New | ||
#' peaks variables should be added using the `backendInitialize` method. | ||
#' `DataFrame` (or `data.frame`) with the (full) MS data as first parameter | ||
#' `data`. The second parameter `peaksVariables` allows to define which columns | ||
#' in `data` contain *peak variables* such as the m/z and intensity values of | ||
#' individual peaks per spectrum. The default for this parameter is | ||
#' `peaksVariables = c("mz", "intensity")`. | ||
#' | ||
#' Suggested columns of this `DataFrame` are: | ||
#' | ||
|
@@ -598,13 +590,12 @@ | |
#' | ||
#' Additional columns are allowed too. | ||
#' | ||
#' For the `MsBackendMemory`, any column in the provided `data.frame` which | ||
#' contains a `list` of vectors each with length equal to the number of peaks | ||
#' for a spectrum will be used as additional *peak variable* (see examples | ||
#' below for details). | ||
#' The `peaksData` function for `MsBackendMemory` and `MsBackendDataFrame` | ||
#' returns a `list` of `numeric` `matrix` by default (with parameter | ||
#' `columns = c("mz", "intensity")`). If other peak variables are requested, | ||
#' a `list` of `data.frame` is returned (to ensure m/z and intensity values | ||
Review comment: I'm not sure I understand the "to ensure m/z ..." part. Should it not read "ensuring that m/z ..."? Reply: Yes, thanks. |
||
#' are always `numeric`). | ||
#' | ||
#' The `MsBackendDataFrame` ignores parameter `columns` of the `peaksData` | ||
#' function and returns **always** m/z and intensity values. | ||
#' | ||
#' @section `MsBackendMzR`, on-disk MS data backend: | ||
#' | ||
|
@@ -650,6 +641,7 @@ | |
#' The `MsBackendMzR` ignores parameter `columns` of the `peaksData` | ||
#' function and returns **always** m/z and intensity values. | ||
#' | ||
#' | ||
#' @section `MsBackendHdf5Peaks`, on-disk MS data backend: | ||
#' | ||
#' The `MsBackendHdf5Peaks` keeps, similar to the `MsBackendMzR`, peak data | ||
|
@@ -681,6 +673,7 @@ | |
#' The `MsBackendHdf5Peaks` ignores parameter `columns` of the `peaksData` | ||
#' function and returns **always** m/z and intensity values. | ||
#' | ||
#' | ||
#' @section Implementation notes: | ||
#' | ||
#' Backends extending `MsBackend` **must** implement all of its methods (listed | ||
|
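The return types documented above for `peaksData` (a numeric `matrix` when only `"mz"` and `"intensity"` are requested, a `data.frame` otherwise) can be illustrated with a minimal base R sketch. It works on plain lists rather than on a real backend; the helper `combine_peaks` and the toy `ann` annotation are hypothetical and not part of `Spectra`.

```r
## Toy data: two spectra with m/z, intensity and a character peak annotation.
mz <- list(c(100.1, 200.2), c(150.5, 250.6, 350.7))
intensity <- list(c(10, 20), c(5, 15, 25))
ann <- list(c("a", "b"), c("c", NA, "d"))

## Hypothetical helper (not a Spectra function): combine per-spectrum
## columns into one two-dimensional array per spectrum.
combine_peaks <- function(cols) {
    ## A numeric matrix is only safe if all columns are m/z or intensity;
    ## otherwise use a data.frame so that m/z and intensity stay numeric.
    only_numeric <- all(names(cols) %in% c("mz", "intensity"))
    fun <- if (only_numeric) cbind else cbind.data.frame
    do.call(mapply, c(list(FUN = fun, SIMPLIFY = FALSE,
                           USE.NAMES = FALSE), cols))
}

## Default columns: list of numeric matrices.
res_num <- combine_peaks(list(mz = mz, intensity = intensity))

## Additional peak variable requested: list of data.frames.
res_df <- combine_peaks(list(mz = mz, intensity = intensity, ann = ann))
```

Binding a character column into a `matrix` would coerce every value to `character`, which is why the sketch switches to `cbind.data.frame` as soon as a non-default peak variable is requested.
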
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,10 +14,12 @@ NULL | |
|
||
setClass("MsBackendDataFrame", | ||
contains = "MsBackend", | ||
slots = c(spectraData = "DataFrame"), | ||
slots = c(spectraData = "DataFrame", | ||
peaksVariables = "character"), | ||
prototype = prototype(spectraData = DataFrame(), | ||
peaksVariables = c("mz", "intensity"), | ||
readonly = FALSE, | ||
version = "0.1")) | ||
version = "0.2")) | ||
|
||
setValidity("MsBackendDataFrame", function(object) { | ||
msg <- .valid_spectra_data_required_columns(object@spectraData) | ||
|
@@ -27,7 +29,8 @@ setValidity("MsBackendDataFrame", function(object) { | |
.valid_column_datatype(object@spectraData, .SPECTRA_DATA_COLUMNS), | ||
.valid_intensity_column(object@spectraData), | ||
.valid_mz_column(object@spectraData), | ||
.valid_intensity_mz_columns(object@spectraData)) | ||
.valid_peaks_variable_columns(object@spectraData, | ||
.peaks_variables(object))) | ||
if (is.null(msg)) TRUE | ||
else msg | ||
}) | ||
|
@@ -52,12 +55,16 @@ setMethod("show", "MsBackendDataFrame", function(object) { | |
#' | ||
#' @rdname MsBackend | ||
setMethod("backendInitialize", signature = "MsBackendDataFrame", | ||
function(object, data, ...) { | ||
function(object, data, peaksVariables = c("mz", "intensity"), ...) { | ||
if (missing(data)) data <- DataFrame() | ||
if (is.data.frame(data)) | ||
data <- DataFrame(data) | ||
if (!is(data, "DataFrame")) | ||
stop("'data' has to be a 'DataFrame'") | ||
peaksVariables <- intersect(peaksVariables, colnames(data)) | ||
if (sum(c("mz", "intensity") %in% peaksVariables) == 1L) | ||
Review comment: If I understand this check correctly, it throws an error if only one of "mz" or "intensity" is in `peaksVariables`. If neither is, the error is not triggered. Why not
Reply: Good catch, thanks! Changed that.
Reply: Actually, after checking again: we were supporting the following situations:
An error is only thrown if m/z is provided without intensity, or intensity without m/z. From a data point of view that does not make sense: you can't have just one of the two, you either have both or none. Open to discuss, @lgatto, if you think that's not a good idea.
Reply: OK, fine by me. Maybe just clarify with a comment?
Reply: Yep, will add that and also check that I mention it in the documentation.
Reply: Added as a comment and in the documentation. |
||
stop("Both \"mz\" and \"intensity\" peak variables ", | ||
"need to be provided.") | ||
if (nrow(data)) { | ||
data$dataStorage <- "<memory>" | ||
if (nrow(data) && !is(data$mz, "NumericList")) | ||
|
@@ -67,7 +74,8 @@ setMethod("backendInitialize", signature = "MsBackendDataFrame", | |
compress = FALSE) | ||
} | ||
object@spectraData <- data | ||
validObject(object) | ||
object@peaksVariables <- peaksVariables | ||
validObject(object) # this checks also for peaks variables. | ||
object | ||
}) | ||
|
||
|
@@ -91,14 +99,21 @@ setMethod("acquisitionNum", "MsBackendDataFrame", function(object) { | |
|
||
#' @rdname hidden_aliases | ||
setMethod("peaksData", "MsBackendDataFrame", | ||
function(object, columns = peaksVariables(object)) { | ||
if (!all(columns %in% c("mz", "intensity"))) | ||
stop("'peaksData' for 'MsBackendDataFrame' does only support", | ||
" columns \"mz\" and \"intensity\"", call. = FALSE) | ||
lst <- lapply(columns, function(z) do.call(z, list(object))) | ||
function(object, columns = c("mz", "intensity")) { | ||
na <- columns[!columns %in% peaksVariables(object)] | ||
if (length(na)) | ||
stop("Peaks variable \"", na, "\" not available.") | ||
lst <- lapply(columns, function(z) { | ||
if (z %in% c("mz", "intensity")) | ||
do.call(z, list(object)) | ||
else object@spectraData[, z] | ||
}) | ||
names(lst) <- columns | ||
tmp <- do.call(mapply, c(list(FUN = cbind, SIMPLIFY = FALSE, | ||
USE.NAMES = FALSE), lst)) | ||
if (all(columns %in% c("mz", "intensity"))) | ||
fun <- cbind | ||
else fun <- cbind.data.frame | ||
do.call(mapply, c(list(FUN = fun, SIMPLIFY = FALSE, | ||
USE.NAMES = FALSE), lst)) | ||
}) | ||
|
||
#' @rdname hidden_aliases | ||
|
@@ -337,22 +352,30 @@ setMethod("precursorMz", "MsBackendDataFrame", function(object) { | |
|
||
#' @rdname hidden_aliases | ||
setReplaceMethod("peaksData", "MsBackendDataFrame", function(object, value) { | ||
if (!(is.list(value) || inherits(value, "SimpleList"))) | ||
stop("'value' has to be a list-like object") | ||
if (length(value) != length(object)) | ||
stop("Length of 'value' has to match length of 'object'") | ||
vals <- lapply(value, "[", , 1L) | ||
if (!is(vals, "NumericList")) | ||
vals <- NumericList(vals, compress = FALSE) | ||
object@spectraData$mz <- vals | ||
vals <- lapply(value, "[", , 2L) | ||
if (!is(vals, "NumericList")) | ||
vals <- NumericList(vals, compress = FALSE) | ||
object@spectraData$intensity <- vals | ||
validObject(object) | ||
if (length(object)) { | ||
.check_peaks_data_value(value, length(object)) | ||
cns <- colnames(value[[1L]]) | ||
for (cn in cns) { | ||
vals <- lapply(value, "[", , cn) | ||
if (cn %in% c("mz", "intensity")) | ||
vals <- NumericList(vals, compress = FALSE) | ||
object@spectraData[[cn]] <- vals | ||
} | ||
## remove eventual old peak variables | ||
rem <- setdiff(peaksVariables(object), cns) | ||
for (r in rem) | ||
object@spectraData[[r]] <- NULL | ||
object@peaksVariables <- cns | ||
validObject(object) | ||
} | ||
object | ||
}) | ||
|
||
#' @rdname hidden_aliases | ||
setMethod("peaksVariables", "MsBackendDataFrame", function(object) { | ||
union(c("mz", "intensity"), .peaks_variables(object)) | ||
}) | ||
|
||
#' @rdname hidden_aliases | ||
setMethod("rtime", "MsBackendDataFrame", function(object) { | ||
.get_column(object@spectraData, "rtime") | ||
|
@@ -388,6 +411,8 @@ setMethod("selectSpectraVariables", "MsBackendDataFrame", | |
msg <- .valid_spectra_data_required_columns(object@spectraData) | ||
if (length(msg)) | ||
stop(msg) | ||
object@peaksVariables <- intersect(object@peaksVariables, | ||
colnames(object@spectraData)) | ||
validObject(object) | ||
object | ||
}) | ||
|
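The "both or none" rule discussed in the review thread on `backendInitialize` can be sketched as a stand-alone base R function. The name `check_peaks_variables` is hypothetical and used for illustration only; in the diff the check is written inline in the method.

```r
## Hypothetical stand-alone version of the check in backendInitialize.
check_peaks_variables <- function(peaksVariables, data_columns) {
    ## Keep only the peak variables that are actually present in the data.
    pv <- intersect(peaksVariables, data_columns)
    ## Exactly one of "mz"/"intensity" present is the only invalid case;
    ## having both, or neither, is accepted.
    if (sum(c("mz", "intensity") %in% pv) == 1L)
        stop("Both \"mz\" and \"intensity\" peak variables ",
             "need to be provided.")
    pv
}

## Both present: accepted.
both <- check_peaks_variables(c("mz", "intensity"),
                              c("msLevel", "mz", "intensity"))
## Neither present: accepted, no peak variables remain.
none <- check_peaks_variables(c("mz", "intensity"), "msLevel")
## Only m/z present: rejected with an error.
only_mz <- try(check_peaks_variables(c("mz", "intensity"),
                                     c("msLevel", "mz")), silent = TRUE)
```
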
I think there's a missing ) after "intensity".
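
As a rough illustration of how the new `peaksData<-` method in this diff handles peak variables by column name rather than by position, the following base R sketch splits a list of peak matrices into one list per column. The helper `split_peak_columns` is hypothetical, and the sketch leaves out the `NumericList` conversion and the validity check performed by the actual method.

```r
## Hypothetical helper: turn a list of per-spectrum peak tables into
## one list of vectors per peak variable (column).
split_peak_columns <- function(value) {
    ## Column names are taken from the first element, as in the diff.
    cns <- colnames(value[[1L]])
    res <- lapply(cns, function(cn) lapply(value, function(x) x[, cn]))
    names(res) <- cns
    res
}

value <- list(
    cbind(mz = c(100.1, 200.2), intensity = c(10, 20)),
    cbind(mz = c(150.5, 250.6, 350.7), intensity = c(5, 15, 25)))
cols <- split_peak_columns(value)

## Peak variables that existed before but are absent from the new
## value would be dropped; here a previously stored "ann" variable.
old_variables <- c("mz", "intensity", "ann")
removed <- setdiff(old_variables, names(cols))
```
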